ClusterSGDataGen

Allows you to generate spectrograph data from snapshots that you have taken by using the ClusterSnapshot action. The ClusterSGDataGen action uses the snapshots in the time span defined by one or more of the StartDate, EndDate, and Interval parameters.

NOTE:

This is an administrative action that can be sent only from AdminClients (which are set in the [Server] section of the configuration file).

ClusterSGDataGen generates a data set based on multiple cluster snapshots with the same job name (which you specify with SourceJobName). You can use the data to generate a spectrograph, which is a visual representation of how clusters change over a given time period.

You must have created at least two snapshots with the specified SourceJobName for the ClusterSGDataGen action to run successfully. You can generate additional snapshots by using the FillGaps parameter.

When you send the ClusterSGDataGen action, HPE Category Component queues it. After the action finishes, the spectrograph data sets that it generated are stored in the cluster/SGDATA directory in your HPE Category Component installation directory. You can retrieve the spectrograph image, data, or documents by using the ClusterSGPicServe, ClusterSGDataServe, and ClusterSGDocsServe actions.

Example

http://12.3.4.56:9000/action=ClusterSGDataGen&SourceJobName=Job1&TargetJobName=Job1a&StartDate=1000290039&EndDate=1000290650

This action uses port 9000 to generate spectrograph data from all snapshots called Job1 that were generated between September 12 2001 at 11:20:39 and September 12 2001 at 11:30:50 (if no snapshot is available for these dates, HPE Category Component uses the snapshots that were generated before these dates). The snapshots are stored on a machine with the IP address 12.3.4.56. The spectrograph data set that is generated is called Job1a.
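The request above can also be assembled programmatically. The following is a minimal Python sketch, not part of the product: the helper names build_aci_url and epoch_to_utc are hypothetical, and the sketch assumes (as the example suggests) that StartDate and EndDate are given in epoch seconds. Converted to UTC, the two example values fall one hour earlier than the times quoted above, which suggests that the description uses a UTC+1 local time.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_aci_url(host, port, action, params):
    """Build an action URL in the form shown in the example above.

    Hypothetical helper: the action name follows "/action=" and the
    remaining parameters are appended as URL-encoded name=value pairs.
    """
    return f"http://{host}:{port}/action={action}&{urlencode(params)}"


def epoch_to_utc(seconds):
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Values taken from the example request above.
url = build_aci_url(
    "12.3.4.56",
    9000,
    "ClusterSGDataGen",
    {
        "SourceJobName": "Job1",
        "TargetJobName": "Job1a",
        "StartDate": 1000290039,
        "EndDate": 1000290650,
    },
)

print(url)
print(epoch_to_utc(1000290039).isoformat())  # start of the time span, in UTC
print(epoch_to_utc(1000290650).isoformat())  # end of the time span, in UTC
```

Because this is an administrative action, a request built this way is accepted only when it is sent from a machine listed in AdminClients.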

Parameters

Parameter | Description | Required
BindLevel | The conceptual similarity of clusters. |
ComparisonTolerance | The maximum amount of time between snapshots to compare. |
Cycles | The number of times to run the action. |
Dependencies | A list of classification schedules that must be complete before the ClusterSGDataGen action can run. |
EndDate | The time span for which to generate spectrograph data. | See Comments
FillGaps | Whether to create additional snapshots for the spectrograph generation if none exist in the timespan. |
FillGapsFrequency | The interval at which HPE Category Component checks if snapshots exist in the timespan for which it is generating a spectrograph. |
Interval | The time span for which to generate spectrograph data. | See Comments
ProfileSourceJobName | A profile snapshot to compare to the SourceJobName. |
RankSections | The relevance that documents must have to a cluster. |
Repeat | The time to elapse between runs of the action. |
Retries | The number of times to retry a failed action. |
RetryInterval | The number of seconds to wait before retrying a failed action. |
SourceJobName | The snapshot to generate spectrograph data from. | Yes
SourceJobName2 | The name of the snapshot to compare to the SourceJobName snapshot. |
StartDate | The time span for which to generate spectrograph data. | See Comments
StartTime | The time to run the first action. |
TargetJobName | The name of the spectrograph data to generate. | Yes
Username | The name of the user performing the action. |
XMLEncoding | Overrides the default XML encoding. |

This action accepts the following standard ACI action parameters.

Parameter | Description
ActionID | A string to use to identify an ACI action.
EncryptResponse | Encrypt the output.
FileName | The file to write output to.
ForceTemplateRefresh | Forces the server to load the template from disk.
Output | Writes output to a file.
ResponseFormat | The format of the action output.
Template | The template to use for the action output.
TemplateParamCSVs | A list of variables to use for the specified template.

Comments

You must define a timespan, which you can set with one or more of the following parameters: StartDate, EndDate, and Interval.

Result Format

The spectrograph data results return each cluster in a <node> element, which contains the following attributes:

nodeID The ID for the node.
title The cluster title and a summary of its contents.
clusterID The ID of the cluster (this ID is the same across multiple snapshots).
numDocs The number of documents in the cluster, including any duplicates (duplicates can occur because a document can be used in multiple cluster seeds, which might then be combined during the clustering process).
absDocs The number of unique documents in the cluster.
selfSimilarity A measure of how tight the cluster is, between zero and one. A value close to one indicates a well-defined single subject. A value closer to zero indicates a broader grouping.
whatsHotScore A measure of how important the cluster is. This score measures how similar the documents in a cluster are, and how narrow the range of concepts in that cluster is. When a cluster contains a lot of very similar data, it has a higher What’s Hot score, so higher scores represent more important trends.
intensity A measure of the cluster size, represented on the spectrograph by how brightly the cluster is colored.
radius A measure of the cluster importance, represented on the spectrograph by the width of the cluster. This value is based on the whatsHotScore for the cluster.
yPos The y-position (in pixels) of this cluster on the spectrograph. (The x-position is defined by the time at which each snapshot was taken.)
colour The top color for the cluster. This value is returned as an integer, where a higher number means a darker color. You can use these values to assign colors for each cluster.
connection An indication of a link to other nodes in adjacent snapshots, and the strength of this link (nodeID and weight).
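
The attributes described above can be read with any XML parser. The following Python sketch is illustrative only: the SAMPLE document is invented for this example (including the <sgdata> wrapper and all values), a real response may use a different envelope, and the sketch assumes that connection appears as a child element of <node> carrying nodeID and weight. The 0.5 threshold is likewise an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; not output captured from a real server.
SAMPLE = """\
<sgdata>
  <node nodeID="0" title="Example cluster A" clusterID="12" numDocs="40"
        absDocs="35" selfSimilarity="0.82" whatsHotScore="57" intensity="3"
        radius="5" yPos="120" colour="2">
    <connection nodeID="3" weight="0.6"/>
  </node>
  <node nodeID="1" title="Example cluster B" clusterID="19" numDocs="12"
        absDocs="12" selfSimilarity="0.41" whatsHotScore="9" intensity="1"
        radius="2" yPos="310" colour="7"/>
</sgdata>
"""


def read_nodes(xml_text):
    """Return one dictionary per <node> element found in xml_text."""
    root = ET.fromstring(xml_text)
    nodes = []
    for node in root.iter("node"):
        nodes.append({
            "nodeID": node.get("nodeID"),
            "title": node.get("title"),
            "clusterID": node.get("clusterID"),
            "numDocs": int(node.get("numDocs")),
            "absDocs": int(node.get("absDocs")),
            "selfSimilarity": float(node.get("selfSimilarity")),
            # Assumed layout: each link is a <connection> child element.
            "links": [
                (link.get("nodeID"), float(link.get("weight")))
                for link in node.iter("connection")
            ],
        })
    return nodes


nodes = read_nodes(SAMPLE)

# Titles of clusters that are closer to a single well-defined subject.
tight = [n["title"] for n in nodes if n["selfSimilarity"] >= 0.5]

# numDocs counts duplicates and absDocs does not, so the difference
# is the number of duplicate documents in each cluster.
duplicates = {n["nodeID"]: n["numDocs"] - n["absDocs"] for n in nodes}

print(tight)
print(duplicates)
```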
