ClusterSGDataGen

Allows you to generate spectrograph data from snapshots that you have taken by using the ClusterSnapshot action. The ClusterSGDataGen action uses the snapshots in the time span defined by one or more of the StartDate, EndDate, and Interval parameters.

NOTE: This is an administrative action that can be sent only by users that belong to an authorization role that allows the Admin standard role, or which enables the action explicitly. See Authorization Roles Configuration Parameters.

ClusterSGDataGen generates a data set based on multiple cluster snapshots with the same job name (which you specify with SourceJobName). You can use the data to generate a spectrograph, which is a visual representation of how clusters change over a given time period.

You must have created at least two snapshots with the specified SourceJobName for the ClusterSGDataGen action to run successfully. You can generate additional snapshots by using the FillGaps parameter.

When you send the ClusterSGDataGen action, IDOL Server queues it. After it finishes, the spectrograph data sets it has generated are stored in the cluster/SGDATA directory in your IDOL Server installation directory. You can retrieve the spectrograph image, data, or documents by using the ClusterSGPicServe, ClusterSGDataServe, and ClusterSGDocsServe actions.

Example

http://12.3.4.56:9000/action=ClusterSGDataGen&SourceJobName=Job1&TargetJobName=Job1a&StartDate=1000290039&EndDate=1000290650

This action generates spectrograph data from all snapshots called Job1 that were generated between September 12 2001 at 11:20:39 and September 12 2001 at 11:30:50 (if no snapshot is available for these dates, IDOL Server uses the snapshots that were generated before these dates). The spectrograph data set that is generated is called Job1a.
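The example request can be reproduced with a short script. The following is a minimal Python sketch, not part of the IDOL Server product: the helper names `build_action_url` and `epoch_to_utc` are made up for illustration, and the URL layout is copied from the example above. It also converts the epoch-second values to UTC. That gives 10:20:39 and 10:30:50, so the times quoted in the description appear to be local times one hour ahead of UTC.

```python
from datetime import datetime, timezone
from urllib.parse import quote


def build_action_url(host, port, action, params):
    """Join parameters into a URL laid out like the example above (hypothetical helper)."""
    # Percent-encode each value so that spaces or '&' in a job name cannot break the URL.
    query = "&".join(f"{name}={quote(str(value), safe='')}" for name, value in params.items())
    return f"http://{host}:{port}/action={action}&{query}"


def epoch_to_utc(seconds):
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


params = {
    "SourceJobName": "Job1",
    "TargetJobName": "Job1a",
    "StartDate": 1000290039,
    "EndDate": 1000290650,
}

url = build_action_url("12.3.4.56", 9000, "ClusterSGDataGen", params)
print(url)
print(epoch_to_utc(params["StartDate"]).isoformat())  # 2001-09-12T10:20:39+00:00
print(epoch_to_utc(params["EndDate"]).isoformat())    # 2001-09-12T10:30:50+00:00
```

The script only builds the URL and does not send it; sending the request requires a user with the administrative permissions described in the note above.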

Required Parameters

The following action parameters are required.

Parameter Description
SourceJobName The snapshot to generate spectrograph data from.
TargetJobName The name of the spectrograph data to generate.

You must define a time span, which you can set by using one or more of the following parameters.

Parameter Description
EndDate The end of the time span for which to generate spectrograph data.
Interval The time span for which to generate spectrograph data.
StartDate The start of the time span for which to generate spectrograph data.
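A client can check these rules before it sends the action. The following minimal Python sketch is an illustration only: `check_params` is a hypothetical helper, not an IDOL Server API, and it assumes parameter names are spelled exactly as in the tables above. It reports a missing required parameter or a missing time span.

```python
REQUIRED = ("SourceJobName", "TargetJobName")
TIME_SPAN = ("StartDate", "EndDate", "Interval")


def is_set(params, name):
    """Treat a parameter as set when it is present and not empty."""
    return params.get(name) not in (None, "")


def check_params(params):
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = [
        f"missing required parameter: {name}"
        for name in REQUIRED
        if not is_set(params, name)
    ]
    # At least one of the time span parameters must be present.
    if not any(is_set(params, name) for name in TIME_SPAN):
        problems.append("no time span: set at least one of " + ", ".join(TIME_SPAN))
    return problems


print(check_params({"SourceJobName": "Job1", "TargetJobName": "Job1a", "StartDate": 1000290039}))
print(check_params({"SourceJobName": "Job1"}))
```

The check covers only the presence of parameters. It does not verify that at least two snapshots exist for the source job, which the server also requires.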

Optional Parameters

This action accepts the following optional parameters.

Parameter Description
BindLevel The conceptual similarity of clusters.
ComparisonTolerance The maximum amount of time between snapshots to compare.
Cycles The number of times to run the action.
Dependencies A list of classification schedules that must be complete before the ClusterSGDataGen action can run.
DREQuery A query to use to restrict snapshot generation when you set FillGaps to True.
FillGaps Whether to create additional snapshots for the spectrograph generation if none exist in the time span.
FillGapsFrequency The interval at which IDOL Server checks if snapshots exist in the time span for which it is generating a spectrograph.
ForceTimestamp A specific time stamp to use for the generated spectrograph data.
Params The names of parameters to use in the Suggest actions that IDOL Server uses to create seeds for additional snapshots, when you set FillGaps to True.
ProfileSourceJobName A profile snapshot to compare to the SourceJobName.
RankSections The relevance that documents must have to a cluster.
Repeat The time to elapse between runs of the action.
Retries The number of times to retry a failed action.
RetryInterval The number of seconds to wait before retrying a failed action.
SeedBindLevel A value that specifies how closely bound concepts must be to form a cluster seed for creating snapshots when you set FillGaps to True.
SeedSize The size of the document group that forms a seed for creating snapshots when you set FillGaps to True.
SourceJobName2 The name of the snapshot to compare to the SourceJobName snapshot.
StartTime The time to run the first action.
Username The name of the user performing the action.
Values The values for the specified Params.
XMLEncoding Overrides the default XML encoding.

This action accepts the following standard ACI action parameters.

Parameter Description
ActionID A string to use to identify an ACI action.
FileName The file to write output to.
ForceTemplateRefresh Forces the server to load the template from disk.
Output Writes output to a file.
ResponseFormat The format of the action output.
Template The template to use for the action output.
TemplateParamCSVs A list of variables to use for the specified template.

Comments

You must define a time span, which you can set with one or more of the StartDate, EndDate, and Interval parameters.

Result Format

The spectrograph data returns each cluster in a <node> element, which has the following attributes:

nodeID The ID for the node.
title The cluster title and a summary of its contents.
clusterID The ID of the cluster (this ID is the same across multiple snapshots).
numDocs The number of documents in the cluster, including any duplicates (duplicates can occur because a document can be used in multiple cluster seeds, which might then be combined during the clustering process).
absDocs The number of unique documents in the cluster.
selfSimilarity A measure of how tight the cluster is, between zero and one. A value close to one indicates a well-defined single subject. A value closer to zero indicates a broader grouping.
whatsHotScore A measure of how important the cluster is. This score measures how similar the documents in a cluster are, and how narrow the range of concepts in that cluster is. When a cluster contains a lot of very similar data, it has a higher What’s Hot score, so higher scores represent more important trends.
intensity A measure of the cluster size, represented on the spectrograph by how brightly the cluster is colored.
radius A measure of the cluster importance, represented on the spectrograph by the width of the cluster. This value is based on the whatsHotScore for the cluster.
yPos The y-position (in pixels) of this cluster on the spectrograph. (The x-position is defined by the time at which each snapshot was taken.)
colour The top color for the cluster. This value is returned as an integer, where a higher number means a darker color. You can use these values to assign colors for each cluster.
connection An indication of a link to a node in an adjacent snapshot (nodeID), and the strength of this link (weight).
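To show how these attributes might be consumed, the following minimal Python sketch parses a small, made-up response with the standard library. The sample XML is an assumption for illustration, not real server output: the `<spectrograph>` wrapper element is hypothetical, and so is the choice to model `connection` as a child element that carries `nodeID` and `weight` attributes. The sketch derives the number of duplicate documents in a cluster as numDocs minus absDocs.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element layout and values are invented for illustration.
SAMPLE_XML = """<spectrograph>
  <node nodeID="0" title="Example cluster A" clusterID="3" numDocs="12" absDocs="10"
        selfSimilarity="0.62" whatsHotScore="1.8" intensity="40" radius="6"
        yPos="120" colour="3">
    <connection nodeID="5" weight="0.7"/>
  </node>
  <node nodeID="1" title="Example cluster B" clusterID="4" numDocs="7" absDocs="7"
        selfSimilarity="0.35" whatsHotScore="0.9" intensity="25" radius="3"
        yPos="200" colour="1"/>
</spectrograph>"""


def read_nodes(xml_text):
    """Collect selected attributes from every <node> element in the response."""
    root = ET.fromstring(xml_text)
    nodes = []
    for node in root.iter("node"):
        num_docs = int(node.get("numDocs"))
        abs_docs = int(node.get("absDocs"))
        nodes.append({
            "nodeID": node.get("nodeID"),
            "clusterID": node.get("clusterID"),
            "title": node.get("title"),
            # numDocs counts duplicates and absDocs counts unique documents.
            "duplicates": num_docs - abs_docs,
            "selfSimilarity": float(node.get("selfSimilarity")),
            # Each link is a (target nodeID, weight) pair.
            "links": [(c.get("nodeID"), float(c.get("weight")))
                      for c in node.findall("connection")],
        })
    return nodes


for entry in read_nodes(SAMPLE_XML):
    print(entry["nodeID"], entry["title"], entry["duplicates"], entry["links"])
```

Because clusterID stays the same across snapshots, a client could group the parsed nodes by clusterID to follow one cluster over time. Check the element and attribute layout against an actual response before relying on it.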