Troubleshoot Clusters from Snapshots
Consider the following factors when clustering produces poor results or no clusters at all.
- Clustering uses document Suggest actions. Ensure that the documents in the initial snapshot query contain source fields, which are used to suggest related documents.
- If your log files indicate that the snapshot did not produce enough seeds, try reducing the SeedSize and SeedBindLevel configuration parameters to create more seeds.
- If IDOL does not extract any clusters, or extracts fewer clusters than expected, try reducing the BindLevel and Equivalence configuration parameters, which determine how seeds are merged. In addition, you can improve the snapshot by reducing the SeedSize and SeedBindLevel parameters. This produces more seeds, and then hopefully more clusters. OpenText recommends that if you experience any problems with cluster extraction, you should focus on improving the snapshot.
- The MinClusterDocs parameter determines the minimum number of documents that a cluster must have; clusters with fewer documents are discarded. If your log files show that enough seeds have been generated, but clustering does not produce enough clusters, try reducing this parameter, either in the ClusterCluster action or the configuration file.
- Avoid setting clustering parameters to zero. This can result in unusual or unexpected behavior. To switch off certain settings (for example Equivalence and BindLevel), set the parameter to -1, to use internal defaults.
- Ensure that the initial query is not too restrictive. If the snapshot query only returns a few results, clustering might not work properly. By default, the query selects 5,000 documents at random from the index, which is usually sufficient to detect trends.
- IDOL Server uses the RandomSeed configuration parameter to select documents at random from the query result set. During troubleshooting, set a value for this parameter to ensure that IDOL Server uses the same documents from the query set each time.
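
The advice above about avoiding zero values can be checked mechanically before an action is sent. The following is a minimal Python sketch and is not part of IDOL. The helper names zero_valued and build_action are hypothetical, the action=Name&Parameter=value string layout is an assumption, and any other parameters that the action requires are omitted.

```python
from urllib.parse import urlencode

# Clustering parameter names taken from the troubleshooting list above.
CLUSTER_PARAMS = ("SeedSize", "SeedBindLevel", "BindLevel",
                  "Equivalence", "MinClusterDocs")


def zero_valued(params):
    """Return the clustering parameters in params that are set to zero."""
    return [name for name in CLUSTER_PARAMS
            if name in params and float(params[name]) == 0]


def build_action(action, params):
    """Build an action string (assumed layout), rejecting zero values.

    Zero is rejected rather than rewritten, because -1 only switches off
    certain settings (for example Equivalence and BindLevel).
    """
    bad = zero_valued(params)
    if bad:
        raise ValueError("Zero is not recommended for: " + ", ".join(bad))
    return "action=" + action + "&" + urlencode(params)


if __name__ == "__main__":
    # Lower MinClusterDocs and use the internal default for Equivalence.
    print(build_action("ClusterCluster",
                       {"MinClusterDocs": 2, "Equivalence": -1}))
```

The check only flags zero values; which lower value to try for each parameter still depends on what the log files report about seeds and clusters.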