DREDUPLICATE
Removes or tags duplicates after indexing.
This index action runs on a specified subset of the content and locates duplicates using a variety of methods. Duplicates can then be deleted, moved to a different database, or tagged in a specified field, depending on the value of DuplicateAction.
NOTE: The DREDUPLICATE index action only removes duplicate documents within a single Content component, rather than removing duplicates over the whole distributed system. To remove all duplicates, you must ensure that duplicates of a document are all sent to the same instance of the Content component, for example by using DistributeByFields mode.
Example
http://12.3.4.56:9071/DREDUPLICATE?DuplicateAction=Delete&ReferenceField=*/DREREFERENCE
In this example, duplicates are identified using the DREREFERENCE field, and any duplicates found are deleted.
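The request above can also be assembled programmatically. The following Python sketch only builds the URL string from the host, port, and parameters shown in the example; the helper name build_dreduplicate_url is hypothetical, and sending the request is left out because it depends on your deployment.

```python
from urllib.parse import urlencode


def build_dreduplicate_url(host, port, **params):
    """Build a DREDUPLICATE request URL (hypothetical helper, not part of the product)."""
    # safe="*/" leaves the field path (*/DREREFERENCE) unescaped so the
    # result matches the form used in the example above.
    query = urlencode(params, safe="*/")
    return f"http://{host}:{port}/DREDUPLICATE?{query}"


# Host and port are taken from the example; replace them with your own.
url = build_dreduplicate_url(
    "12.3.4.56",
    9071,
    DuplicateAction="Delete",
    ReferenceField="*/DREREFERENCE",
)
print(url)
```

Optional parameters such as MinID and MaxID can be passed as additional keyword arguments in the same way.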
Required Parameters
The following action parameters are required.
Parameter | Description |
---|---|
DuplicateAction | The action to perform on duplicates. |
ReferenceField | A reference field to use as the initial determination of whether two documents are a match. |
Optional Parameters
This action accepts the following optional parameters.
Parameter | Description |
---|---|
ChecksumField | A reference field used to determine whether a match is exact. |
Database | The database to move duplicates to. You must set this parameter when DuplicateAction is set to Database. |
DatabaseMatch | A list of databases to search for duplicates in. |
MaxID | The last DocID to find duplicates of. |
MinID | The first DocID to find duplicates of. |
TagField | The field to tag duplicates with. You must set this parameter when DuplicateAction is set to Tag. |
TagValue | The static value to tag duplicates with in the TagField. |
ThreadHashField | The field containing the thread hash values used to determine whether a match is a duplicate. |
This index action accepts the following standard index action parameters.
Parameter | Description |
---|---|
IgnoreMaxPendingItems | Whether to ignore the IndexQueueMaxPendingItems limit for this index action. |
IndexUID | An identification code for any document tracking events. |
NoArchive | Turn off configured archiving for the index action. |
Priority | The priority for the index job. |
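The parameter tables describe two conditional requirements: Database must be set when DuplicateAction is Database, and TagField must be set when DuplicateAction is Tag. A client could check these before sending the action. The sketch below is illustrative only: the function name check_dreduplicate_params is hypothetical, and comparing the DuplicateAction value case-insensitively is an assumption rather than documented behavior.

```python
def check_dreduplicate_params(params):
    """Return a list of problems found in a DREDUPLICATE parameter dict.

    Hypothetical client-side check; the server remains the authority.
    """
    problems = []

    # DuplicateAction and ReferenceField are listed as required parameters.
    for required in ("DuplicateAction", "ReferenceField"):
        if not params.get(required):
            problems.append(f"{required} is required")

    # Assumption: the action value is compared case-insensitively.
    action = (params.get("DuplicateAction") or "").lower()

    if action == "database" and not params.get("Database"):
        problems.append("Database is required when DuplicateAction is Database")
    if action == "tag" and not params.get("TagField"):
        problems.append("TagField is required when DuplicateAction is Tag")

    return problems


# A Tag request without TagField is reported as incomplete.
print(check_dreduplicate_params(
    {"DuplicateAction": "Tag", "ReferenceField": "*/DREREFERENCE"}
))
```

An empty list means that none of the requirements checked here were violated; it does not guarantee that the server accepts the action.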
Comments
You must set TagField when DuplicateAction is set to Tag.