Removes or tags duplicates after indexing.
This index action runs on a specified subset of the content and locates duplicates using a variety of methods. Depending on the value of DuplicateAction, any duplicates found are then deleted, moved to a different database, or tagged in a specified field.
Note: The DREDUPLICATE index action only removes duplicate documents within a single Content component, rather than removing duplicates over the whole distributed system. To remove all duplicates, you must ensure that duplicates of a document are all sent to the same instance of the Content component, for example by using DistributeByFields mode.
http://12.3.4.56:20001/DREDUPLICATE?DuplicateAction=Delete&ReferenceField=*/DREREFERENCE
In this example, duplicates are identified using the DREREFERENCE field, and any duplicates found are deleted.
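The example URL above can also be assembled programmatically. The following is a minimal sketch rather than an official client: the helper name `build_dreduplicate_url` is hypothetical, the host and port are the ones shown in the example, and the code only builds the URL string without sending it. The `*` and `/` characters are left unescaped so that the result matches the form shown above.

```python
from urllib.parse import urlencode


def build_dreduplicate_url(host, port, **params):
    """Return a DREDUPLICATE index-action URL (hypothetical helper)."""
    # Keep '*' and '/' literal so field paths such as */DREREFERENCE
    # appear exactly as in the documented example.
    query = urlencode(params, safe="*/")
    return f"http://{host}:{port}/DREDUPLICATE?{query}"


url = build_dreduplicate_url(
    "12.3.4.56",
    20001,
    DuplicateAction="Delete",
    ReferenceField="*/DREREFERENCE",
)
print(url)
```

Additional parameters from the table below, such as MinID or MaxID, can be passed as further keyword arguments.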
Parameter | Description | Required |
---|---|---|
ChecksumField | A reference field used to determine whether a match is exact. | |
Database | The database to move duplicates to. | see Comments |
DatabaseMatch | A list of databases to search for duplicates in. | |
DuplicateAction | The action to perform on duplicates. | Yes |
IgnoreMaxPendingItems | Whether to ignore the IndexQueueMaxPendingItems limit for this index action. | |
IndexUID | An identification code for any document tracking events. | |
MaxID | The last DocID to find duplicates of. | |
MinID | The first DocID to find duplicates of. | |
Priority | The priority for the index job. | |
ReferenceField | A reference field to use as the initial determination of whether two documents are a match. | Yes |
TagField | The field to tag duplicates with. | see Comments |
TagValue | The static value to tag duplicates with in the TagField. | |
ThreadHashField | The field containing the thread hash values used to determine whether a match is a duplicate. | |

Comments: You must set Database when DuplicateAction is set to Database. You must set TagField when DuplicateAction is set to Tag.
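The requirement rules in the table and comments can be checked before an action is sent. The sketch below is an illustration only: the function name `missing_parameters` is hypothetical, and it assumes that parameter names and DuplicateAction values are written exactly as they appear on this page.

```python
def missing_parameters(params):
    """Return the required DREDUPLICATE parameters absent from params."""
    # Always required according to the parameter table.
    required = ["DuplicateAction", "ReferenceField"]

    # Conditionally required according to the comments.
    action = params.get("DuplicateAction", "")
    if action == "Database":
        required.append("Database")
    elif action == "Tag":
        required.append("TagField")

    # A parameter counts as missing if it is absent or empty.
    return [name for name in required if not params.get(name)]


print(missing_parameters({
    "DuplicateAction": "Tag",
    "ReferenceField": "*/DREREFERENCE",
}))
# prints ['TagField']
```

An empty list means that every required parameter for the chosen DuplicateAction is present.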