DREDUPLICATE

Removes or tags duplicates after indexing.

This index action runs on a specified subset of the content and locates duplicates using a variety of methods. Duplicates can then be deleted, moved to a different database, or tagged in a specified field, depending on the value of DuplicateAction.

NOTE: The DREDUPLICATE index action removes duplicate documents only within a single Content component, not across the whole distributed system. To remove all duplicates, you must ensure that all duplicates of a document are sent to the same instance of the Content component, for example by using DistributeByFields mode.

Example

http://12.3.4.56:9071/DREDUPLICATE?DuplicateAction=Delete&ReferenceField=*/DREREFERENCE

In this example, duplicates are identified using the DREREFERENCE field, and any duplicates found are deleted.
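The request above can also be assembled programmatically. The following is a minimal Python sketch that only builds the URL and does not send it. The helper name build_dreduplicate_url is hypothetical and not part of any SDK, and the host and port are copied from the example.

```python
from urllib.parse import urlencode


def build_dreduplicate_url(base_url, duplicate_action, reference_field, **optional):
    """Return a DREDUPLICATE request URL (hypothetical helper, for illustration)."""
    params = {
        "DuplicateAction": duplicate_action,
        "ReferenceField": reference_field,
    }
    # Optional parameters (for example TagField or MinID) are appended as given.
    params.update(optional)
    # Keep '*' and '/' unescaped so the field path reads as it does in the example.
    return f"{base_url}/DREDUPLICATE?{urlencode(params, safe='*/')}"


url = build_dreduplicate_url("http://12.3.4.56:9071", "Delete", "*/DREREFERENCE")
print(url)
```

Optional parameters can be passed as keyword arguments, for example MinID=100.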

Required Parameters

The following action parameters are required.

Parameter        Description
DuplicateAction  The action to perform on duplicates.
ReferenceField   A reference field to use as the initial determination of whether two documents are a match.

Optional Parameters

This action accepts the following optional parameters.

Parameter        Description
ChecksumField    A reference field used to determine whether a match is exact.
Database         The database to move duplicates to. You must set this parameter when DuplicateAction is set to Database.
DatabaseMatch    A list of databases to search for duplicates in.
MaxID            The last DocID to find duplicates of.
MinID            The first DocID to find duplicates of.
TagField         The field to tag duplicates with. You must set this parameter when DuplicateAction is set to Tag.
TagValue         The static value to tag duplicates with in the TagField.
ThreadHashField  The field containing the thread hash values used to determine whether a match is a duplicate.

This index action accepts the following standard index action parameters.

Parameter              Description
IgnoreMaxPendingItems  Whether to ignore the IndexQueueMaxPendingItems limit for this index action.
IndexUID               An identification code for any document tracking events.
NoArchive              Turn off configured archiving for the index action.
Priority               The priority for the index job.

Comments

  • You must set TagField when DuplicateAction is set to Tag.
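
The parameter dependencies described on this page can be checked on the client side before a request is sent. The following Python sketch is illustrative only: the function name check_dreduplicate_params is hypothetical, and comparing the DuplicateAction value case-insensitively is an assumption, not documented behavior.

```python
def check_dreduplicate_params(params):
    """Return a list of problems found in a dict of DREDUPLICATE parameters."""
    problems = []

    # DuplicateAction and ReferenceField are listed as required parameters.
    for name in ("DuplicateAction", "ReferenceField"):
        if not params.get(name):
            problems.append(f"{name} is required")

    # Assumption: the action value is matched case-insensitively.
    action = str(params.get("DuplicateAction", "")).lower()

    if action == "tag" and not params.get("TagField"):
        problems.append("TagField is required when DuplicateAction is Tag")
    if action == "database" and not params.get("Database"):
        problems.append("Database is required when DuplicateAction is Database")

    return problems


print(check_dreduplicate_params(
    {"DuplicateAction": "Tag", "ReferenceField": "*/DREREFERENCE"}
))
```

An empty list means that none of these checks failed. It does not guarantee that the server accepts the request.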