Locate Duplicate Documents
You can locate duplicate documents in the data index after indexing has taken place by using the DREDUPLICATE index action. This action locates duplicates in a specified subset of the content, and then removes them, tags a field, or moves the duplicate documents to another database.
http://ContentHost:indexPort/DREDUPLICATE?ReferenceField=Field&DuplicateAction=Action
where:
ContentHost | is the IP address or host name of the machine on which the IDOL Content component is installed.
indexPort | is the IDOL Content component index port (specified as IndexPort in the [Server] section of the IDOL Content component configuration file).
Field | is a ReferenceType field used as the initial determination of whether two documents are a match.
Action | is the action to perform on a duplicate. The examples that follow use the Database and Tag options.
For example:
http://MyHost:20001/DREDUPLICATE?ReferenceField=DOCUMENT/DREREFERENCE&DuplicateAction=Database&Database=Duplicates
This action uses port 20001 to remove duplicates from the IDOL Content component that is located on the machine with the host name MyHost. Content uses the DREREFERENCE field to identify duplicate documents, and moves them to the Duplicates database.
http://MyHost:20001/DREDUPLICATE?ReferenceField=DOCUMENT/DREREFERENCE&DuplicateAction=Tag&TagField=DOCUMENT/DRETITLE&TagValue=Duplicate
In this example, Content initially uses the DREREFERENCE field to identify the duplicate documents, and then it changes the DRETITLE field to the value Duplicate.
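If you generate these index actions from a script, the following minimal Python sketch shows one way to assemble the URLs for the two examples above. The helper name build_dreduplicate_url is hypothetical and is not part of IDOL. The sketch only builds the URL string and does not send it to the index port. It assumes that field paths such as DOCUMENT/DREREFERENCE can be left unescaped, as they are written in the examples.

```python
from urllib.parse import urlencode


def build_dreduplicate_url(host, index_port, reference_field, duplicate_action, **extra_params):
    """Build a DREDUPLICATE index action URL (hypothetical helper, not part of IDOL)."""
    params = {
        "ReferenceField": reference_field,
        "DuplicateAction": duplicate_action,
    }
    # Action-specific parameters, for example Database=... or TagField=... and TagValue=...
    params.update(extra_params)
    # safe="/" keeps field paths such as DOCUMENT/DREREFERENCE unescaped,
    # matching how the examples in this topic are written (an assumption).
    return f"http://{host}:{index_port}/DREDUPLICATE?{urlencode(params, safe='/')}"


# Move duplicates to the Duplicates database.
move_url = build_dreduplicate_url(
    "MyHost", 20001, "DOCUMENT/DREREFERENCE", "Database", Database="Duplicates"
)

# Tag duplicates by setting the DRETITLE field to the value Duplicate.
tag_url = build_dreduplicate_url(
    "MyHost", 20001, "DOCUMENT/DREREFERENCE", "Tag",
    TagField="DOCUMENT/DRETITLE", TagValue="Duplicate",
)

print(move_url)
print(tag_url)
```

Keeping the action-specific parameters as keyword arguments lets the same helper cover both the Database and Tag cases without hard-coding either one.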
To prevent Content from indexing duplicate documents, use the KillDuplicates parameter with the DREADD and DREADDDATA index actions.
For details on the other parameters that are available for the DREDUPLICATE index action, refer to the IDOL Content component Reference.