Prevent Duplicate Documents
You can configure the IDOL Content component to implement deduplication when indexing documents. This process prevents storage of the same document or document content. If Content determines that the document to index matches an existing document, it replaces the existing document with the new document.
The IDOL Content component uses deduplication options to determine whether documents match. See Deduplication Options—KillDuplicates.
You can enable deduplication in one of three ways:
- Enable deduplication for all indexing jobs by using the KillDuplicates configuration parameter in the [Server] section of the IDOL Content component configuration file. See Enable Deduplication for all Index Jobs.
  You can use the KillDuplicatesChecksumField configuration parameter with deduplication to prevent the IDOL Content component from unnecessarily updating existing documents. See Use KillDuplicatesChecksumField to Prevent Unnecessary Indexing.
  You can also use the KillDuplicatesPreserveFields configuration parameter with deduplication to copy the specified IDX fields from an existing document to a newer version.
- Enable deduplication for individual indexing jobs by using the KillDuplicates action parameter in the DREADD and DREADDDATA actions. See Enable Deduplication for Individual Index Jobs.
  Use the KeepExisting action parameter with deduplication to discard the incoming document instead of replacing the existing document. This option reduces the indexing load. See Use KeepExisting to Minimize the Index Load.
- Enable deduplication when indexing with Connector Framework Server (CFS) by setting the KillDuplicates configuration parameter for the connector. See Enable Deduplication for Connector Index Jobs.
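As an illustration of the second option, the following minimal Python sketch builds a DREADD index action URL that carries the KillDuplicates and KeepExisting action parameters. The host, index port, and IDX file path are hypothetical placeholders, and the KillDuplicates value Reference and the KeepExisting value true are assumptions for illustration; check the parameter reference for the values that your version supports.

```python
from urllib.parse import quote


def build_dreadd_url(host, index_port, idx_path,
                     kill_duplicates=None, keep_existing=False):
    """Build a DREADD index action URL with optional deduplication parameters.

    host, index_port, and idx_path are placeholders supplied by the caller.
    The parameter values passed in are assumptions; this helper does not
    validate them against the IDOL Content component.
    """
    # The IDX file path follows the '?' directly, without a parameter name.
    url = f"http://{host}:{index_port}/DREADD?{quote(idx_path, safe='/')}"
    if kill_duplicates is not None:
        # Per-job deduplication setting for this index action.
        url += "&KillDuplicates=" + quote(kill_duplicates, safe="")
    if keep_existing:
        # Discard the incoming document instead of replacing the existing one.
        url += "&KeepExisting=true"
    return url


if __name__ == "__main__":
    # Hypothetical host, port, and file path, for illustration only.
    print(build_dreadd_url("localhost", 9101, "/data/docs.idx",
                           kill_duplicates="Reference", keep_existing=True))
```

The helper only constructs the URL string, so you can inspect the action before sending it to the index port with the HTTP client of your choice.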
Some other IDOL Content component parameters affect the behavior of the deduplication settings. See Deduplication Constraints.
You can deduplicate after indexing by using the DREDUPLICATE index action. See Locate Duplicate Documents.