Introduction
Connectors monitor your organization's data repositories so that your IDOL index is kept up to date. When new items are added to a repository, the connector sends documents for ingestion. When items are deleted, the connector ingests a document that represents the deletion, so that the corresponding document is removed from the IDOL index.
If you view the attributes of a NiFi FlowFile you can see that there is an attribute named idol.reference.action. This attribute specifies what an indexer, such as the PutIDOL processor, must do with the document. For example, if the FlowFile represents a new item that was created in the repository, the connector sets this attribute to Add so that the document is added to the index.
When a connector synchronizes with a repository, it might find that an existing item has changed. If the change affects only the document metadata, the connector ingests a metadata-only document with the updated field values. In this case, idol.reference.action is set to Update.
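As an illustration of how an indexer might act on this attribute, the following Python sketch applies each action to a toy index. It is a conceptual model only: the dictionary-based index, the apply_action function, and the field names are hypothetical and are not part of NiFi Ingest or IDOL. Only the attribute name idol.reference.action and the action values are taken from this documentation.

```python
def apply_action(index, attributes, reference, fields=None, content=None):
    """Apply one document to a toy index (a dict keyed by reference).

    Hypothetical helper for illustration; not a NiFi Ingest API.
    """
    action = attributes["idol.reference.action"]
    if action == "Add":
        # New item: store both the content and the metadata fields.
        index[reference] = {"content": content, "fields": dict(fields or {})}
    elif action == "Update":
        # Metadata-only change: merge the new field values, keep the content.
        # In this toy model, an update for an unknown reference is ignored.
        if reference in index:
            index[reference]["fields"].update(fields or {})
    elif action == "Delete":
        # Item removed from the repository: drop it from the index.
        index.pop(reference, None)
    else:
        raise ValueError(f"Unexpected action: {action!r}")
    return index


index = {}
apply_action(index, {"idol.reference.action": "Add"}, "doc1",
             fields={"title": "Draft"}, content="hello")
apply_action(index, {"idol.reference.action": "Update"}, "doc1",
             fields={"title": "Final"})
print(index["doc1"])  # {'content': 'hello', 'fields': {'title': 'Final'}}
apply_action(index, {"idol.reference.action": "Delete"}, "doc1")
print(index)          # {}
```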
If an item in a repository is modified and the change affects the document content, the connector ingests two documents. The first document has an idol.reference.action of Delete, to delete the existing document from the index, and the second has the action Add, to add a document containing the new content. These documents must be indexed in the correct order. The delete must be indexed first, to remove the existing document. If the documents are indexed in the reverse order, the new content is added and then immediately deleted, so the item is missing from the index.
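The effect of ordering can be seen in a small simulation. This Python sketch is an illustration under simplifying assumptions: the index is modeled as a plain dictionary, and the index_documents function and the reference report.docx are hypothetical names, not part of NiFi Ingest.

```python
def index_documents(initial_index, documents):
    """Apply (action, reference, content) tuples, in order, to a toy index."""
    index = dict(initial_index)
    for action, reference, content in documents:
        if action == "Delete":
            index.pop(reference, None)
        elif action == "Add":
            index[reference] = content
    return index


# The index already holds the old version of the item.
existing = {"report.docx": "old content"}

# The two documents a connector ingests when the content changes.
delete_old = ("Delete", "report.docx", None)
add_new = ("Add", "report.docx", "new content")

print(index_documents(existing, [delete_old, add_new]))  # {'report.docx': 'new content'}
print(index_documents(existing, [add_new, delete_old]))  # {}
```

In the second call the Add is applied first and the Delete then removes it, so the item disappears from the toy index entirely.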
NiFi Ingest provides a way to ensure that your documents are indexed in the correct order, but this is something that you must configure.
To ensure that documents are indexed in the correct order:
- Create a DocumentRegistryService controller service. This service manages a database of dependencies for all of the documents that you process.
- Configure your processors to use the DocumentRegistryService. The PutIDOL processor has to use the service so that it can index documents in the correct order, but you might also need to configure other processors. Some NiFi Ingest processors create documents and these must also be tracked. For example, NiFi Ingest connectors create documents when they synchronize with a data repository. The KeyViewExtractFiles processor can create new documents that represent extracted subfiles, and these documents might be dependencies of another document. Therefore the KeyViewExtractFiles processor must register the documents it creates through your DocumentRegistryService.
- Start your dataflow with either a NiFi Ingest connector, or an input port followed by a RegisterDocument processor.
- Ensure that all paths through the dataflow end with an UnregisterDocument processor.
IMPORTANT: Do not delete FlowFiles that have been registered with a DocumentRegistryService. If you stop processing a FlowFile that has been registered by a connector or through the RegisterDocument processor, any subsequent processor that uses the DocumentRegistryServiceImpl might wait for that document indefinitely. The only way to remove dependencies on a deleted FlowFile is to stop processing and delete the document registry database.
If you want to stop processing a FlowFile (for example, because you do not want to index it), OpenText recommends that you route the FlowFile to an UnregisterDocument processor.
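To see why an unregistered path matters, it can help to picture the registry as a set of pending documents with dependencies between them. The following Python sketch is a conceptual model only. The internal design of the document registry is not described here, so the ToyRegistry class, its methods, and the document identifiers are all assumptions made for illustration and do not reflect the real service or its API.

```python
class ToyRegistry:
    """Conceptual stand-in for a document registry; not the real service."""

    def __init__(self):
        self._pending = set()    # registered and not yet unregistered
        self._depends_on = {}    # document id -> ids it must wait for

    def register(self, doc_id, depends_on=()):
        self._pending.add(doc_id)
        self._depends_on[doc_id] = set(depends_on)

    def unregister(self, doc_id):
        self._pending.discard(doc_id)
        self._depends_on.pop(doc_id, None)

    def is_ready(self, doc_id):
        # A document is ready when none of its dependencies is still pending.
        return not (self._depends_on.get(doc_id, set()) & self._pending)


registry = ToyRegistry()
registry.register("delete-1")
registry.register("add-1", depends_on=["delete-1"])

# If the FlowFile for "delete-1" were simply deleted from the flow, nothing
# would ever unregister it, and "add-1" would stay blocked.
print(registry.is_ready("add-1"))  # False

# Routing it to an unregister step releases the dependent document.
registry.unregister("delete-1")
print(registry.is_ready("add-1"))  # True
```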