Ingestion

Ingestion is the process by which information in a repository is converted into documents that can be indexed into the Content component. Ingestion starts when a connector finds items in a repository that are new, or that have been updated or deleted. To keep Content up to date with these changes, the connector creates documents that represent the new, updated, and deleted items.
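
The change detection described above can be pictured with a minimal Python sketch. This is an illustration only, not how any particular connector is implemented: the IngestDocument class, the detect_changes function, and the idea of comparing two in-memory snapshots are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestDocument:
    """Hypothetical document emitted for one changed repository item."""
    reference: str      # identifier of the item in the repository
    action: str         # "add", "update", or "delete"
    content: str = ""   # left empty for deletes


def detect_changes(previous, current):
    """Compare two snapshots (reference -> content) and emit one document per change."""
    documents = []
    for reference, content in current.items():
        if reference not in previous:
            documents.append(IngestDocument(reference, "add", content))
        elif previous[reference] != content:
            documents.append(IngestDocument(reference, "update", content))
    for reference in previous:
        if reference not in current:
            documents.append(IngestDocument(reference, "delete"))
    return documents


# Example: one item changed, one removed, one created, one untouched.
previous = {"a.txt": "one", "b.txt": "two", "c.txt": "three"}
current = {"a.txt": "one", "b.txt": "TWO", "d.txt": "four"}
changes = detect_changes(previous, current)
```

In this sketch the unchanged item (a.txt) produces no document, which mirrors the point that only new, updated, and deleted items need to reach the index.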

Ingestion includes all of the processing that takes place before documents are indexed. For example, your NiFi Ingest dataflow might use File Content Extraction to extract the text from a file, then run field standardization and, optionally, media analysis or Eduction. These operations enrich the documents before they are indexed into Content.
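
As a rough illustration of enrichment before indexing, the following Python sketch chains three processing steps over a document. It does not use NiFi, File Content Extraction, or Eduction; every function name here is hypothetical, and the regular expressions are simple stand-ins for those operations.

```python
import re


def extract_text(doc):
    """Stand-in for text extraction: strip markup-like tags from the raw content."""
    doc = dict(doc)
    stripped = re.sub(r"<[^>]+>", " ", doc["raw"])
    doc["text"] = " ".join(stripped.split())
    return doc


def standardize_fields(doc):
    """Stand-in for field standardization: use lowercase field names throughout."""
    return {name.lower(): value for name, value in doc.items()}


def extract_entities(doc):
    """Stand-in for entity extraction: find email addresses in the text."""
    doc = dict(doc)
    doc["emails"] = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", doc["text"])
    return doc


def run_pipeline(doc, steps):
    """Apply each enrichment step in order and return the enriched document."""
    for step in steps:
        doc = step(doc)
    return doc


raw_doc = {"raw": "<p>Contact jane@example.com</p>", "Author": "Jane"}
enriched = run_pipeline(raw_doc, [extract_text, standardize_fields, extract_entities])
```

Each step returns a new dictionary rather than modifying its input, so the original item is still available if a later step fails. Only the enriched result would be passed on for indexing.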