Introduction

The following image shows the completed ingestion pipeline, which is described in the sections that follow.

The pipeline includes the following steps:

  • File System Connector. The File System Connector retrieves data from a local or network file system. The connector produces a NiFi FlowFile to represent each file that is retrieved from the file system.
  • KeyView Extraction. This step extracts files from containers. For example, if a FlowFile represents a zip archive, KeyView extracts the contents of the archive.
  • KeyView Filtering. Filtering extracts the text from a file and adds it to the document content. The text can then be indexed into IDOL, which means that IDOL does not need to process the data in its original format.
  • Field Standardization. Field standardization modifies documents so that they have a consistent structure and consistent field names. You can use field standardization so that documents that originate from different connectors use the same fields to store the same type of information.
  • Remove Document Part. This step removes the binary content or file reference from a FlowFile. Removing file references allows NiFi to delete temporary files.
  • Indexing. This step indexes the documents into an IDOL Content component.
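
The order of the steps above can be illustrated with a small, self-contained Python sketch. It is a conceptual model only and does not use the NiFi, KeyView, or IDOL APIs. The Document class, the function names, and the FIELD_MAP entries are all hypothetical, a plain UTF-8 decode stands in for KeyView filtering, and a Python list stands in for the IDOL Content index. The sketch also assumes that a container is replaced by its members and that nested archives are not expanded, which might differ from the behavior of the real processors.

```python
import io
import zipfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Optional

# Hypothetical mapping from connector-specific field names to standard ones.
FIELD_MAP = {"FileName": "TITLE", "FilePath": "REFERENCE"}


@dataclass
class Document:
    """Illustrative stand-in for a FlowFile that carries one document."""
    fields: Dict[str, str] = field(default_factory=dict)
    binary: Optional[bytes] = None  # original file bytes (the "document part")
    content: str = ""               # text produced by filtering


def connect(root: Path) -> Iterator[Document]:
    """File system connector: yield one Document per file found under root."""
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        yield Document({"FileName": path.name, "FilePath": str(path)}, path.read_bytes())


def extract(docs: Iterable[Document]) -> Iterator[Document]:
    """Extraction: replace each zip container with its member files."""
    for doc in docs:
        data = doc.binary or b""
        if zipfile.is_zipfile(io.BytesIO(data)):
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                for info in archive.infolist():
                    if not info.is_dir():
                        ref = doc.fields.get("FilePath", "") + ":" + info.filename
                        yield Document({"FileName": info.filename, "FilePath": ref},
                                       archive.read(info))
        else:
            yield doc  # not a container, pass through unchanged


def filter_text(doc: Document) -> Document:
    """Filtering: add extracted text to the document content (placeholder decode)."""
    doc.content = (doc.binary or b"").decode("utf-8", errors="replace")
    return doc


def standardize(doc: Document) -> Document:
    """Field standardization: rename fields so every document uses the same names."""
    doc.fields = {FIELD_MAP.get(name, name): value for name, value in doc.fields.items()}
    return doc


def remove_document_part(doc: Document) -> Document:
    """Remove the binary content once the text has been extracted."""
    doc.binary = None
    return doc


def run_pipeline(root: Path, index: List[Document]) -> None:
    """Run every step in order and append the results to the stand-in index."""
    for doc in extract(connect(root)):
        index.append(remove_document_part(standardize(filter_text(doc))))
```

Calling run_pipeline(Path("some/folder"), index) with an empty list fills the list with text-only documents that share the same field names. The binary content is dropped only after filtering, because filtering needs the original file to produce the text.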