Use Built-In Apache NiFi Processors

The Apache NiFi framework includes built-in processors that can retrieve data. These processors are supplied with the Apache NiFi framework, rather than with IDOL NiFi Ingest. For example, the GetFile processor retrieves files from a directory. You can use the Apache NiFi processors to retrieve data, but before you route the FlowFiles to an IDOL NiFi Ingest processor, you must ensure that several attributes are set on each FlowFile.

An IDOL NiFi Ingest processor expects the following attributes to be set on each FlowFile.

  • idol.reference - a document reference. OpenText recommends that you create unique references, so that IDOL Content can de-duplicate documents based on the document reference.
  • idol.reference.action - the indexing operation to perform with the data, for example Add, Update, or Remove.
  • idol.type - specifies the type of data contained in the FlowFile, for example a file path or the binary content of a file.
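
The requirement above can be expressed as a simple check. The following Python sketch is illustrative only: it models FlowFile attributes as a plain dictionary, and the function name missing_idol_attributes is hypothetical (it is not part of Apache NiFi or IDOL NiFi Ingest). It reports which of the three attributes are still missing before a FlowFile is routed to an IDOL NiFi Ingest processor.

```python
# Attribute names taken from the list above.
REQUIRED_IDOL_ATTRIBUTES = ("idol.reference", "idol.reference.action", "idol.type")


def missing_idol_attributes(attributes):
    """Return the required attribute names that are absent or empty.

    `attributes` is a plain dict standing in for FlowFile attributes;
    this is a model for illustration, not a NiFi API.
    """
    return [name for name in REQUIRED_IDOL_ATTRIBUTES if not attributes.get(name)]


# Example: attributes before any UpdateAttribute processor has run.
# The values are made up for illustration.
print(missing_idol_attributes({"path": "reports", "filename": "report.pdf"}))
```

An empty list means that all three attributes are present.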

The following example shows a dataflow with a GetFile processor to retrieve files, followed by an UpdateAttribute processor to set the required attributes on each FlowFile. The output from the UpdateAttribute processor is suitable to be routed to IDOL NiFi Ingest processors, in this example a KeyViewExtractFiles processor.

The following image shows part of the configuration for the UpdateAttribute processor.

This example uses FlowFile attributes ("path" and "filename") that have been set by the GetFile processor to populate the IDOL document reference. The attributes that you can use to populate the reference vary depending on which processor you use to retrieve data.

In most cases, you can set idol.reference.action to Add, so that data is added to the IDOL index. If a file is retrieved again and a new document is indexed, IDOL can replace the old data because the new document has the same reference.

The GetFile processor creates FlowFiles where the body of the FlowFile contains the binary content from a file, so the idol.type attribute is set to contentfile. For more information about these attributes, see Introduction to FlowFiles and Documents.
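
As a rough model of what the UpdateAttribute processor does in this example, the following Python sketch builds the three attributes from the "path" and "filename" attributes. It is a sketch under stated assumptions, not the actual processor configuration: FlowFile attributes are modeled as a dictionary, the function name add_idol_attributes is hypothetical, and joining "path" and "filename" with a forward slash is an assumed way to form the reference.

```python
import posixpath


def add_idol_attributes(attributes):
    """Return a copy of `attributes` with the three IDOL attributes added.

    Assumption: the reference is the "path" and "filename" attributes
    joined with "/". The real expression used in UpdateAttribute may differ.
    """
    updated = dict(attributes)
    updated["idol.reference"] = posixpath.join(attributes["path"], attributes["filename"])
    updated["idol.reference.action"] = "Add"   # add the data to the IDOL index
    updated["idol.type"] = "contentfile"       # FlowFile body holds the file content
    return updated


# Hypothetical attribute values, for illustration only.
flowfile = add_idol_attributes({"path": "reports/2024", "filename": "summary.pdf"})
print(flowfile["idol.reference"])
```

Because the reference is derived only from the path and file name, retrieving the same file again produces the same reference, which is what allows the new document to replace the old one.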

To add attributes to FlowFiles using the UpdateAttribute processor

  1. Right-click the UpdateAttribute processor and click Configure.

    The Configure Processor dialog box opens.

  2. Click the Properties tab.
  3. Click Add.

    The Add Property dialog box opens.

  4. In the Property Name box, type the name of the attribute that you want to add to your FlowFiles, for example idol.type, and then click OK.
  5. Type a value for the attribute, for example contentfile.
  6. Repeat steps 3-5 to add other attributes as required.
  7. Click APPLY.