Introduction to FlowFiles and Documents

The basic unit of data in Apache NiFi is the FlowFile. When you look at your data flow in the NiFi web interface, you can see FlowFiles being queued and counted by your processors. To integrate with the Apache NiFi framework, IDOL NiFi Ingest components also handle FlowFiles. For example, a connector creates FlowFiles, which are then processed by a KeyView processor or Media Analysis processor.

A FlowFile that represents an IDOL document has the following attributes.

Attribute Description
idol.doc.identifier This attribute is present if the FlowFile was created by an IDOL NiFi Ingest Connector. It contains the document identifier (the same as the AUTN_IDENTIFIER metadata field).
idol.doc.source

This attribute is present if the FlowFile was created by an IDOL NiFi Ingest Connector. It contains a colon-separated list of the following values:

  • The display name for the type of connector that created the FlowFile, for example "File System Connector".
  • The display name of the NiFi processor, for example "MyGetFileSystem".
  • The ID of the NiFi processor that created the FlowFile.

idol.reference

The document reference (the same as the DREREFERENCE document field).

idol.reference.action

This attribute is usually assigned by a connector and indicates the indexing operation to perform, for example Add, Update, or Remove.

idol.type

The type of content contained in the body of the FlowFile.

The FlowFiles created by IDOL NiFi Ingest connectors have the type document. This is a binary format that can include multiple parts, including binary content from a file. For more information about this format, see application/x.idol.doc.

IDOL NiFi Ingest processors also accept FlowFiles where this attribute has the following values, so that they can process FlowFiles created by other NiFi processors:

  • contentfile - The FlowFile body contains the binary content of a file.
  • contentfilename - The FlowFile body contains a full, local file path in plain text.
  • content - The FlowFile body contains text content to use as the DRECONTENT of an IDOL document.
  • xmlmetadata - The FlowFile body contains XML metadata.

idol.xmlmetadata

Contains the IDOL document metadata in XML format.

mime.type

The MIME type for the body of the FlowFile:

  • application/x.idol.doc when the idol.type is document.
  • application/octet-stream when the idol.type is contentfile.
  • text/plain when the idol.type is contentfilename or content.
  • text/xml when the idol.type is xmlmetadata.
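
The relationship between idol.type and mime.type, and the layout of the idol.doc.source value, can be sketched in a few lines of Python. This is an illustration only and is not part of IDOL NiFi Ingest or Apache NiFi: the function names and the processor ID in the example are hypothetical. The parser assumes that the three values appear in the order listed above and that neither display name contains a colon, which the attribute description does not guarantee.

```python
# Illustrative sketch only. These helpers are hypothetical and are not
# part of Apache NiFi or IDOL NiFi Ingest.

# idol.type -> mime.type, as listed in the attribute table.
MIME_TYPE_BY_IDOL_TYPE = {
    "document": "application/x.idol.doc",
    "contentfile": "application/octet-stream",
    "contentfilename": "text/plain",
    "content": "text/plain",
    "xmlmetadata": "text/xml",
}


def expected_mime_type(idol_type: str) -> str:
    """Return the mime.type that corresponds to an idol.type value."""
    try:
        return MIME_TYPE_BY_IDOL_TYPE[idol_type]
    except KeyError:
        raise ValueError(f"unrecognized idol.type: {idol_type!r}") from None


def parse_doc_source(value: str) -> dict:
    """Split an idol.doc.source value into its three components.

    Assumptions: the values appear in the documented order, and neither
    display name contains a colon.
    """
    parts = value.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected 3 colon-separated values, got {len(parts)}")
    connector_type, processor_name, processor_id = parts
    return {
        "connector_type": connector_type,
        "processor_name": processor_name,
        "processor_id": processor_id,
    }


if __name__ == "__main__":
    # Hypothetical attribute values for demonstration.
    attributes = {
        "idol.type": "contentfile",
        "idol.doc.source": "File System Connector:MyGetFileSystem:example-processor-id",
    }
    print(expected_mime_type(attributes["idol.type"]))
    print(parse_doc_source(attributes["idol.doc.source"]))
```

A check like this could be useful when FlowFiles come from non-IDOL NiFi processors and you want to confirm that idol.type and mime.type agree before routing them to an IDOL NiFi Ingest processor.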

application/x.idol.doc

The application/x.idol.doc format is a binary format that can include multiple parts. A single document can contain any number of contentfilename, contentfile, and content parts.

Part type Description
contentfilename

A document part that contains a file path.

When a document has a contentfilename part, the FlowFile can have an attribute named idol.doc.part.part_id.file.own (where part_id identifies the part), which contains a Boolean value that indicates whether NiFi Ingest owns the file and can delete it when processing is complete.

To ensure that temporary files owned by NiFi Ingest are deleted, you can use a RemoveDocumentPart processor (as described in Remove Temporary Files).

contentfile

A document part that contains the binary content of a file.

When a document has a contentfile part, the FlowFile can have an attribute named idol.doc.part.part_id.file.name (where part_id identifies the part), which contains a display name for the file.

content A document part that contains one or more pages of text content.
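
The per-part attribute names described above follow a fixed pattern, which the following Python sketch illustrates. It is a minimal example under stated assumptions, not the product's implementation: the helper names and the part IDs are hypothetical, and the code assumes that the Boolean is stored as the text "true" or "false" and that a missing file.own attribute means the file is not owned by NiFi Ingest.

```python
# Illustrative sketch only. Helper names and part IDs are hypothetical.
from typing import Optional


def part_attribute(part_id: str, suffix: str) -> str:
    """Build an attribute name of the form idol.doc.part.<part_id>.<suffix>."""
    return f"idol.doc.part.{part_id}.{suffix}"


def is_owned_by_ingest(attributes: dict, part_id: str) -> bool:
    """Report whether the file behind a contentfilename part is marked as owned.

    Assumptions: the value is the text "true" or "false", and a missing
    attribute is treated as "not owned".
    """
    value = attributes.get(part_attribute(part_id, "file.own"), "false")
    return value.strip().lower() == "true"


def display_name(attributes: dict, part_id: str) -> Optional[str]:
    """Return the display name for a contentfile part, or None if it is not set."""
    return attributes.get(part_attribute(part_id, "file.name"))


if __name__ == "__main__":
    # Hypothetical FlowFile attributes for a document with two parts.
    attributes = {
        "idol.doc.part.part1.file.own": "true",
        "idol.doc.part.part2.file.name": "report.pdf",
    }
    print(is_owned_by_ingest(attributes, "part1"))
    print(is_owned_by_ingest(attributes, "part2"))
    print(display_name(attributes, "part2"))
```

Treating a missing file.own attribute as "not owned" is the conservative choice here, because it avoids deleting a file that NiFi Ingest did not create.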