Introduction to FlowFiles and Documents
The basic unit of data in Apache NiFi is the FlowFile. When you look at your data flow in the NiFi web interface, you can see FlowFiles being queued and counted by your processors. To integrate with the Apache NiFi framework, IDOL NiFi Ingest components also handle FlowFiles. For example, a connector creates FlowFiles that are then processed by a KeyView processor or a Media Analysis processor.
A FlowFile that represents an IDOL document has the following attributes.
Attribute | Description |
---|---|
idol.doc.identifier | This attribute is present if the FlowFile was created by an IDOL NiFi Ingest Connector. It contains the document identifier (the same as the AUTN_IDENTIFIER metadata field). |
idol.doc.source | This attribute is present if the FlowFile was created by an IDOL NiFi Ingest Connector. It contains a colon-separated list of the following values: |
idol.reference | The document reference (the same as the DREREFERENCE document field). |
idol.reference.action | This attribute is usually assigned by a connector and indicates the indexing operation to perform, for example Add, Update, or Remove. |
idol.type | The type of content contained in the body of the FlowFile. The FlowFiles created by IDOL NiFi Ingest connectors have a type of … IDOL NiFi Ingest processors also accept FlowFiles where this attribute has the following values, so that they can process FlowFiles created by other NiFi processors: |
idol.xmlmetadata | Contains the IDOL document metadata in XML format. |
mime.type | The MIME type for the body of the FlowFile: |
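
As a rough illustration of how these attributes might be read, the following Python sketch treats a FlowFile's attributes as a plain dictionary of strings. It does not use any NiFi or IDOL API. The helper names (`has_connector_attributes`, `describe_flowfile`) and all sample values are hypothetical; only the attribute names come from the table. Because the table says only that the two `idol.doc.*` attributes are present when a connector created the FlowFile, their presence is treated here as a hint, not as proof of origin.

```python
from typing import Dict

def has_connector_attributes(attributes: Dict[str, str]) -> bool:
    """Return True if both attributes that connectors set are present.

    Assumption: presence is used only as a hint about the FlowFile's origin.
    """
    return "idol.doc.identifier" in attributes and "idol.doc.source" in attributes

def describe_flowfile(attributes: Dict[str, str]) -> str:
    """Summarize the document reference and the requested indexing operation."""
    reference = attributes.get("idol.reference", "<no reference>")
    action = attributes.get("idol.reference.action", "<no action>")
    if has_connector_attributes(attributes):
        origin = "connector attributes present"
    else:
        origin = "connector attributes absent"
    return f"{action} {reference} ({origin})"

# Made-up sample values, for illustration only.
example = {
    "idol.doc.identifier": "example-identifier",
    "idol.doc.source": "a:b:c",
    "idol.reference": "/docs/report.pdf",
    "idol.reference.action": "Add",
}

print(describe_flowfile(example))
```

A dictionary lookup with a default keeps the sketch tolerant of FlowFiles created by other NiFi processors, which might not carry every attribute.
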
application/x.idol.doc
The application/x.idol.doc format is a binary format that can include multiple parts. A single document can contain any number of contentfilename, contentfile, and content parts.
Part type | Description |
---|---|
contentfilename | A document part that contains a file path. When a document has a … To ensure that temporary files owned by NiFi Ingest are deleted, you can use a … |
contentfile | A document part that contains the binary content of a file. When a document has a … |
content | A document part that contains one or more pages of text content. |
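
To make the idea of a multi-part document more concrete, the following Python sketch models a document as a list of parts in memory and counts how many parts of each type it holds. It is a conceptual model only: it does not read or write the application/x.idol.doc binary format, and the `Part` class, the `count_parts` function, and the sample payloads are all hypothetical. Only the three part type names come from the table.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Part type names taken from the table above.
PART_TYPES = ("contentfilename", "contentfile", "content")

@dataclass
class Part:
    """Hypothetical in-memory stand-in for one document part."""
    part_type: str
    # A file path (str), file bytes (bytes), or a list of page texts.
    payload: Union[str, bytes, List[str]]

def count_parts(parts: List[Part]) -> Dict[str, int]:
    """Count parts by type, rejecting any type not listed in the table."""
    counts = {part_type: 0 for part_type in PART_TYPES}
    for part in parts:
        if part.part_type not in counts:
            raise ValueError(f"unknown part type: {part.part_type}")
        counts[part.part_type] += 1
    return counts

# A document can hold any number of each part type; these values are made up.
document = [
    Part("contentfilename", "/tmp/example.bin"),
    Part("content", ["page one text", "page two text"]),
    Part("content", ["another page"]),
]

print(count_parts(document))
```
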