Introduction to FlowFiles and Documents
The basic unit of data in Apache NiFi is the FlowFile. When you look at your data flow in the NiFi web interface, you can see FlowFiles being queued and counted by your processors. To integrate with the Apache NiFi framework, NiFi Ingest components also handle FlowFiles. For example, a connector creates FlowFiles, which are then processed by a KeyView processor or a Media Analysis processor.
A FlowFile that represents a Knowledge Discovery document has the following attributes.
Attribute | Description |
---|---|
idol.doc.identifier | This attribute is present if the FlowFile was created by a NiFi Ingest Connector. It contains the document identifier (the same as the AUTN_IDENTIFIER metadata field). |
idol.doc.source | This attribute is present if the FlowFile was created by a NiFi Ingest Connector. It contains a colon-separated list of the following values: |
idol.doc.part.part_id.file.name | A display name for the file that is embedded in the corresponding FlowFile part. This attribute can be present when binary file content is embedded inside the FlowFile part. |
idol.doc.part.part_id.file.own | A Boolean value to indicate whether NiFi Ingest owns the file referenced by the corresponding FlowFile part, and can delete it when processing is complete. To ensure that temporary files owned by NiFi Ingest are deleted, you can use a RemoveDocumentPart processor (as described in Remove Temporary Files). |
idol.reference | The document reference (the same as the DREREFERENCE document field). |
idol.reference.action | This attribute is usually assigned by a connector and indicates the indexing operation to perform, for example Add, Update, or Remove. |
idol.type | The type of content contained in the body of the FlowFile. The FlowFiles created by NiFi Ingest connectors have a type of NiFi Ingest processors also accept FlowFiles where this attribute has the following values, so that they can process FlowFiles created by other NiFi processors: |
idol.xmlmetadata | Contains the document metadata in XML format. |
mime.type | The MIME type for the body of the FlowFile: |
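The following is a minimal sketch of how a script might interpret these attributes, assuming they are available as a plain dictionary of strings (NiFi stores FlowFile attributes as string key/value pairs). The function name summarize_attributes, the sample values, the part ID "0", and the assumption that the Boolean is serialized as "true" or "false" are all hypothetical and are not taken from the NiFi Ingest documentation.

```python
import re
from collections import defaultdict

# Matches per-part attributes such as idol.doc.part.<part_id>.file.name.
# The part_id format is an assumption: any run of non-dot characters.
PART_ATTR = re.compile(r"^idol\.doc\.part\.([^.]+)\.file\.(name|own)$")


def summarize_attributes(attributes):
    """Summarize the Knowledge Discovery attributes of one FlowFile."""
    parts = defaultdict(dict)
    for key, value in attributes.items():
        match = PART_ATTR.match(key)
        if not match:
            continue
        part_id, field = match.groups()
        if field == "own":
            # Assumption: the Boolean is serialized as "true" or "false".
            parts[part_id]["own"] = value.strip().lower() == "true"
        else:
            parts[part_id]["name"] = value
    return {
        "reference": attributes.get("idol.reference"),
        "action": attributes.get("idol.reference.action"),
        # idol.doc.identifier is present only on connector-created FlowFiles.
        "from_connector": "idol.doc.identifier" in attributes,
        "parts": dict(parts),
    }


# Hypothetical attribute values, for illustration only.
example = {
    "idol.doc.identifier": "example-identifier",
    "idol.reference": "/docs/report.pdf",
    "idol.reference.action": "Add",
    "idol.doc.part.0.file.name": "report.pdf",
    "idol.doc.part.0.file.own": "true",
}

summary = summarize_attributes(example)
print(summary["action"], summary["parts"])
```

Grouping the part attributes by part ID keeps the file name and the ownership flag for each part together, which is the information you need when deciding whether a temporary file can be removed.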
application/x.idol.doc
The application/x.idol.doc format is a binary format that can include multiple parts. A single document can contain any number of contentfilename, contentfile, externalfile, and content parts.
Part type | Description |
---|---|
contentfilename | A document part that contains a path to a file stored on a local or network file system. |
contentfile | A document part that contains the binary content of a file. |
externalfile | A document part that contains a reference to a file stored by an external storage provider such as Amazon S3, Azure Blob Storage, or Google Cloud Storage. |
content | A document part that contains one or more pages of text content. |
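The four part types can be modeled in memory as follows. This is an illustrative sketch only: it does not describe the binary layout of the application/x.idol.doc format, and the names PartType, DocumentPart, and count_parts are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class PartType(Enum):
    CONTENT_FILENAME = "contentfilename"  # path on a local or network file system
    CONTENT_FILE = "contentfile"          # binary content of a file
    EXTERNAL_FILE = "externalfile"        # reference held by an external storage provider
    CONTENT = "content"                   # one or more pages of text


@dataclass
class DocumentPart:
    part_type: PartType
    payload: object  # path, bytes, external reference, or list of page texts


def count_parts(parts):
    """Count how many parts of each type a document contains."""
    return Counter(part.part_type for part in parts)


# A single document can contain any number of parts of each type.
document = [
    DocumentPart(PartType.CONTENT_FILE, b"raw file bytes"),
    DocumentPart(PartType.CONTENT, ["page one text"]),
    DocumentPart(PartType.CONTENT, ["page one text", "page two text"]),
]

counts = count_parts(document)
print(counts[PartType.CONTENT])
```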