Track Documents through the Ingestion Pipeline
The Apache NiFi framework stores information about FlowFiles as they move through your ingestion pipeline. You can use the "Data Provenance" feature to see what happened to a document.
To configure searching by IDOL document reference or identifier
- Stop the NiFi instance.
- Open the configuration file ./conf/nifi.properties.
- Set or review the following properties:
  - nifi.provenance.repository.indexed.attributes: set this property to idol.reference, idol.doc.identifier so that you can search for documents by their IDOL document reference or identifier.
  - You can also specify the maximum amount of storage to use for provenance data (nifi.provenance.repository.max.storage.size) and the maximum amount of time to keep that data (nifi.provenance.repository.max.storage.time).
- Save and close the configuration file.
- Start the NiFi instance.
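If you configure several NiFi instances, the property change above can also be scripted. The following is a minimal sketch, not a supported tool. It assumes that nifi.properties uses simple key=value lines, as the default file does. It does not handle the ":" separator or line continuations that Java properties files also allow. The function names set_property and configure_provenance are hypothetical, and any storage size or time values you pass are your own choice. Stop NiFi before you run it, and start NiFi again afterwards.

```python
import re
from pathlib import Path

INDEXED_ATTRIBUTES_KEY = "nifi.provenance.repository.indexed.attributes"
IDOL_ATTRIBUTES = "idol.reference, idol.doc.identifier"


def set_property(text: str, key: str, value: str) -> str:
    """Hypothetical helper: set key=value in properties text.

    Replaces the first uncommented line that defines the key, or appends
    a new line if the key is not defined. Other lines are kept unchanged.
    """
    lines = text.splitlines()
    # Match "key=" or "key =" at the start of a line; commented lines ("# key=") do not match.
    pattern = re.compile(rf"^\s*{re.escape(key)}\s*=")
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = f"{key}={value}"
            break
    else:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def configure_provenance(path, max_storage_size=None, max_storage_time=None):
    """Hypothetical helper: index the IDOL attributes and optionally set limits.

    max_storage_size and max_storage_time are written verbatim, so pass values
    in the format NiFi expects (for example "10 GB" or "30 days").
    """
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    text = set_property(text, INDEXED_ATTRIBUTES_KEY, IDOL_ATTRIBUTES)
    if max_storage_size is not None:
        text = set_property(text, "nifi.provenance.repository.max.storage.size", max_storage_size)
    if max_storage_time is not None:
        text = set_property(text, "nifi.provenance.repository.max.storage.time", max_storage_time)
    p.write_text(text, encoding="utf-8")
```

For example, from the NiFi installation directory you might call configure_provenance("conf/nifi.properties"). The sketch edits the line in place rather than appending a duplicate key. This avoids leaving two conflicting definitions of the same property in the file.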
To track a document
- Open the NiFi user interface.
- Click the Global Menu icon, followed by Data Provenance.
The NiFi Data Provenance dialog box opens.
- Click the Search icon.
- Type a document reference (in the idol.reference box) or identifier (in the idol.doc.identifier box) and click SEARCH.
NOTE: Type the exact reference or identifier. Alternatively, you can use the * wildcard to represent any number of characters. However, if you use the * wildcard, the backslash character (\) becomes a special character and must be escaped. For example:
  - C:\Some\Path\document.txt (a wildcard is not used)
  - C:\\Some\\Path\\* (a wildcard is used, so the backslashes in the path are escaped. This matches all references that start with C:\Some\Path\)
If the value you want to search for includes a literal asterisk (*), escape the asterisk with a backslash (\).
The NiFi Data Provenance dialog box displays events related to the matching document(s).
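The escaping rules in the note above can be summarized as a small helper that builds the text to type into the idol.reference or idol.doc.identifier box. This is a hedged sketch. The function name build_search_term is hypothetical. It also assumes that any term containing an asterisk is treated as a wildcard pattern, so backslashes must be escaped whenever a literal asterisk is present. The note does not state this explicitly.

```python
def build_search_term(value: str, prefix_match: bool = False) -> str:
    """Hypothetical helper: return the text to type into the search box.

    value: the literal reference or identifier, or the literal start of it.
    prefix_match: if True, append the * wildcard so the term matches every
    value that starts with `value`.
    """
    needs_escaping = prefix_match or "*" in value
    if not needs_escaping:
        # Exact match with no asterisks: type the value unchanged.
        return value
    # Assumption: in a wildcard pattern, backslash is the escape character.
    # Escape backslashes first, then literal asterisks, so the backslashes
    # added for asterisks are not doubled again.
    escaped = value.replace("\\", "\\\\").replace("*", "\\*")
    return escaped + "*" if prefix_match else escaped
```

For example, print(build_search_term("C:\\Some\\Path\\", prefix_match=True)) prints C:\\Some\\Path\\*, which matches the wildcard example in the note. Calling build_search_term("C:\\Some\\Path\\document.txt") returns the reference unchanged.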
For more information about data provenance features, refer to the Apache NiFi documentation.