Start Processing
After you have configured the ingestion pipeline, you can start processing.
To start processing
-
In the NiFi canvas, right-click your GetFileSystem processor and click Start.
The connector begins fetching data. Because the KeyViewExtractFiles processor has not been started, the documents wait in the preceding queue.
-
Start the KeyViewExtractFiles processor (right-click the processor and click Start).
The processor begins processing documents from the queue. If there is a problem, a bulletin icon appears on the processor. Hover the cursor over this icon to view the error message. For example, the processor cannot locate the KeyView libraries if the KeyView controller service uses the KEYVIEW_DIRECTORY environment variable but that variable has not been set.
If you need to fix a problem like this, remember to stop the processor and disable the KeyView controller service before attempting to modify their configurations. After you have made your changes, enable the KeyView controller service and start the processor.
When the processor extracts the documents successfully, they move into the next queue, ready for processing by the next processor. You might notice that the KeyViewExtractFiles processor outputs more FlowFiles than it received. This is because it has extracted subfiles from their containers, and each subfile is represented by a new FlowFile.
-
Start the KeyViewFilterDocument and StandardizeMetadata processors.
The FlowFiles are processed and move to the next queue.
-
The next processor is the RemoveDocumentPart processor. Before starting that processor, you can look at the FlowFiles in its queue:
-
Right-click the queue and click List Queue.
A dialog box opens that lists the FlowFiles in the queue.
-
For one of the FlowFiles, click the view details icon.
The FlowFile dialog box opens and displays information about the FlowFile.
-
Click View.
The FlowFile is displayed.
-
In the View as box, click formatted.
The FlowFile should have a section named ContentFile or ContentFilename. ContentFile is present when the FlowFile has associated binary data and ContentFilename is present when the FlowFile contains the path to an associated file. If the FlowFile represents a subfile that was extracted by the KeyViewExtractFiles processor, it has a ContentFilename section that references the extracted file in the KeyView temporary directory.
If you look at FlowFiles after processing by the RemoveDocumentPart processor, you should notice that the ContentFile and ContentFilename parts have been removed.
-
Close the FlowFile window and the FlowFile dialog box.
-
Start the remaining processors.
NiFi processes the FlowFiles. You can then query IDOL and find the documents that have been indexed.
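If you prefer to script these state changes rather than use the canvas, the following is a minimal sketch and not part of the documented procedure. It assumes your NiFi version exposes the run-status endpoints of the NiFi REST API for processors and controller services, and that NiFi is reachable without authentication at the hypothetical base URL shown. The component IDs and revision versions are placeholders that you would read from your own NiFi instance. The sketch only builds the requests; it also shows the order described above for fixing a configuration problem (stop the processor, disable the controller service, make the change, enable the service, start the processor).

```python
import json
import urllib.request

# Assumption: an unsecured local NiFi instance. Adjust for your deployment.
NIFI_API = "http://localhost:8080/nifi-api"


def run_status_request(component, component_id, state, version):
    """Build (but do not send) a PUT request that changes a component's run status.

    component:    "processors" or "controller-services"
    component_id: the ID of the component, as shown in NiFi (placeholder here)
    state:        "RUNNING" or "STOPPED" for a processor,
                  "ENABLED" or "DISABLED" for a controller service
    version:      the component's current revision version, as reported by NiFi
    """
    body = {"revision": {"version": version}, "state": state}
    return urllib.request.Request(
        url=f"{NIFI_API}/{component}/{component_id}/run-status",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def reconfigure_sequence(processor_id, service_id):
    """Return the order of state changes to make around a configuration fix."""
    return [
        ("processors", processor_id, "STOPPED"),
        ("controller-services", service_id, "DISABLED"),
        # Modify the processor or controller service configuration at this point.
        ("controller-services", service_id, "ENABLED"),
        ("processors", processor_id, "RUNNING"),
    ]
```

To send a request, pass it to urllib.request.urlopen. The revision version of a component changes each time the component is updated, so read the current revision from NiFi before each state change instead of reusing an old value. A secured NiFi instance also requires authentication, which this sketch omits.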