Remove Temporary Files
The ingestion process generates temporary files. For example, IDOL Connectors can download files from a repository, and KeyView writes extracted subfiles to a temporary storage area. It can be useful to keep these files, for example so that you can troubleshoot problems while developing the pipeline, but usually they should be deleted when processing is complete.
If you examine a NiFi FlowFile that originated from an IDOL Connector (and it is not a metadata-only document), you will see one of the following document parts:
- contentfile - contains binary file content.
- contentfilename - the path to a file on disk. The file might, or might not, be owned by NiFi.
For example, if you use a File System Connector there will be a FlowFile for each file that is retrieved from the file system. Depending on how you configure the connector, each FlowFile will contain either the binary content of the associated file or a path to the file on disk.
When you delete a contentfilename document part and the associated file is owned by NiFi, the file is automatically deleted. This section describes how to remove file references from FlowFiles before the documents are indexed into IDOL, so that temporary files are deleted.
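The ownership rule described above can be illustrated with a small, self-contained sketch. This is not the IDOL or NiFi implementation: the `DocumentPart` and `Document` classes, the `owned` flag, and the `remove_parts` method are all hypothetical names introduced only to model the behavior, assuming that a file is deleted from disk only when its `contentfilename` part is removed and the file is owned by the pipeline.

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DocumentPart:
    """Hypothetical stand-in for one document part in a FlowFile."""
    kind: str                   # for example "contentfile" or "contentfilename"
    data: bytes = b""           # binary content (used by contentfile parts)
    path: Optional[str] = None  # file reference (used by contentfilename parts)
    owned: bool = False         # assumption: True when the pipeline owns the file


@dataclass
class Document:
    """Hypothetical document made up of several parts."""
    parts: List[DocumentPart] = field(default_factory=list)

    def remove_parts(self, kind: str) -> int:
        """Remove every part of the given kind and return how many were removed.

        A referenced file is deleted only if the part says it is owned;
        files that belong to something else are left untouched.
        """
        kept = []
        removed = 0
        for part in self.parts:
            if part.kind != kind:
                kept.append(part)
                continue
            removed += 1
            if part.path and part.owned and os.path.exists(part.path):
                os.remove(part.path)
        self.parts = kept
        return removed


if __name__ == "__main__":
    # Create one temporary file to play the role of an owned file.
    fd, temp_path = tempfile.mkstemp(suffix=".tmp")
    os.close(fd)
    doc = Document([
        DocumentPart("contentfilename", path=temp_path, owned=True),
        DocumentPart("xmlmetadata"),
    ])
    doc.remove_parts("contentfilename")
    print("temporary file still exists:", os.path.exists(temp_path))
```

In this model, a file that is referenced but not owned (for example, a source file that a connector points to in place) survives the removal of its part, which mirrors the distinction the text draws between files that are and are not owned by NiFi.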
To remove temporary files
- Add an idol.nifi > RemoveDocumentPart processor to the canvas.
- Create a connection between the StandardizeMetadata processor and the RemoveDocumentPart processor. Hover the mouse over the StandardizeMetadata processor until you see the connection icon, and then drag the icon to the RemoveDocumentPart processor.
  The Create Connection dialog box opens.
- In the For Relationships area, select the success check box and then click ADD.
  The connection appears on the canvas and NiFi automatically adds the queue between the processors.
- Right-click the RemoveDocumentPart processor and click Configure.
  The Configure Processor dialog box opens.
- Click the Properties tab.
- In the properties list, ensure that the properties have the following values:
  - Remove document content parts: False (keep the document content so that it can be indexed into IDOL)
  - Remove document contentfile parts: True (to remove binary content from the FlowFile)
  - Remove document contentfilename parts: True (to remove file references and allow NiFi to delete temporary files)
  - Remove document xmlmetadata parts: False (keep the document metadata so that it can be indexed into IDOL)
- Click Apply.
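As a quick way to check the effect of these settings, the following sketch works out which document parts survive for a given set of property values. It is only an illustration: the property names are taken from the list above, but the `PROPERTY_TO_PART` mapping and the `parts_to_remove` and `filter_parts` functions are hypothetical helpers, not part of the processor or of any NiFi API.

```python
# Property values as configured in the procedure above.
SETTINGS = {
    "Remove document content parts": False,
    "Remove document contentfile parts": True,
    "Remove document contentfilename parts": True,
    "Remove document xmlmetadata parts": False,
}

# Assumed mapping from each property to the document part it controls.
PROPERTY_TO_PART = {
    "Remove document content parts": "content",
    "Remove document contentfile parts": "contentfile",
    "Remove document contentfilename parts": "contentfilename",
    "Remove document xmlmetadata parts": "xmlmetadata",
}


def parts_to_remove(settings):
    """Return the set of part kinds whose removal property is enabled."""
    return {PROPERTY_TO_PART[name] for name, enabled in settings.items() if enabled}


def filter_parts(part_kinds, settings):
    """Return the part kinds that remain after removal, in their original order."""
    drop = parts_to_remove(settings)
    return [kind for kind in part_kinds if kind not in drop]


if __name__ == "__main__":
    before = ["content", "contentfilename", "xmlmetadata"]
    print("parts kept:", filter_parts(before, SETTINGS))
```

With the values shown, only the content and metadata parts remain, which is what the downstream indexing step needs, while the file content and file references are dropped.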