KeyViewExtractFiles

A processor that uses KeyView to extract sub-files from container files such as zip archives.

The processor reads any files embedded in the FlowFile or referenced by the FlowFile, and passes these to KeyView. If the files are containers, KeyView extracts sub-files to the location defined by the KeyView Filter Service, and either creates a new FlowFile for each sub-file, or adds new parts referencing these files to the original FlowFile. This behavior is controlled by the Create new FlowFiles from sub-files property. If you choose to create new FlowFiles, each child FlowFile inherits the metadata and attributes of its parent FlowFile; the Inherited Fields Regex and Not Inherited Fields Regex properties control which metadata fields are inherited.

Properties

Name Default Value Description
KeyView Filter Service   A KeyViewFilterServiceImpl that manages the location of the KeyView binaries and temporary file storage.
IDOL License Service   An IdolLicenseServiceImpl that provides a way to communicate with an IDOL License Server.
Document Registry Service   A DocumentRegistryServiceImpl controller service that manages and updates a document registry database. This ensures that documents are indexed in the correct order. This processor registers new FlowFiles that represent extracted subfiles.
Create new FlowFiles from sub-files True A Boolean value that specifies whether to create a new FlowFile for each extracted sub-file. If you set this property to false, the processor associates sub-files with the original FlowFile.
Inherited Fields Regex .* A regular expression to choose metadata fields that documents representing sub-files should inherit from their parent document.
Not Inherited Fields Regex   A regular expression to choose metadata fields that documents representing sub-files should not inherit from their parent document.
Merge mails False

(Applies only to MSG and EML files).

MSG and EML files are containers. Normally when these are extracted and Create new FlowFiles from sub-files is true, the processor outputs the original FlowFile, representing the container, to the "success" relationship, and a FlowFile representing the .mail subfile to the "subfile" relationship. It is the subfile that contains the message header and content.

When you set this property to "true", the processor does not output a FlowFile to the "subfile" relationship. Instead, the subfile metadata is added to the original FlowFile. The ContentFile or ContentFilename part of the original FlowFile is also modified, so that it contains the binary content or file path of the .mail file rather than the MSG/EML container. The FlowFile is then output to the "success" relationship.

If you set this property to "true" when Create new FlowFiles from sub-files is false, the only difference is that the output FlowFile does not contain a ContentFile or ContentFilename part representing the MSG/EML container.
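The interaction between Inherited Fields Regex and Not Inherited Fields Regex can be pictured with a short Python sketch. This is an illustration only, not the processor's implementation: the function name, the sample field names, and the assumptions that each expression must match the whole field name and that an exclusion takes priority over an inclusion are all hypothetical.

```python
import re


def inherited_fields(parent_fields, inherit_regex=".*", not_inherit_regex=None):
    """Return the parent metadata fields a sub-file document would inherit.

    Assumptions (not confirmed by the product documentation):
    - both expressions are matched against the whole field name;
    - a field matched by the exclusion expression is never inherited.
    """
    include = re.compile(inherit_regex)
    exclude = re.compile(not_inherit_regex) if not_inherit_regex else None

    result = {}
    for name, value in parent_fields.items():
        if not include.fullmatch(name):
            continue  # not selected by Inherited Fields Regex
        if exclude is not None and exclude.fullmatch(name):
            continue  # rejected by Not Inherited Fields Regex
        result[name] = value
    return result


# Hypothetical metadata fields on a parent document.
parent = {"author": "jsmith", "title": "report.zip", "temp_path": "/tmp/kv/1"}

# Default inclusion (.*) combined with an exclusion for temp_* fields.
child = inherited_fields(parent, inherit_regex=".*", not_inherit_regex="temp_.*")
print(child)
```

With the default values (".*" and an empty exclusion), this sketch copies every parent field to the child, which matches the default behavior described in the table.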

Relationships

Name Description
success Successfully processed FlowFiles are routed to this relationship. When Create new FlowFiles from sub-files is set to false, the FlowFiles can contain extra attributes referencing extracted sub-files.
subfile Receives new FlowFiles that represent extracted sub-files. This relationship exists only when you set the property Create new FlowFiles from sub-files to true.
failure FlowFiles that were not processed successfully are routed to this relationship. For example, a FlowFile was provided but had an invalid or unknown format.

Extract Subfiles Recursively

When you set Create new FlowFiles from sub-files to true, the processor does not extract subfiles recursively. For example, if a zip file contains another zip, the second zip is not automatically extracted. If you want to extract subfiles recursively, connect the subfile relationship back to the KeyViewExtractFiles processor so that new FlowFiles representing extracted subfiles are processed in the same way as the original FlowFiles. Eventually, all of the containers are extracted and all of the resulting FlowFiles are routed to the success relationship.
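The effect of connecting the subfile relationship back to the processor can be simulated in a few lines of Python. This is a conceptual model, not NiFi or KeyView code: FlowFiles are represented as plain dictionaries, and extract_one_level and run_with_loopback are hypothetical names for one pass of the processor and for the looped connection.

```python
from collections import deque


def extract_one_level(flowfile):
    """Stand-in for one pass of the processor with new FlowFiles enabled.

    Returns (success, subfiles): the original FlowFile, plus one new
    FlowFile per directly contained sub-file. Nested containers are not
    opened here, mirroring the non-recursive behavior.
    """
    return flowfile, list(flowfile.get("children", []))


def run_with_loopback(root):
    """Feed 'subfile' output back into the processor until nothing is left."""
    incoming = deque([root])
    success = []
    while incoming:
        flowfile = incoming.popleft()
        done, subfiles = extract_one_level(flowfile)
        success.append(done["name"])  # routed to "success"
        incoming.extend(subfiles)     # "subfile" connected back to the input
    return success


# Hypothetical archive: a zip that contains a text file and another zip.
archive = {
    "name": "outer.zip",
    "children": [
        {"name": "readme.txt"},
        {"name": "inner.zip", "children": [{"name": "report.docx"}]},
    ],
}

print(run_with_loopback(archive))
# ['outer.zip', 'readme.txt', 'inner.zip', 'report.docx']
```

Without the loop-back, a single pass would stop after producing readme.txt and inner.zip, and report.docx would stay inside the unextracted inner archive.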

When you set Create new FlowFiles from sub-files to false, the processor automatically extracts subfiles recursively. In this case the subfile relationship does not exist.
