KeyViewExtractFiles

A processor that uses KeyView to extract sub-files from container files such as zip archives.

The processor reads any files that are embedded in or referenced by the FlowFile and passes them to KeyView. If a file is a container, KeyView extracts its sub-files to the location defined by the KeyView Filter Service, and the processor either creates a new FlowFile for each sub-file or adds new parts referencing these files to the original FlowFile. The Create new FlowFiles from sub-files property controls this behavior. If you choose to create new FlowFiles, each child FlowFile inherits the metadata and attributes associated with the parent FlowFile. The Inherited Fields Regex and Not Inherited Fields Regex properties control which metadata fields are inherited.
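The two output modes can be pictured with a small Python model. The FlowFile class, its attributes and parts fields, and the extract function are hypothetical illustrations, not part of NiFi or KeyView; the sketch covers one level of extraction only and ignores the inheritance regexes, recursion, and the Merge mails option.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Hypothetical stand-in for a FlowFile: attributes plus document parts."""
    attributes: dict[str, str]
    parts: list[str] = field(default_factory=list)


def extract(parent: FlowFile, subfiles: list[str], create_new_flowfiles: bool):
    """Return (success, subfile) lists of FlowFiles for one container.

    Models only the routing described in the text, for a single level of extraction.
    """
    if create_new_flowfiles:
        # One child per sub-file; each child gets a copy of the parent's attributes.
        children = [FlowFile(dict(parent.attributes), [path]) for path in subfiles]
        # The original FlowFile (the container) is routed to "success".
        return [parent], children
    # Otherwise the output is the original FlowFile with one extra part per
    # sub-file (built as a copy here, so the input object is left unchanged).
    merged = FlowFile(dict(parent.attributes), parent.parts + subfiles)
    return [merged], []


container = FlowFile({"filename": "archive.zip"}, ["archive.zip"])
success, subfile = extract(container, ["a.txt", "b.doc"], create_new_flowfiles=True)
print(len(success), len(subfile))  # 1 2
```

With create_new_flowfiles=False, the same call returns a single FlowFile whose parts are the container plus both sub-files, and an empty "subfile" list.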

Properties

Name | Default Value | Description
KeyView Filter Service | (none) | A KeyViewFilterServiceImpl that manages the location of the KeyView binaries and temporary file storage.
IDOL License Service | (none) | An IdolLicenseServiceImpl that provides a way to communicate with an IDOL License Server.
Document Registry Service | (none) | A DocumentRegistryServiceImpl controller service that manages and updates a document registry database, which ensures that documents are indexed in the correct order. This processor uses it to register new FlowFiles that represent extracted subfiles.
Create new FlowFiles from sub-files | True | A Boolean value that specifies whether to create a new FlowFile for each extracted sub-file. If you set this property to false, the processor associates sub-files with the original FlowFile.
Extracted File Part Type | match_input | Specifies the type of document part to use for extracted files. You can embed the file content in the FlowFile or add a path to the file on disk. Choose match_input to use the same type of document part that was used to include the container in the input FlowFile.
Inherited Fields Regex | .* | A regular expression to choose metadata fields that documents representing sub-files should inherit from their parent document.
Not Inherited Fields Regex | (none) | A regular expression to choose metadata fields that documents representing sub-files should not inherit from their parent document.
Merge mails | False | (Applies only to MSG and EML files.) A Boolean value that specifies whether to merge the .mail subfile into the original FlowFile. See the following paragraphs for details.

MSG and EML files are containers. Normally, when these are extracted and Create new FlowFiles from sub-files is true, the processor outputs the original FlowFile, representing the container, to the "success" relationship, and a FlowFile representing the .mail subfile to the "subfile" relationship. The .mail subfile contains the message header and content.

When you set Merge mails to "true", the processor does not output a FlowFile to the "subfile" relationship. Instead, it adds the subfile metadata to the original FlowFile and modifies the ContentFile or ContentFilename part of the original FlowFile so that it contains the binary content or file path of the .mail file rather than the MSG/EML container. The FlowFile is then output to the "success" relationship.

If you set Merge mails to "true" when Create new FlowFiles from sub-files is false, the only difference from setting Merge mails to "false" is that the output FlowFile does not contain a ContentFile or ContentFilename part representing the MSG/EML container.
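The four combinations of Create new FlowFiles from sub-files and Merge mails described above can be summarized as a lookup. This is a simplified Python model, not the processor's code: it assumes a single MSG/EML container whose only sub-file is the .mail file (attachments are not modeled), represents each output FlowFile by the list of its document parts, and uses hypothetical part names. A missing "subfile" key means that relationship does not exist.

```python
def mail_outputs(create_new_flowfiles: bool, merge_mails: bool) -> dict[str, list[list[str]]]:
    """Return, per relationship, the document parts of each output FlowFile.

    Models one MSG/EML container whose only sub-file is the .mail file.
    """
    container, mail = "container.msg", "container.mail"  # hypothetical part contents
    if create_new_flowfiles and not merge_mails:
        # Container goes to "success"; the .mail sub-file gets its own FlowFile.
        return {"success": [[container]], "subfile": [[mail]]}
    if create_new_flowfiles and merge_mails:
        # The .mail content replaces the container part; nothing goes to "subfile".
        return {"success": [[mail]], "subfile": []}
    if not merge_mails:
        # No "subfile" relationship; the sub-file becomes an extra part.
        return {"success": [[container, mail]]}
    # As above, but without the part representing the container.
    return {"success": [[mail]]}


for create in (True, False):
    for merge in (False, True):
        print(create, merge, mail_outputs(create, merge))
```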

Relationships

Name | Description
success | Successfully processed FlowFiles are routed to this relationship. When Create new FlowFiles from sub-files is set to false, the FlowFiles have an extra document part for each extracted sub-file.
subfile | Receives new FlowFiles that represent extracted sub-files. This relationship exists only when you set the property Create new FlowFiles from sub-files to true. See Extract Subfiles Recursively.
failure | FlowFiles that could not be processed successfully are routed to this relationship. For example, a FlowFile that has an invalid or unknown format.

Extract Subfiles Recursively

When you set Create new FlowFiles from sub-files to true, the processor does not extract subfiles recursively. For example, if a zip file contains another zip, the second zip is not automatically extracted. If you want to extract subfiles recursively, route the extracted subfiles back to the KeyViewExtractFiles processor so that new FlowFiles are processed in the same way as the original FlowFile. Eventually, all of the containers will be extracted and all of the resulting FlowFiles will be routed to the success relationship.
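The loop-back arrangement can be simulated in a few lines of Python to show why every nested container is eventually extracted. This is a hypothetical in-memory model, not NiFi code: a container is a (name, [children]) tuple, any other file is a name string, and the input queue is plain first-in, first-out (the prioritizer described below changes that order).

```python
from collections import deque


def run_with_loopback(root):
    """Simulate routing the "subfile" relationship back to the processor.

    A container is modeled as (name, [children]); any other file is a name string.
    Returns the names of FlowFiles routed to "success", in processing order.
    """
    queue = deque([root])  # input queue in front of the processor (FIFO here)
    success = []
    while queue:
        item = queue.popleft()
        if isinstance(item, tuple):
            name, children = item
            success.append(name)    # the container itself goes to "success"
            queue.extend(children)  # sub-files go to "subfile", which loops back
        else:
            success.append(item)    # a non-container goes straight to "success"
    return success


archive = ("outer.zip", ["a.txt", ("inner.zip", ["b.txt", "c.txt"])])
print(run_with_loopback(archive))
# ['outer.zip', 'a.txt', 'inner.zip', 'b.txt', 'c.txt']
```

The loop ends only when the queue is empty, which mirrors the statement that all FlowFiles eventually reach "success".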

OpenText recommends using the following configuration:

  • Route the "subfile" relationship to a funnel placed before the processor.
  • Configure the input queue so that the processor extracts subfiles before extracting new containers.

    1. Stop the KeyViewExtractFiles processor, if it is running.

    2. Right-click the queue immediately before the processor and click Configure.

      The Configure Connection dialog box opens.

    3. Click the Settings tab.

    4. Drag the PriorityAttributePrioritizer from the Available Prioritizers area to the Selected Prioritizers area.

      The KeyViewExtractFiles processor adds a priority attribute to any FlowFile that it outputs to the "subfile" relationship. The value of the attribute is "0" for the first level of subfiles and is decremented by one for each further level of nesting. For example, a subfile that was extracted from another subfile has the value "-1". The PriorityAttributePrioritizer uses these values so that the processor handles subfiles before it extracts new containers.
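The ordering that the prioritizer step relies on can be sketched as a Python sort key. This is a simplified model, not NiFi's implementation: it assumes, as NiFi documents for PriorityAttributePrioritizer, that lower "priority" values are processed first, and further assumes that FlowFiles with a priority attribute are processed before FlowFiles without one, such as new containers arriving from upstream. The priority_sort_key function and the sample filenames are hypothetical, and non-integer priority values are not handled.

```python
def priority_sort_key(attributes: dict[str, str]):
    """Simplified model of PriorityAttributePrioritizer ordering (integer values only).

    FlowFiles with a "priority" attribute sort before those without one,
    and lower integer values sort first.
    """
    value = attributes.get("priority")
    if value is None:
        return (1, 0)  # no priority attribute: after all prioritized FlowFiles
    return (0, int(value))


queue = [
    {"filename": "new-container.zip"},             # from upstream, no priority
    {"filename": "level1.txt", "priority": "0"},   # first-level sub-file
    {"filename": "level2.txt", "priority": "-1"},  # sub-file of a sub-file
]
order = [ff["filename"] for ff in sorted(queue, key=priority_sort_key)]
print(order)  # ['level2.txt', 'level1.txt', 'new-container.zip']
```

Under these assumptions, the most deeply nested subfiles are handled first, which keeps the queue from filling with unextracted containers.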

When you set Create new FlowFiles from sub-files to false, the processor automatically extracts subfiles recursively. In this case, the "subfile" relationship does not exist.