KeyViewFilterDocument

A processor that uses KeyView to filter metadata and text from files.

The processor reads any files that are embedded in or referenced by the FlowFile and passes them to KeyView. The metadata that KeyView extracts is added to the FlowFile as XML metadata, and the text that KeyView extracts is appended to the content parts of the FlowFile.

Properties

Name Default Value Description
KeyView Filter Service   A KeyViewFilterServiceImpl that manages the location of the KeyView binaries and temporary file storage.
IDOL License Service   An IdolLicenseServiceImpl that provides a way to communicate with an IDOL License Server.
Extract content text True Specifies whether to filter text from the main content of a file and add it to the IDOL document content.
Extract file metadata True Specifies whether to filter metadata from a file and add it to the IDOL document metadata.
Filter Metadata XPath   An XPath expression that matches the fields in the IDOL document metadata that you want to filter. The processor performs additional processing on each matching field to extract plain text, and writes the resulting value to the location specified by the "Filtered Metadata Target" property. This property is useful when a field contains text with RTF formatting.

Filter HTML Metadata XPath   An XPath expression that matches the fields in the IDOL document metadata that you want to filter. The processor removes HTML markup from each matching field and keeps only the text, and writes the resulting value to the location specified by the "Filtered Metadata Target" property. This property is useful when data repositories (such as Microsoft SharePoint) return metadata fields that contain HTML, but you do not want to index HTML into IDOL.

Filtered Metadata Target original_field Specifies where to write text that was filtered from metadata fields specified by "Filter Metadata XPath" and "Filter HTML Metadata XPath". You can overwrite the original field or add the filtered text to the IDOL document content.
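
The effect of "Filter HTML Metadata XPath" together with "Filtered Metadata Target" can be illustrated with a short Python sketch. It is not the processor's implementation. The metadata layout, the element names (metadata, title, description), the helper functions, and the overwrite_original flag are all hypothetical and chosen only for illustration. Python's ElementTree supports only a subset of XPath, so expressions accepted by the processor may not work here unchanged.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and discards all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(value):
    """Remove HTML markup from value and normalize whitespace."""
    parser = _TextOnly()
    parser.feed(value)
    parser.close()  # flushes any buffered text
    return " ".join("".join(parser.parts).split())


def filter_html_fields(metadata, xpath, overwrite_original=True):
    """Strip HTML from every field matched by xpath.

    overwrite_original=True mimics writing back to the original field.
    Otherwise the filtered text is returned so that a caller could add
    it to the document content (hypothetical behavior for this sketch).
    """
    extra_content = []
    for field in metadata.findall(xpath):
        text = strip_html(field.text or "")
        if overwrite_original:
            field.text = text
        else:
            extra_content.append(text)
    return extra_content


# Hypothetical document metadata with an HTML-formatted field.
metadata = ET.fromstring(
    "<metadata>"
    "<title>Quarterly report</title>"
    "<description>&lt;p&gt;Sales &lt;b&gt;rose&lt;/b&gt; in Q3.&lt;/p&gt;</description>"
    "</metadata>"
)

filter_html_fields(metadata, "./description")
print(metadata.findtext("description"))  # prints "Sales rose in Q3."
```

In this sketch only the description field is matched, so the title field is left untouched. Passing overwrite_original=False leaves the matched field unchanged and returns the filtered text instead.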

Relationships

Name Description
success Successfully processed FlowFiles are routed to this relationship (regardless of whether content or metadata is extracted).
failure FlowFiles with an invalid or unknown format are routed to this relationship.