KeyViewFilterDocument
A processor that uses KeyView to filter metadata and text from files.
The processor reads any files embedded in the FlowFile or referenced by the FlowFile, and passes these to KeyView. The metadata that is extracted by KeyView is added to the FlowFile as XML metadata. The text that is extracted by KeyView is appended to the content parts of the FlowFile.
Properties
Name | Default Value | Description |
---|---|---|
KeyView Filter Service | | A KeyViewFilterServiceImpl that manages the location of the KeyView binaries and temporary file storage. |
IDOL License Service | | An IdolLicenseServiceImpl that provides a way to communicate with an IDOL License Server. |
Extract content text | True | Specifies whether to filter text from the main content of a file and add it to the IDOL document content. |
Extract file metadata | True | Specifies whether to filter metadata from a file and add it to the IDOL document metadata. |
Filter Metadata XPath | | An XPath expression that matches the fields in the IDOL document metadata that you want to filter. The processor performs additional processing to extract plain text. The resulting value is written to the location specified by the "Filtered Metadata Target" property. This property is useful when a field contains text with RTF formatting. |
Filter HTML Metadata XPath | | An XPath expression that matches the fields in the IDOL document metadata that you want to filter. The processor removes HTML markup and keeps only the text. The resulting value is written to the location specified by the "Filtered Metadata Target" property. This property is useful when data repositories (such as Microsoft SharePoint) return metadata fields that contain HTML, but you do not want to index HTML into IDOL. |
Filtered Metadata Target | original_field | Specifies where to write text that was filtered from metadata fields specified by "Filter Metadata XPath" and "Filter HTML Metadata XPath". You can overwrite the original field or add the filtered text to the IDOL document content. |
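
The effect of "Filter HTML Metadata XPath" can be illustrated with a minimal Python sketch. This is not the processor's implementation. The metadata layout, the field names, and the helper functions `strip_html` and `filter_html_fields` are all hypothetical. The sketch assumes the filtered text overwrites the original field, which corresponds to the default `original_field` target. Python's `xml.etree.ElementTree` supports only a subset of XPath, so the expression is kept simple.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment: str) -> str:
    """Hypothetical helper: remove HTML markup and keep only the text."""
    parser = _TextOnly()
    parser.feed(fragment)
    parser.close()
    # Collapse whitespace runs left behind by the removed tags.
    return " ".join("".join(parser.parts).split())


# Hypothetical metadata layout; real IDOL document metadata may differ.
METADATA_XML = """
<metadata>
  <title>Quarterly report</title>
  <description>&lt;p&gt;Sales &lt;b&gt;rose&lt;/b&gt; in Q3.&lt;/p&gt;</description>
</metadata>
""".strip()


def filter_html_fields(xml_text: str, xpath: str) -> str:
    """Overwrite every field matched by `xpath` with its HTML-free text."""
    root = ET.fromstring(xml_text)
    for field in root.findall(xpath):
        field.text = strip_html(field.text or "")
    return ET.tostring(root, encoding="unicode")


result = filter_html_fields(METADATA_XML, "./description")
print(result)
```

Only the fields matched by the expression change. In this sketch the `title` field is left as it was, and the `description` field keeps its text without the `<p>` and `<b>` markup.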
Relationships
Name | Description |
---|---|
success | Successfully processed FlowFiles are routed to this relationship (regardless of whether content or metadata is extracted). |
failure | FlowFiles with an invalid or unknown format are routed to this relationship. |