HashDocument
The HashDocument processor calculates one or more hashes and adds the values to the document. Hash values can be used to identify duplicate or related documents. For example, you could calculate a hash from the sender, recipients, and subject of each e-mail, in order to create groups of related messages.
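The e-mail example above can be sketched outside the processor as follows. This is only an illustration of grouping by a hash of selected fields, not the processor's own implementation: the function names, the dictionary layout, the normalization (lowercasing, sorting recipients), and the separator character are all assumptions made for this sketch.

```python
import hashlib
from collections import defaultdict

def email_group_key(sender, recipients, subject):
    """Return a SHA1 hex digest over the fields that define a related-message group."""
    # Assumption: normalize case and recipient order so trivial differences
    # do not produce different hashes.
    parts = [
        sender.strip().lower(),
        ",".join(sorted(r.strip().lower() for r in recipients)),
        subject.strip().lower(),
    ]
    # Join with a separator that is unlikely to appear inside the fields.
    canonical = "\x1f".join(parts)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

def group_related(emails):
    """Map each hash value to the IDs of the messages that share it."""
    groups = defaultdict(list)
    for e in emails:
        key = email_group_key(e["sender"], e["recipients"], e["subject"])
        groups[key].append(e["id"])
    return dict(groups)

# Hypothetical sample data.
emails = [
    {"id": 1, "sender": "a@example.com",
     "recipients": ["b@example.com", "c@example.com"], "subject": "Q3 report"},
    {"id": 2, "sender": "A@example.com",
     "recipients": ["c@example.com", "b@example.com"], "subject": "Q3 report"},
    {"id": 3, "sender": "a@example.com",
     "recipients": ["b@example.com"], "subject": "Lunch"},
]
groups = group_related(emails)
```

Messages 1 and 2 end up under the same hash because they differ only in letter case and recipient order, while message 3 forms its own group.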
The processor calculates a separate hash for:
- The document content. The value is added to the field hashes/MD5/content or hashes/SHA1/content.
- The document metadata (or selected fields from the document metadata). The value is added to the field hashes/MD5/metadata or hashes/SHA1/metadata.
- Each file that is associated with the document. The value is added to the field part/hashes/MD5 or part/hashes/SHA1.
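To make the field layout concrete, the following minimal sketch computes the three kinds of hash and stores them under the field names listed above. It assumes that content, metadata, and files are already available as bytes; how the processor actually serializes metadata before hashing is not described here, and the function name `compute_hashes` and the returned data structures are hypothetical.

```python
import hashlib

# Supported algorithms, keyed by the name used in the field paths.
ALGORITHMS = {"MD5": hashlib.md5, "SHA1": hashlib.sha1}

def compute_hashes(content, metadata, files, hash_type="MD5"):
    """Return (document_fields, part_fields) holding hex digests.

    content and metadata are bytes; files is a list of bytes, one per
    associated file. All of these inputs are assumptions for this sketch.
    """
    algo = ALGORITHMS[hash_type]
    document_fields = {
        f"hashes/{hash_type}/content": algo(content).hexdigest(),
        f"hashes/{hash_type}/metadata": algo(metadata).hexdigest(),
    }
    # One entry per associated file.
    part_fields = [
        {f"part/hashes/{hash_type}": algo(data).hexdigest()} for data in files
    ]
    return document_fields, part_fields
```

For example, `compute_hashes(b"hello", b"", [b"abc"])` returns a document dictionary with the keys hashes/MD5/content and hashes/MD5/metadata, plus a one-element list for the single file.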
Properties
Name | Default Value | Description |
---|---|---|
IDOL License Service | | An IdolLicenseServiceImpl that provides a way to communicate with an IDOL License Server. |
XPath | //* | An XPath expression that identifies the fields to include when calculating the hash for the document metadata. If you enter a blank value, the processor does not calculate a hash for the document metadata. |
Hash document content | true | A Boolean value that specifies whether to calculate a hash from the document content. |
Hash document files | true | A Boolean value that specifies whether to calculate a hash for each associated file. |
Hash Type | MD5 | The hash algorithm to use (MD5 or SHA1). |
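The effect of the XPath and Hash Type properties can be approximated with the sketch below. It is an assumption-laden illustration rather than the processor's actual behavior: the metadata is modeled as a small XML string, the function name `metadata_hash` is hypothetical, and the way element names and values are fed into the hash is invented for the example. Python's ElementTree supports only a subset of XPath and requires relative paths, so the default `//*` is written as `.//*` here.

```python
import hashlib
import xml.etree.ElementTree as ET

def metadata_hash(metadata_xml, xpath=".//*", hash_type="MD5"):
    """Hash the metadata fields selected by xpath; return None for a blank xpath."""
    # A blank expression means no metadata hash is calculated.
    if not xpath.strip():
        return None
    root = ET.fromstring(metadata_xml)
    digest = hashlib.new(hash_type.lower())  # "md5" or "sha1"
    for element in root.findall(xpath):
        # Assumption: hash each field name and its text, with separators
        # so that adjacent values cannot run together.
        digest.update(element.tag.encode("utf-8"))
        digest.update(b"\x1f")
        digest.update((element.text or "").strip().encode("utf-8"))
        digest.update(b"\x1e")
    return digest.hexdigest()

# Hypothetical metadata for two messages that differ only in their date.
first = "<doc><sender>a@example.com</sender><subject>Q3</subject><date>1</date></doc>"
second = "<doc><sender>a@example.com</sender><subject>Q3</subject><date>2</date></doc>"

same_subject = metadata_hash(first, ".//subject") == metadata_hash(second, ".//subject")
same_all = metadata_hash(first) == metadata_hash(second)
```

Restricting the expression to the subject field makes the two documents hash identically, whereas hashing every field distinguishes them because the date differs.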
Relationships
Name | Description |
---|---|
success | Successfully processed FlowFiles are routed to this relationship. |
failure | FlowFiles that have an invalid or unknown format are routed to this relationship. |