The HashDocument processor calculates one or more hashes and adds the values to the document. Hash values can be used to identify duplicate or related documents. For example, you could calculate a hash from the sender, recipients, and subject of each e-mail, in order to create groups of related messages.
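The e-mail grouping idea can be sketched outside the processor in a few lines of Python. This is an illustration only, not the processor's implementation: the `related_key` helper, the sample messages, and the normalization (lowercasing, sorting recipients, length-prefixing each value) are assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def related_key(sender, recipients, subject, algorithm="md5"):
    """Hash the sender, sorted recipients, and subject into one hex digest.

    Hypothetical helper: the normalization below is an assumption,
    not the documented behavior of the HashDocument processor.
    """
    h = hashlib.new(algorithm)
    for value in [sender, *sorted(recipients), subject]:
        data = value.strip().lower().encode("utf-8")
        # Length-prefix each value so field boundaries stay unambiguous.
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.hexdigest()


# Made-up sample messages for illustration.
messages = [
    {"id": 1, "sender": "ann@example.com",
     "recipients": ["bob@example.com", "cy@example.com"], "subject": "Q3 report"},
    {"id": 2, "sender": "ann@example.com",
     "recipients": ["cy@example.com", "bob@example.com"], "subject": "Q3 report"},
    {"id": 3, "sender": "bob@example.com",
     "recipients": ["ann@example.com"], "subject": "Lunch?"},
]

# Messages that share a hash value fall into the same group.
groups = defaultdict(list)
for m in messages:
    key = related_key(m["sender"], m["recipients"], m["subject"])
    groups[key].append(m["id"])

related = sorted(groups.values())
print(related)
```

Messages 1 and 2 differ only in recipient order, so they share a hash and land in the same group, while message 3 forms its own group.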
The processor calculates a separate hash for:
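- The document content, stored in hashes/MD5/content or hashes/SHA1/content.
- The document metadata, stored in hashes/MD5/metadata or hashes/SHA1/metadata.
- Each file associated with the document, stored in part/hashes/MD5 or part/hashes/SHA1.

Name | Default Value | Description |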
---|---|---|
XPath | //* | An XPath expression that identifies the fields to include when calculating the hash for the document metadata. If you enter a blank value, the processor does not calculate a hash for the document metadata. |
Hash document content | true | A Boolean value that specifies whether to calculate a hash from the document content. |
Hash document files | true | A Boolean value that specifies whether to calculate a hash for each associated file. |
Hash Type | MD5 | The hash algorithm to use (MD5 or SHA1). |
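To make the interaction between these properties concrete, here is a rough Python sketch of how they might combine. It is not the processor's code: the `hash_document` function, its parameter names, the way metadata fields are fed into the hash, and the use of `.//*` (the `xml.etree.ElementTree` form of `//*`) are all assumptions for illustration. Only the output field names and the property semantics come from this page.

```python
import hashlib
import xml.etree.ElementTree as ET


def hash_document(metadata_xml, content, files, xpath=".//*",
                  hash_content=True, hash_files=True, hash_type="MD5"):
    """Hypothetical model of the properties in the table above."""
    algorithm = {"MD5": "md5", "SHA1": "sha1"}[hash_type]
    result = {}

    if hash_content:
        # "Hash document content": one hash over the document content.
        digest = hashlib.new(algorithm, content).hexdigest()
        result[f"hashes/{hash_type}/content"] = digest

    if xpath:
        # "XPath": a blank value skips the metadata hash entirely.
        # Assumption: each selected field contributes its name and text.
        h = hashlib.new(algorithm)
        for field in ET.fromstring(metadata_xml).findall(xpath):
            h.update(field.tag.encode("utf-8"))
            h.update((field.text or "").strip().encode("utf-8"))
        result[f"hashes/{hash_type}/metadata"] = h.hexdigest()

    if hash_files:
        # "Hash document files": one hash per associated file.
        result[f"part/hashes/{hash_type}"] = [
            hashlib.new(algorithm, data).hexdigest() for data in files
        ]

    return result


# Made-up sample document for illustration.
sample_metadata = "<doc><from>ann@example.com</from><subject>Q3 report</subject></doc>"
doc = hash_document(sample_metadata, b"hello", [b"file one", b"file two"])
for name, value in doc.items():
    print(name, value)
```

Passing `xpath=""` in this sketch omits the metadata hash, mirroring the blank-value behavior described for the XPath property.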

Name | Description |
---|---|
success | Successfully processed FlowFiles are routed to this relationship. |
failure | FlowFiles that have an invalid or unknown format are routed to this relationship. |