HashDocument

The HashDocument processor calculates one or more hashes and adds the values to the document. Hash values can be used to identify duplicate or related documents. For example, you could calculate a hash from the sender, recipients, and subject of each e-mail to group related messages.
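The e-mail example can be sketched in a few lines of Python. This is an illustration of the idea only, not the processor's implementation: the function names, the message dictionary keys (`id`, `from`, `to`, `subject`), and the normalization steps (lowercasing, sorting recipients) are all assumptions made for the sketch.

```python
import hashlib
from collections import defaultdict


def related_key(sender, recipients, subject):
    """Return an MD5 hex digest built from sender, recipients, and subject.

    Hypothetical helper: the normalization below is an assumption,
    chosen so that recipient order and letter case do not matter.
    """
    parts = [
        sender.strip().lower(),
        ",".join(sorted(r.strip().lower() for r in recipients)),
        subject.strip().lower(),
    ]
    return hashlib.md5("\n".join(parts).encode("utf-8")).hexdigest()


def group_related(messages):
    """Group message IDs by their related-message hash."""
    groups = defaultdict(list)
    for msg in messages:
        key = related_key(msg["from"], msg["to"], msg["subject"])
        groups[key].append(msg["id"])
    return dict(groups)
```

Messages that share the same sender, recipient set, and subject end up under the same key, so each value in the returned dictionary is one group of related messages.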

The processor calculates a separate hash for:

  • The document content. The value is added to the field hashes/MD5/content or hashes/SHA1/content.
  • The document metadata (or selected fields from the document metadata). The value is added to the field hashes/MD5/metadata or hashes/SHA1/metadata.
  • Each file that is associated with the document. The value is added to the field part/hashes/MD5 or part/hashes/SHA1.
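As a rough sketch of how the three hashes map to the field names listed above, the following Python function builds those names from the chosen hash type. The function name, its parameters, the use of raw bytes for the metadata, and the returned data structure are assumptions for illustration; the processor's internal serialization of metadata is not described here.

```python
import hashlib


def hash_fields(content, metadata, files, hash_type="MD5"):
    """Return (document_fields, per_file_fields) for the given hash type.

    content, metadata: bytes to hash (how metadata is serialized to
    bytes is an assumption of this sketch).
    files: list of bytes objects, one per associated file.
    hash_type: "MD5" or "SHA1", used both as the algorithm and in the
    field name.
    """
    algorithm = hash_type.lower()

    def digest(data):
        return hashlib.new(algorithm, data).hexdigest()

    document_fields = {
        f"hashes/{hash_type}/content": digest(content),
        f"hashes/{hash_type}/metadata": digest(metadata),
    }
    # One entry per associated file, each carrying its own hash.
    per_file_fields = [{f"part/hashes/{hash_type}": digest(f)} for f in files]
    return document_fields, per_file_fields
```

Two documents with identical content produce the same `hashes/MD5/content` value even when their metadata differs, which is what makes the separate content hash useful for duplicate detection.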

Properties

Name Default Value Description
IDOL License Service (none) An IdolLicenseServiceImpl that provides a way to communicate with an IDOL License Server.

XPath //* An XPath expression that identifies the fields to include when calculating the hash for the document metadata. If you enter a blank value, the processor does not calculate a hash for the document metadata.
Hash document content true A Boolean value that specifies whether to calculate a hash from the document content.
Hash document files true A Boolean value that specifies whether to calculate a hash for each associated file.
Hash Type MD5 The hash algorithm to use (MD5 or SHA1).
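To illustrate how the XPath and Hash Type properties interact, here is a minimal Python sketch that hashes only the metadata elements selected by an expression. It assumes the metadata is available as an XML string and uses the standard library's ElementTree, which supports only a subset of XPath and does not accept absolute paths, so a leading `//` is rewritten to `.//`. The function name and the way element names and values are combined are assumptions, not the processor's actual behavior.

```python
import hashlib
import xml.etree.ElementTree as ET


def metadata_hash(metadata_xml, xpath="//*", hash_type="MD5"):
    """Hash the metadata elements selected by xpath.

    Returns None when xpath is blank, mirroring the documented rule
    that a blank value disables the metadata hash.
    """
    if not xpath.strip():
        return None

    root = ET.fromstring(metadata_xml)
    # ElementTree rejects absolute paths, so make "//..." relative.
    expr = "." + xpath if xpath.startswith("//") else xpath

    hasher = hashlib.new(hash_type.lower())
    for element in root.findall(expr):
        # Assumption: feed the element name and text, NUL-separated.
        hasher.update(element.tag.encode("utf-8"))
        hasher.update(b"\x00")
        hasher.update((element.text or "").strip().encode("utf-8"))
        hasher.update(b"\x00")
    return hasher.hexdigest()
```

Narrowing the expression (for example to `//subject`) makes documents that differ only in unselected fields, such as a date, hash to the same value, whereas the default `//*` includes every field.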

Relationships

Name Description
success Successfully processed FlowFiles are routed to this relationship.
failure FlowFiles that have an invalid or unknown format are routed to this relationship.