AutoCategorizeLabeller

A processor that determines which categories, if any, to apply to an input document. The processor adds a metadata field to the document for each category that it determines the document belongs to.

The processor holds documents in a queue until their categories are identified. If identification succeeds, the processor processes and labels the documents. If it fails, the processor discards them.
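
The hold-then-label behavior described above can be pictured with a minimal Python sketch. This is an illustration only, not the processor's implementation. The names label_pending and lookup_categories, the CATEGORY field name, and the convention that the lookup returns None when categories could not be identified are all hypothetical.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Optional

Document = Dict[str, Any]


def label_pending(
    pending: Deque[Document],
    lookup_categories: Callable[[Document], Optional[List[str]]],
) -> List[Document]:
    """Label queued documents whose categories were identified; drop the rest."""
    labelled: List[Document] = []
    while pending:
        doc = pending.popleft()
        categories = lookup_categories(doc)
        if categories is None:
            # Categories could not be identified, so the document is discarded.
            continue
        # Add one metadata field per category (the field name is hypothetical).
        fields = doc.setdefault("metadata", [])
        fields.extend(("CATEGORY", name) for name in categories)
        labelled.append(doc)
    return labelled
```

In this sketch, a document that belongs to two categories ends up with two CATEGORY entries in its metadata, and a document with no identifiable categories never reaches the returned list.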

Properties

Name | Default Value | Description
Category Host | (none) | The host of the IDOL Category component to use to categorize documents.
Category Port | (none) | The port of the IDOL Category component to use to categorize documents.
Min Weight | 50 | The minimum weight (score) that a category result must have for the document to be regarded as a match for that category.
Distributed Cache Service | (none) | The identifier of the Distributed Cache Service used to communicate state between the AutoCategorizeGenerator processor and the AutoCategorizeLabeller processor.
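
As an illustration of how the Min Weight property acts as a threshold, the following Python sketch filters a set of category results by weight. The function name matching_categories and the dictionary input are hypothetical, and treating the threshold as inclusive (weight greater than or equal to Min Weight) is an assumption based on the word "minimum".

```python
from typing import Dict, List

# Default taken from the Min Weight property.
DEFAULT_MIN_WEIGHT = 50


def matching_categories(
    results: Dict[str, float],
    min_weight: float = DEFAULT_MIN_WEIGHT,
) -> List[str]:
    """Return the category names whose weight is at least min_weight."""
    return [name for name, weight in results.items() if weight >= min_weight]


# Hypothetical category results for one document.
example = {"Sport": 72, "Politics": 50, "Weather": 31}
print(matching_categories(example))
```

Raising Min Weight makes labelling stricter: with min_weight=80, none of the hypothetical results above would count as a match.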

NOTE: The template does not include the DistributedMapCacheServer controller service that the processors require. You must set up this service yourself in your NiFi ingestion process, and configure both the AutoCategorizeGenerator and AutoCategorizeLabeller processors to use it.

The DistributedMapCacheServer is a standard NiFi service. For more information, see https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-distributed-cache-services-nar/1.5.0/org.apache.nifi.distributed.cache.server.map.DistributedMapCacheServer/index.html.

Relationships

Name | Description
success | Successfully processed FlowFiles are routed to this relationship; that is, when the processor successfully adds a category label.