Eduction

The Eduction processor uses IDOL Eduction to extract entities from text. An entity is a word, phrase, or block of information. For example, you can use Eduction to extract names, addresses, telephone numbers, and dates from document content or metadata.

For more information about Eduction and how to configure it, refer to the Eduction User and Programming Guide.

TIP: The Eduction processor includes the resource files available with IDOL Eduction (excluding those for matching personal data, which are provided separately in the PCI, PHI, and PII packages). This means that for properties such as Resource Files or [MyPostProcessingTask]Script, you can enter just the name of a Micro Focus grammar file or Lua script, such as address_eng.ecr or normalize_money.lua, and the processor uses the corresponding file from its resources directory.

Properties

Default values, where a property has one, are shown in parentheses after the property name.

IDOL License Service
An IdolLicenseServiceImpl that provides a way to communicate with an IDOL License Server.

Entity
A list of entities to extract. Specify a comma-separated list of values, or one value per line.

You can use wildcard expressions to specify several entities, for example place/city1/*,place/city2/*. The * wildcard matches any number of characters, and the ? wildcard matches a single character.

You must also set the Resource Files property to the location of the resource files that contain your chosen entities.

Entity Field
A list of document fields in which to write the matches from Eduction. This property must contain the same number of values as the Entity property. Specify a comma-separated list of values, or one value per line.

Resource Files
A list of compiled ECR files that contain Eduction grammar entries. At least one resource file is required. You can use wildcard expressions to match multiple resource files: the * wildcard matches any number of characters, and the ? wildcard matches a single character. Specify a comma-separated list of values, or one value per line.

Search Fields (default: DRECONTENT)
A list of document fields to search for entities, for example DRECONTENT or DRETITLE. Specify a comma-separated list of values, or one value per line.

Simple Output (default: False)
A Boolean value that specifies whether to add only the matched text to the document fields specified by the Entity Field property. To add only the matched text, set this property to true. With the default value, false, each field has subfields that contain the matched text, the offset, and the score.

eduction_configuration_parameter_name
For more information about the configuration parameters that you can use to configure Eduction, see Eduction Configuration Parameters. Some of these parameters must be set as dynamic properties.

Some properties, for example PostProcessingTaskN, accept the name of a configuration section, so that you can configure more than one post-processing task. When you configure the associated properties, prefix each property name with the section name in square brackets. For example, if you set a PostProcessingTaskN property to MyTask, specify the associated script by setting a property named [MyTask]Script.
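The wildcard rules described for the Entity and Resource Files properties (* matches any number of characters, ? matches exactly one) can be illustrated with a short Python sketch. This is not the processor's implementation: the function names wildcard_match and select_entities and the entity names in the example are hypothetical, and the sketch assumes case-sensitive matching with no other special characters.

```python
import re


def wildcard_match(pattern: str, name: str) -> bool:
    """Return True if name matches pattern, where * matches any number of
    characters (including none) and ? matches exactly one character.
    Assumption: matching is case-sensitive and all other characters are literal."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    # fullmatch: the whole name must match, not just a prefix or a substring
    return re.fullmatch(regex, name, flags=re.DOTALL) is not None


def select_entities(patterns: list[str], available: list[str]) -> list[str]:
    """Return the available entity names matched by at least one pattern,
    keeping the order of `available`."""
    return [name for name in available if any(wildcard_match(p, name) for p in patterns)]


# Hypothetical entity names, used only for illustration.
available = ["place/city1/london", "place/city2/paris", "place/country/fr", "date/iso"]
print(select_entities(["place/city1/*", "place/city2/*"], available))
# ['place/city1/london', 'place/city2/paris']
print(wildcard_match("address_*.ecr", "address_eng.ecr"))  # True
```

Translating each wildcard to a regular-expression fragment and escaping everything else keeps characters such as . and / literal, which matters for file names like address_eng.ecr.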
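Several properties accept either a comma-separated list or one value per line, and the Entity Field property must contain the same number of values as the Entity property. The following Python sketch shows one way to parse such a value and check the counts. The helper names parse_list and pair_entities and the field names CITY1 and CITY2 are hypothetical, and pairing each entity with the field at the same position is an assumption based on the matching-count requirement.

```python
import re


def parse_list(value: str) -> list[str]:
    """Split a property value given as a comma-separated list, one value per
    line, or a mix of both. Surrounding whitespace and empty items are dropped."""
    return [item.strip() for item in re.split(r"[,\r\n]+", value) if item.strip()]


def pair_entities(entity_value: str, entity_field_value: str) -> list[tuple[str, str]]:
    """Pair each Entity value with the Entity Field value at the same position.
    Assumption: the i-th field receives the matches for the i-th entity."""
    entities = parse_list(entity_value)
    fields = parse_list(entity_field_value)
    if len(entities) != len(fields):
        raise ValueError(
            f"Entity has {len(entities)} values but Entity Field has {len(fields)}"
        )
    return list(zip(entities, fields))


# Entity given one value per line, Entity Field given as a comma-separated list.
# The field names CITY1 and CITY2 are hypothetical.
print(pair_entities("place/city1/*\nplace/city2/*", "CITY1, CITY2"))
# [('place/city1/*', 'CITY1'), ('place/city2/*', 'CITY2')]
```

Failing early with a clear message when the counts differ makes a misconfigured Entity Field easier to spot than silently dropping matches.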
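The section-prefix convention for dynamic properties can be illustrated by grouping property names by the section given in square brackets. This Python sketch is not how the processor reads its configuration: the function name group_by_section, the use of the empty key "" for unprefixed properties, and the tolerance for a space after the closing bracket are choices made for illustration, and PostProcessingTask1 is an assumed instance of the PostProcessingTaskN property.

```python
import re

# Matches names such as "[MyTask]Script": group 1 is the section, group 2 the
# property. Optional whitespace after the closing bracket is tolerated.
SECTION_PREFIX = re.compile(r"^\[([^\]]+)\]\s*(.+)$")


def group_by_section(properties: dict[str, str]) -> dict[str, dict[str, str]]:
    """Group dynamic properties by their [Section] prefix.
    Properties without a prefix are collected under the key "" (an arbitrary
    choice for this sketch)."""
    grouped: dict[str, dict[str, str]] = {}
    for name, value in properties.items():
        m = SECTION_PREFIX.match(name)
        section, key = (m.group(1), m.group(2)) if m else ("", name)
        grouped.setdefault(section, {})[key] = value
    return grouped


# PostProcessingTask1 is an assumed instance of the PostProcessingTaskN property.
dynamic = {
    "PostProcessingTask1": "MyTask",
    "[MyTask]Script": "normalize_money.lua",
}
grouped = group_by_section(dynamic)
task = grouped[""]["PostProcessingTask1"]  # the section name, "MyTask"
print(grouped[task]["Script"])  # normalize_money.lua
```

Following the section name from the task property to its prefixed properties mirrors how the example in the description links PostProcessingTaskN to [MyTask]Script.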

Relationships

success
Successfully processed FlowFiles are routed to this relationship.

failure
FlowFiles that have an invalid or unknown format are routed to this relationship.