Eduction
The Eduction processor uses IDOL Eduction to extract entities from text. An entity is a word, phrase, or block of information. For example, you can use Eduction to extract names, addresses, telephone numbers, and dates from document content or metadata.
For more information about Eduction and how to configure Eduction, refer to the Eduction User and Programming Guide.
TIP: The Eduction processor includes the resource files available with IDOL Eduction (excluding those for matching personal data, which are provided separately in the PCI, PHI, and PII packages). This means that for properties such as Resource Files or [MyPostProcessingTask] Script you can enter the name of an OpenText grammar file or Lua script, such as address_eng.ecr or normalize_money.lua. The processor will use the corresponding file from its resources directory.
Properties
Name | Default Value | Description |
---|---|---|
IDOL License Service | | An IdolLicenseServiceImpl that provides a way to communicate with an IDOL License Server. |
Entity | | A list of entities to extract. Specify a comma-separated list of values, or one value per line. To specify several entities, you can use wildcard expressions. You must also set the Resource Files property to the location of the resource files that contain your chosen entities. |
Entity Field | | A list of document fields in which to write the matches from Eduction. This property must have the same number of values as the Entity property. Specify a comma-separated list of values, or one value per line. |
Resource Files | | A list of compiled ECR files containing Eduction grammar entries. At least one resource file is required. You can match multiple resource files with wildcard expressions. You can use the * wildcard to match any number of characters, or the ? wildcard to match a single character. Specify a comma-separated list of values, or one value per line. |
Search Fields | DRECONTENT | A list of document fields to search for entities, for example DRECONTENT or DRETITLE. Specify a comma-separated list of values, or one value per line. |
Simple Output | False | A Boolean value that specifies whether to add only the matched text to the document fields specified by the Entity Field property. To add only the matched text, set this parameter to true. With the default value, false, the fields will have subfields that contain the matched text, the offset, and the score. |
eduction_configuration_parameter_name | | For more information about the configuration parameters that you can use to configure Eduction, see Eduction Configuration Parameters. Some of these must be set as dynamic properties. Some properties, for example |
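
The list and wildcard rules described for the Entity, Entity Field, and Resource Files properties can be illustrated with a small Python sketch. This is not the processor's implementation: the helper names (parse_list, pair_entities_with_fields, match_resources) and the sample file names other than address_eng.ecr are hypothetical. The wildcard matching uses Python's fnmatch module as a stand-in, which also accepts [seq] patterns that the processor may not support.

```python
from __future__ import annotations

import fnmatch
import re


def parse_list(value: str) -> list[str]:
    """Split a property value written as comma-separated values or one value per line."""
    return [item.strip() for item in re.split(r"[,\n]", value) if item.strip()]


def pair_entities_with_fields(entity: str, entity_field: str) -> list[tuple[str, str]]:
    """Pair each Entity value with its Entity Field value.

    The two properties must contain the same number of values.
    """
    entities = parse_list(entity)
    fields = parse_list(entity_field)
    if len(entities) != len(fields):
        raise ValueError(
            f"Entity has {len(entities)} values but Entity Field has {len(fields)}"
        )
    return list(zip(entities, fields))


def match_resources(pattern: str, available: list[str]) -> list[str]:
    """Return the file names matched by a pattern.

    '*' matches any number of characters and '?' matches a single character.
    fnmatch is an assumed stand-in for the processor's own matching.
    """
    return [name for name in available if fnmatch.fnmatchcase(name, pattern)]


if __name__ == "__main__":
    # Hypothetical resource directory listing; only address_eng.ecr comes from the text above.
    files = ["address_eng.ecr", "address_fre.ecr", "date_eng.ecr"]
    print(match_resources("address_*.ecr", files))
    print(match_resources("address_en?.ecr", files))

    # Hypothetical entity and field names, one list comma-separated, the other one per line.
    print(pair_entities_with_fields("entity_a, entity_b", "FIELD_A\nFIELD_B"))
```

Checking the value counts before deploying a flow is a cheap way to catch a mismatch between Entity and Entity Field early; how the processor itself reports such a mismatch is not covered here.
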
Relationships
Name | Description |
---|---|
success | Successfully processed FlowFiles are routed to this relationship. |
failure | FlowFiles that have an invalid or unknown format are routed to this relationship. |