IDOL Eduction Grammars
The following section describes the Eduction grammars available in the IDOL PCI Package.
You can use these grammars with IDOL Eduction, by using Eduction Server, the edktool command-line utility, or the Eduction SDK. For more information, refer to the IDOL Eduction User Guide and the Eduction SDK Programming Guide.
IMPORTANT: To use the Eduction grammars in the IDOL PCI Package, you must have a license that enables them. To obtain a license, contact Micro Focus Support.
The IDOL PCI Package includes a default configuration file that contains the basic settings required to use the PCI grammars.
NOTE: If you create your own configuration file, you must include some of the settings in the default configuration file, such as post-processing and Eduction components (see Configure Post Processing).
Configure Post Processing
When you use the IDOL PCI Package Eduction grammars, you must configure a Lua post-processing task that runs the script pci_postprocessing.lua. This script post-processes matches to improve results for various entities, for example by applying stop list filtering and checksum validation (see Validated ID Numbers).
IMPORTANT: If you do not run this script, you might encounter unexpected behavior.
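To illustrate what stop list filtering and checksum validation mean in practice, the following Python sketch filters candidate payment card numbers. It is not the logic of pci_postprocessing.lua, which is a Lua script whose contents are not shown here. The sketch assumes the Luhn checksum (the standard check digit scheme for payment card numbers), and the STOP_LIST contents and function names are hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


# Hypothetical stop list: values that pass the checksum but are known
# not to be real card numbers.
STOP_LIST = {"0000000000000000"}


def post_process(matches: list[str]) -> list[str]:
    """Keep matches that are not on the stop list and pass the checksum."""
    kept = []
    for match in matches:
        normalized = "".join(ch for ch in match if ch.isdigit())
        if normalized in STOP_LIST:
            continue
        if luhn_valid(normalized):
            kept.append(match)
    return kept
```

In this sketch, a string of sixteen zeros passes the checksum, so only the stop list removes it. This is why the two filters are useful together.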
The default configuration file provided in the IDOL PCI Package includes a suitable post-processing task. If you use a different configuration, you must add the post-processing task to your Eduction configuration. For example:
[Eduction]
PostProcessingTask0=MyPostProcessingSection

[MyPostProcessingSection]
Type=Lua
Script=scripts/pci_postprocessing.lua
Entities=pci/*
IMPORTANT: The post-processing script requires Eduction components (see Components). The default PCI configuration file enables components. If you use a custom configuration file, you must set the EnableComponents parameter to True to return components.
For more information about configuring post-processing tasks, refer to the Eduction User and Programming Guide.
Entity Context
Some of the entities are available in two versions, with and without context. The context-based entities match the entity when it occurs in an easily identifiable location in text. For example, a context-based telephone number entity might match a number that occurs next to the prefix Phone:.
The entities that do not have context attempt to match the entity wherever it occurs. This version might over-match significantly (that is, it is likely to return values that resemble the entity pattern but are not the entity, such as a number that is not a telephone number). However, it also reduces the number of false negatives (that is, it misses fewer matches).
You can configure Eduction to use both versions of an entity; matches located with context are given a higher score in the results.
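When both versions of an entity are enabled, a consumer of the results may see the same value reported twice with different scores. The following Python sketch shows one way such results could be reconciled by keeping the higher-scored match. The Match type, the score values, and the assumption that both versions report the same text and offset are all hypothetical and do not describe the Eduction output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    text: str      # matched value
    offset: int    # position of the value in the document
    score: float   # higher means more confidence (hypothetical scale)


def prefer_highest_score(matches: list[Match]) -> list[Match]:
    """For each (offset, text) pair keep only the highest-scored match."""
    best: dict[tuple[int, str], Match] = {}
    for match in matches:
        key = (match.offset, match.text)
        if key not in best or match.score > best[key].score:
            best[key] = match
    # Return the surviving matches in document order.
    return sorted(best.values(), key=lambda m: m.offset)
```

A value found only by the entity without context survives with its lower score, so a downstream step can still apply a score threshold if it wants fewer false positives.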
When you have data in tables, the context for an entity might not occur next to the entity value. For example, you might have a table with columns titled name and date of birth, but the values themselves do not occur next to these headers.
In this case, you can use Eduction table extraction to extract entities according to the landmarks detected in the table headers. For example, you can configure Eduction so that if it finds a table heading that matches the landmark date of birth, it extracts dates from that column.
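The landmark idea can be sketched in a few lines of Python. This is a conceptual illustration only, not how Eduction implements or configures table extraction. The table representation (a list of rows with the header first), the function name, and the simple date pattern are assumptions made for the example.

```python
import re

# Hypothetical date pattern: day/month/year with a four-digit year.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def extract_column_by_landmark(
    table: list[list[str]], landmark: str, pattern: re.Pattern
) -> list[str]:
    """Return pattern matches from the column whose header equals `landmark`."""
    if not table:
        return []
    header, *rows = table
    normalized = [cell.strip().lower() for cell in header]
    if landmark.lower() not in normalized:
        return []
    column = normalized.index(landmark.lower())
    values = []
    for row in rows:
        # Skip short rows that have no cell in the landmark column.
        if column < len(row):
            found = pattern.search(row[column])
            if found:
                values.append(found.group(0))
    return values
```

For a table whose header row is name, date of birth, the sketch reads only the second column, so a date-like value in another column is ignored.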
For more information about how to configure table extraction, refer to the Eduction User and Programming Guide.