IDOL Eduction Grammars
The following section describes the Eduction grammars available in the IDOL PCI Package.
You can use these grammars with IDOL Eduction, by using Eduction Server, the edktool command-line utility, or the Eduction SDK. For more information, refer to the IDOL Eduction User Guide and the Eduction SDK Programming Guide.
IMPORTANT: To use the Eduction grammars in the IDOL PCI Package, you must have a license that enables them. To obtain a license, contact OpenText Support.
The IDOL PCI Package includes a default configuration file, which contains the basic settings that you need to use the PCI grammars.
NOTE: If you create your own configuration file, you must include some of the settings in the default configuration file, such as post-processing and Eduction components (see Configure Post Processing).
Configure Post Processing
When you use the IDOL PCI Package Eduction grammars, it is essential to configure a Lua post-processing task to run the script pci_postprocessing.lua. This script contains post-processing to improve results for various entities, such as stop list filtering and checksum validation (see Validated ID Numbers).
IMPORTANT: If you do not run this script, you might encounter unexpected behavior.
The default configuration file provided in the IDOL PCI Package includes a suitable post-processing task. If you use a different configuration, you must add the post-processing task to your Eduction configuration. For example:
[Eduction]
PostProcessingTask0=MyPostProcessingSection

[MyPostProcessingSection]
Type=Lua
Script=scripts/pci_postprocessing.lua
Entities=pci/*
IMPORTANT: The post-processing script requires Eduction components (see Components). The default PCI configuration file enables components. If you use a custom configuration file you must set the EnableComponents parameter to True to return components.
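If you maintain a custom configuration file, a small check can catch a missing post-processing task or a missing EnableComponents setting before you run Eduction. The following Python sketch is not part of the IDOL PCI Package. It assumes that the configuration is INI-style, as in the example above, and that EnableComponents is set in the [Eduction] section (an assumption; confirm the correct section in the Eduction User and Programming Guide). The function name check_pci_config is hypothetical.

```python
import configparser

SCRIPT_NAME = "pci_postprocessing.lua"


def check_pci_config(text):
    """Return a list of problems found in an Eduction configuration string."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(text)
    if not parser.has_section("Eduction"):
        return ["missing [Eduction] section"]

    problems = []
    eduction = parser["Eduction"]

    # Assumption: EnableComponents is read from the [Eduction] section.
    if eduction.get("EnableComponents", "").strip().lower() != "true":
        problems.append("EnableComponents is not set to True")

    # Look for a PostProcessingTaskN entry that names a Lua section
    # running the PCI post-processing script.
    found = False
    for key, section in eduction.items():
        if not key.lower().startswith("postprocessingtask"):
            continue
        if not parser.has_section(section):
            continue
        task = parser[section]
        is_lua = task.get("Type", "").strip().lower() == "lua"
        runs_script = task.get("Script", "").strip().endswith(SCRIPT_NAME)
        if is_lua and runs_script:
            found = True
    if not found:
        problems.append("no Lua post-processing task runs " + SCRIPT_NAME)
    return problems


sample = """
[Eduction]
EnableComponents=True
PostProcessingTask0=MyPostProcessingSection

[MyPostProcessingSection]
Type=Lua
Script=scripts/pci_postprocessing.lua
Entities=pci/*
"""

print(check_pci_config(sample))
```

The check only inspects the settings named in this section. It does not validate the rest of the configuration, and it does not handle comment styles that Python's configparser module does not recognize.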
For more information about configuring post-processing tasks, refer to the Eduction User and Programming Guide.
Configure Pre-Filtering
Pre-filtering allows the IDOL PCI Package to run a quick initial check to find potential matches in your input text. It then selects match windows around these potential matches, reducing the amount of text that it must match against your grammars. This process can improve the performance in certain cases.
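The idea of selecting match windows can be illustrated outside of Eduction. The following Python sketch is a conceptual illustration only, and is not how Eduction implements pre-filtering: a cheap regular expression finds candidate positions, each candidate is expanded into a window, and overlapping windows are merged so that a slower matcher only needs to scan those spans. The function name, the window sizes, and the sample pattern are all hypothetical.

```python
import re


def match_windows(text, prefilter, before=20, after=20):
    """Return merged (start, end) windows around each pre-filter hit."""
    windows = []
    for hit in re.finditer(prefilter, text):
        start = max(0, hit.start() - before)
        end = min(len(text), hit.end() + after)
        if windows and start <= windows[-1][1]:
            # Overlaps or touches the previous window: extend that window.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


# 100 characters of filler, a well-known test card number, more filler.
text = "x" * 100 + "4111 1111 1111 1111" + "y" * 100

# Cheap check: any run of four digits marks a region worth a closer look.
for start, end in match_windows(text, r"\d{4}"):
    print(start, end, repr(text[start:end]))
```

In this sample, the four digit groups produce overlapping windows that merge into a single span, so a full matcher would scan 59 characters instead of 219.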
NOTE: Pre-filter tasks run for all configured entities, so you must configure each task only for the appropriate entities to ensure that it does not affect the results for other entities.
The IDOL PCI Package includes sample pre-filter configuration files for the name grammars, including dictionary pre-filter files where they are required by the sample configuration.
IMPORTANT: To use the dictionary pre-filter (DPF) files from the 24.2 package, you must use Eduction tools at version 12.9 or later.
For more information about pre-filtering, refer to the Eduction User and Programming Guide.
Entity Context
Some of the entities are available in two versions, with and without context. The context-based entities match the entity when it occurs in an easily identifiable location in text. For example, a context-based entity might match a telephone number that occurs next to the prefix Phone:.
The entities that do not have context attempt to match the entity wherever it occurs. This version might over-match significantly (that is, it is likely to return values that are similar to the entity patterns, such as a number that is not a telephone number). However, it also reduces the number of false negatives (that is, it misses fewer matches).
You can configure Eduction to use both versions of an entity; matches located with context are given a higher score in the results.
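The trade-off between the two versions can be shown with ordinary regular expressions. The following Python sketch does not use the PCI grammars or reproduce Eduction scoring. The patterns, the function name, and the score values 0.4 and 0.9 are hypothetical, chosen only to show a context match outranking a match without context.

```python
import re

# Hypothetical patterns: a loose phone-number shape, and the same shape
# preceded by a "Phone:" label.
NUMBER = r"\+?\d[\d \-]{6,14}\d"
WITH_CONTEXT = re.compile(r"Phone:\s*(" + NUMBER + ")")
WITHOUT_CONTEXT = re.compile(NUMBER)


def find_numbers(text):
    """Return {(start, end): score}, scoring matches with context higher."""
    results = {}
    for match in WITHOUT_CONTEXT.finditer(text):
        results[match.span()] = 0.4  # illustrative score, not an Eduction value
    for match in WITH_CONTEXT.finditer(text):
        # Same span as the match without context; the higher score replaces it.
        results[match.span(1)] = 0.9
    return results


text = "Phone: 020 7946 0018. Order 1234567890 shipped."
for (start, end), score in sorted(find_numbers(text).items()):
    print(score, text[start:end])
```

The order number in the sample is an over-match: it has the shape of a telephone number but no context, so it is returned with the lower score.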
When you have data in tables, the context for an entity might not occur next to the entity value. For example, you might have a table with columns titled name and date of birth, but the values themselves do not occur next to these headers.
In this case, you can use Eduction table extraction to extract entities according to the landmarks detected in the table headers. For example, you can configure Eduction so that if it finds a table heading that matches the landmark date of birth, it extracts dates from that column.
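The landmark idea can be sketched in a few lines. The following Python example is a conceptual illustration, not the Eduction table extraction feature or its configuration: it looks for a header cell that matches a landmark and then applies a date pattern only to the cells in that column. The function name, the in-memory table layout, and the date pattern are hypothetical.

```python
import re

# Hypothetical date pattern (ISO-style dates only).
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_by_landmark(rows, landmark, pattern):
    """Return pattern matches from columns whose header equals the landmark."""
    header, *body = rows
    columns = [
        index
        for index, title in enumerate(header)
        if title.strip().lower() == landmark.lower()
    ]
    values = []
    for row in body:
        for index in columns:
            if index < len(row):  # tolerate short rows
                values.extend(pattern.findall(row[index]))
    return values


table = [
    ["name", "date of birth", "joined"],
    ["A. Example", "1990-04-12", "2020-01-06"],
    ["B. Sample", "1985-11-30", "2019-07-15"],
]

print(extract_by_landmark(table, "date of birth", DATE))
```

Only the dates in the date of birth column are returned. The dates in the joined column have the same shape but are ignored because their header does not match the landmark.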
For more information about how to configure table extraction, refer to the Eduction User and Programming Guide.
Grammar Metadata
The grammar files in the IDOL PCI Package come with metadata JSON files, which include some information about the grammars, such as supported countries and languages. You can use these metadata files to make it easier to select entities when you configure Eduction. For example, you might want to use them to create a user interface where your end users can select entities according to which languages they cover.
Each grammar has its own metadata file, which covers all entities for that grammar. These metadata files are included in the IDOL PCI Package alongside the relevant grammar file. A metadata_schema.json file is also available, which contains the schema for the metadata JSON files.
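As a rough illustration of using the metadata to select entities, the following Python sketch scans a directory of metadata files and lists the entities that cover a given language. The JSON layout used here (a top-level "entities" list whose items have "name" and "languages" keys) is an assumption made for the example; check metadata_schema.json for the actual field names and adjust the code to match. The function name and the sample entity names in any test data are also hypothetical.

```python
import json
from pathlib import Path


def entities_for_language(metadata_dir, language):
    """Map each metadata file name to the entities that list the language."""
    selected = {}
    for path in sorted(Path(metadata_dir).glob("*.json")):
        if path.name == "metadata_schema.json":
            continue  # the schema describes the format; it is not a grammar
        data = json.loads(path.read_text(encoding="utf-8"))
        # Assumed layout: {"entities": [{"name": ..., "languages": [...]}]}
        names = [
            entity["name"]
            for entity in data.get("entities", [])
            if language in entity.get("languages", [])
        ]
        if names:
            selected[path.name] = names
    return selected
```

A user interface could call a function like this once per language to populate a list of selectable entities, and then write the chosen entity names into the Eduction configuration.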