HeaderEntityN

The entities to extract for the header rows of input tables. This parameter allows you to extract entities from structured data.

When matching CSV or TSV input, Eduction matches the first non-empty row of input against these configured header entities. You can use this to extract landmark values that describe something that you want to find in a table column. You specify the entity to extract from the other cells in that column by setting CellEntityN.

TIP: You can optionally configure Eduction to search additional rows for the header entities by setting MaxSearchHeaderRow.

For example:

HeaderEntity0=pii/date/dob/landmark/all
CellEntity0=pii/date/nocontext/all

This example matches date of birth landmark values in the header, and for all subsequent rows in that column, it extracts any date values.

NOTE: The IDOL PII Package, IDOL PHI Package, and IDOL PCI Package provide landmark entities in most grammars. To extract entities from tables with the Eduction standard grammar files, you might need to create your own landmark entities.

You can specify multiple entities in a comma-separated list. If the table header matches any of the configured header entities, Eduction matches the cell content against any of the configured cell entities. This option might be useful if you want to match a particular entity in multiple languages, or if you want to include a custom entity in addition to a standard one.
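
As a sketch of the list form, the following configuration pairs the standard date of birth landmark with a custom one. The entity name custom/dob/landmark is hypothetical and stands in for an entity that you define in your own grammar; substitute the names that your grammars actually provide.

// custom/dob/landmark is a hypothetical entity, shown for illustration only
HeaderEntity0=pii/date/dob/landmark/all,custom/dob/landmark
CellEntity0=pii/date/nocontext/all

With a configuration like this, a column whose header matches either landmark entity has its cells matched against pii/date/nocontext/all.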

You can also use wildcard expressions in the entity names. The * wildcard matches any number of characters, and the ? wildcard matches a single character.
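
For instance, the * wildcard can stand in for the final component of a header entity name. This is an illustrative sketch only: which entities the pattern resolves to depends on the grammars that you have loaded.

// Illustrative pattern; the entities that it matches depend on your grammars
HeaderEntity0=pii/date/dob/landmark/*
CellEntity0=pii/date/nocontext/all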

For more information about table extraction, refer to the Eduction User and Programming Guide.

Type: String
Default: None
Required: No

Configuration Section: Eduction
Example:
HeaderEntity0=pii/date/dob/landmark/all
CellEntity0=pii/date/nocontext/all
See Also:
HeaderEntityMatchLimitN
CellEntityN
MaxSearchHeaderRow