HeaderEntityN
The entities to extract for the header rows of input tables. This parameter allows you to extract entities from structured data.
When matching CSV or TSV input, Eduction matches the first non-empty row of input against these configured header entities. You can use this to extract landmark values that describe something that you want to find in a table column. To specify the entity to extract from the other cells in that column, set CellEntityN.
TIP: You can optionally configure Eduction to search additional rows for the header entities by setting MaxSearchHeaderRow.
For example:
HeaderEntity0=pii/date/dob/landmark/all
CellEntity0=pii/date/nocontext/all
This example matches date of birth landmark values in the header row. For every column where a header matches, it extracts any date values from the subsequent rows of that column.
NOTE: The IDOL PII Package, IDOL PHI Package, and IDOL PCI Package provide landmark entities in most grammars. To extract entities from tables with the Eduction standard grammar files, you might need to create your own landmark entities.
You can specify multiple entities in a comma-separated list. If the table header matches any of the configured header entities, Eduction matches the cell content against any of the configured cell entities. This option might be useful if you want to match a particular entity in multiple languages, or if you want to include a custom entity in addition to a standard one.
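As a minimal sketch of a comma-separated list, the following fragment adds a second header entity alongside the standard date of birth landmark. The name custom/dob/landmark is hypothetical; it stands in for a landmark entity that you would define in your own grammar, and the comment syntax assumes the usual IDOL configuration file conventions.

```ini
[Eduction]
// custom/dob/landmark is a hypothetical custom entity, not part of any IDOL package
HeaderEntity0=pii/date/dob/landmark/all,custom/dob/landmark
CellEntity0=pii/date/nocontext/all
```

With a configuration along these lines, a column whose header matches either header entity would have its cells matched against the configured cell entity.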
You can also use wildcard expressions in the entity names. The * wildcard matches any number of characters, and the ? wildcard matches a single character.
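As an illustration of wildcards, the following sketch assumes that the loaded grammar defines several entities under the pii/date/dob/landmark/ and pii/date/nocontext/ paths (for example, one per language). The entity names that actually exist depend on the grammar files you load, so treat the paths here as placeholders.

```ini
[Eduction]
// Assumes multiple entities exist under each path; * matches any number of characters
HeaderEntity0=pii/date/dob/landmark/*
CellEntity0=pii/date/nocontext/*
```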
For more information about table extraction,
Type: | String |
Default: | None |
Required: | No |
Configuration Section: | Eduction |
Example: | HeaderEntity0=pii/date/dob/landmark/all CellEntity0=pii/date/nocontext/all |
See Also: |