Configuration File Settings

The Eduction ACI Server requires a configuration file, which identifies settings required for the server to run. The default configuration file is eductionserver.cfg. You can change the configuration file by using command-line options. See Command-Line Options.

See Eduction Parameters for more information.

NOTE:

Some configuration settings affect extraction speed, for example MatchCase, MatchWholeWord, AllowOverlaps, and NonGreedyMatch. For more information about factors that affect Eduction performance, refer to IDOL Expert.

Eduction ACI Server Configuration Parameters
Required Parameters
[Eduction]
ResourceFiles=GrammarFile[,GrammarFile2]...

The name of one or more Eduction grammar files to load. The grammar files contain the entities to use to match text.

You can specify multiple grammar files as a comma-separated list of file names, as a wildcard expression, or as a comma-separated list of wildcard expressions. There must be no space before or after a comma.

NOTE:

Because all entities loaded to an engine must have unique names, you must not load the same grammar file more than once.
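For example, the following sketch loads the grammar file from the example configuration file together with the grammar files that match a wildcard expression. The grammars/*.ecr path is a hypothetical location, assumed not to contain person_name_engus.ecr so that no file is loaded twice.

```ini
[Eduction]
// person_name_engus.ecr comes from the example configuration file
// grammars/*.ecr is a hypothetical wildcard path; there are no spaces around the comma
ResourceFiles=person_name_engus.ecr,grammars/*.ecr
```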

EntityN=entity

The name of an Eduction entity, contained in the loaded grammar files. When you pass text to Eduction, it matches on the entity.

N must start at zero, and increase by one for each entity you list. For example, Entity0, Entity1, and so on.

This parameter supports wildcard expressions.

NOTE:

All entities must have unique names.
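The following sketch lists three entities from person_name_engus.ecr, numbered consecutively from zero, as in the example configuration file. The commented-out line shows a wildcard alternative; the exact pattern is an illustrative assumption about how your entity names are structured.

```ini
[Eduction]
ResourceFiles=person_name_engus.ecr
// Entity numbers start at 0 and increase by one for each entity
Entity0=person/femalefirstname/engus
Entity1=person/malefirstname/engus
Entity2=person/lastname/engus
// Alternative (assumed pattern): one wildcard expression for all engus person entities
//Entity0=person/*/engus
```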

Recommended Parameters
[Eduction]
AllowDuplicates

A list of IDOL fields in which to allow duplicate entities.

You can enter multiple fields separated by a comma.

AllowMultipleResults

Set this parameter to All or True to allow Eduction to return multiple matches that start at the same offset.

Set this parameter to No or False to allow Eduction to return only one match that starts at each offset.

Set this parameter to OnePerEntity to allow Eduction to return at most one match per entity that starts at each offset.
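For example, to allow several matches at the same offset but at most one for each entity, you might set:

```ini
[Eduction]
// Return at most one match per entity starting at any given offset
AllowMultipleResults=OnePerEntity
```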

AllowOverlaps=[False|True]

Whether to allow Eduction to return more than one match when multiple matches involve overlapping text.

CantHaveFieldCSVs

Names of fields that Eduction ignores when reading an XML file. Allows you to specify the fields in documents that are discarded before the documents are stored.

CaseNormalization=[Upper|Lower]

Converts the case of incoming text to all uppercase or all lowercase. By default, Eduction does not perform any case conversions.

CaseSensitiveFieldName

Whether to preserve the case sensitivity of configured field names. By default, the Eduction module converts all field names to uppercase when it produces matches. Set this parameter to True to preserve the case of the field names. This option makes field names case sensitive.

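The following sketch lowercases incoming text before matching and keeps configured field names in their original case. The combination is illustrative only; whether lowercasing helps depends on how your grammars are written.

```ini
[Eduction]
// Convert incoming text to lowercase before extraction
CaseNormalization=Lower
// Keep configured field names in their original case instead of uppercasing them
CaseSensitiveFieldName=True
```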
CJKNormalization

Whether to normalize Chinese, Japanese, and Korean data before extraction. Set this parameter to one of the following:

  • Kana
  • OldNew
  • Number
  • HWNum
  • HWAlpha
  • SimpChi
  • FWJamo

See CJKNormalization for more information.

Databases

The names of the databases to which a document belongs. Eduction runs only on documents that belong to the comma-separated list of databases. If you do not list databases, Eduction is run on documents from all databases.
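For example, the following sketch restricts Eduction to documents from two databases. The database names News and Archive are hypothetical.

```ini
[Eduction]
// Hypothetical database names; documents from any other database are not processed
Databases=News,Archive
```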

DocumentDelimiterCSVs

File fields (tags) marking the start and end of a document.

EnableComponents=[False|True]

Whether to output component details in the match results, when grammars define components in an entity.

This parameter is used by edktool only, and is ignored by Eduction.

EnableUniqueMatches

Whether to return only unique matches in each document. If you set this parameter to True, Eduction returns a single occurrence of a particular value. If the same value occurs more than once, only the first instance is returned, even if the matches occur for different entities.

EntityAdvancedFieldN

A comma-separated list of advanced fields to return.

To use this option you must:

  • set OutputSimpleMatchInfo to False for edktool.
  • set EnableComponents to True for edktool.
  • define components in the entity definition.

EntityComponentFieldN

A comma-separated list of entity components that you want to return as fields.

To use this option you must:

  • set OutputSimpleMatchInfo to False for edktool.
  • set EnableComponents to True for edktool.
  • define components in the entity definition.

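A sketch of the settings that edktool needs before it can return component fields. The entity name person/fullname/engus and the component names firstname and lastname are hypothetical; they must match components that your own grammar defines for the entity. The sketch also assumes that EntityComponentFieldN pairs with the EntityN of the same number.

```ini
[Eduction]
// Both settings are required by edktool to output components
OutputSimpleMatchInfo=False
EnableComponents=True
// Hypothetical entity whose definition declares components
Entity0=person/fullname/engus
// Hypothetical component names, assumed to apply to Entity0 through the shared number 0
EntityComponentField0=firstname,lastname
```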
EntityMatchRangeN

A range of matching instances of the entity that are returned. The entity match range number N must match the corresponding EntityN number. The format of the range is as follows:

<match>[{-|,}<match>][,...]*

EntityMinScoreN

Matches only items with scores equal to or exceeding the threshold. The entity minimum score number N must match the corresponding EntityN number.

EntitySearchFieldsN

Specifies the IDX fields to use for an entity. Matches for an entity are returned only if they occur in one of the specified fields.
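The per-entity parameters share the number N of the EntityN they apply to. In the following sketch, the range, the score threshold, and the field list are illustrative values, not recommendations.

```ini
[Eduction]
Entity0=person/lastname/engus
// Return only the listed matching instances of Entity0 (format: <match>[{-|,}<match>][,...]*)
EntityMatchRange0=1-3,5
// Keep only Entity0 matches with a score of at least 0.5
EntityMinScore0=0.5
// Return Entity0 matches only when they occur in these IDX fields
EntitySearchFields0=DRETITLE,DRECONTENT
```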

MatchCase=[False|True]

Whether matching is case-sensitive. The default value is True.

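For example, to make matching case-insensitive (MatchCase is one of the settings that the note at the start of this section lists as affecting extraction speed):

```ini
[Eduction]
// Match regardless of case; the default value, True, is case-sensitive
MatchCase=False
```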
LanguageDirectory

Enables tokenization of some languages using sentence breaking libraries. Set LanguageDirectory to the path of an IDOL Server language directory that contains the relevant sentence breaking libraries and associated data files.

Locale

Enables tokenization of some languages using sentence breaking libraries. Set Locale to one of CHI, JPN, KOR, or THA.

NOTE:

The standard grammar files are developed without this setting; HPE recommends that you use this parameter only when you are using custom grammar files that have been developed with the specific tokenization.
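When you do use custom grammar files developed with sentence-breaking tokenization, the two parameters work together as in the following sketch. The directory path is hypothetical; set it to your own IDOL Server language directory.

```ini
[Eduction]
// Hypothetical path to an IDOL Server language directory with sentence breaking libraries
LanguageDirectory=langfiles
// Tokenize Japanese text by using the sentence breaking libraries
Locale=JPN
```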

MatchWholeWord=[False|True]

Whether to match only whole words. The default value is True, which means that matching is performed only on whole words; for example, part does not match within partake. Set this parameter to False to allow matching on parts of a word.

MaxEntityLength=Length

The maximum number of bytes that a match can have. The default value is 256.

MaxMatchesPerDoc

The maximum number of matches to allow in each document.
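For example, the following sketch raises the maximum match length and caps the number of matches in each document. Both values are arbitrary illustrations.

```ini
[Eduction]
// Allow matches of up to 512 bytes (the default value is 256)
MaxEntityLength=512
// Allow at most 1000 matches in each document
MaxMatchesPerDoc=1000
```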

NonGreedyMatch

Whether to return the shortest match. Set NonGreedyMatch to True to configure the Eduction module to return the shortest match.

Setting this parameter to True implicitly disables the AllowOverlaps and AllowMultipleResults parameters. If you have set these parameters, NonGreedyMatch takes precedence.
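In the following sketch, AllowOverlaps is set but has no effect, because NonGreedyMatch takes precedence:

```ini
[Eduction]
// No effect: NonGreedyMatch=True implicitly disables AllowOverlaps and AllowMultipleResults
AllowOverlaps=True
// Return the shortest match
NonGreedyMatch=True
```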

OutputSimpleMatchInfo

Setting OutputSimpleMatchInfo to True generates basic match information only, such as document, entity, position, and original text.

If OutputSimpleMatchInfo=True, the EnableComponents setting has no effect and reverts to False.

This parameter is used by edktool only, and is ignored by Eduction.

RedactionOutputString

A string to include in the output in place of redacted text. If neither RedactionOutputString nor RedactionReplacementCharacter is set, Eduction uses the default value of [redacted].

RedactionReplacementCharacter

A character to include in the output in place of each character in a passage of redacted text.

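For example, the following sketch replaces each redacted passage with a fixed string, as in the commented-out line of the example configuration file. The commented alternative masks each character instead; this section does not describe what happens when both parameters are set, so the sketch uses only one.

```ini
[Eduction]
// Replace each redacted passage with the string [censored]
RedactionOutputString=[censored]
// Alternative: replace each character of redacted text with *
//RedactionReplacementCharacter=*
```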
SearchFields

A comma-separated list of IDOL fields to search for entities. You can search the following IDOL fields:

  • DREREFERENCE
  • DRETITLE
  • SUMMARY
  • DRECONTENT

You can also add any custom fields that exist in the IDOL database to this list. You must specify at least one field to search; otherwise, Eduction returns no results.
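For example, to search the title, the content, and a custom field (MYFIELD is a hypothetical field name):

```ini
[Eduction]
// At least one field is required; MYFIELD is a hypothetical custom field in the IDOL database
SearchFields=DRETITLE,DRECONTENT,MYFIELD
```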

SuppressMatchLogging

Set this parameter to True to suppress log entries for every entity and zone pattern found in a document.

When logging is set to Full in the IDOL configuration file, Eduction makes a log entry for every entity and zone pattern found in a document. If you set this parameter to True, these log entries are suppressed. This option is useful when you want to log the performance timing information, but do not want the verbose match entries.

You can also set this parameter in Eduction Server. If you set logging to Full in the Eduction Server configuration file, the server records a log entry for every entity match found. You can set SuppressMatchLogging to True to suppress these log entries.
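The following sketch combines the full logging setup from the example configuration file with suppressed match entries, so that the log keeps performance timing information without an entry for every match.

```ini
[Eduction]
// Do not write a log entry for every entity match
SuppressMatchLogging=True

[Logging]
LogLevel=full
0=ApplicationLogStream

[ApplicationLogStream]
LogFile=application.log
```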

TangibleCharacters

A list of non-alphanumeric characters to make searchable.

TokenWithPunctuation

Whether to treat all punctuation characters as part of a word token, rather than treating them as word boundaries. Setting this parameter to True is equivalent to setting the TangibleCharacters parameter to all punctuation characters.

[PostProcessingTasks]
NumTasks

The number of post-processing tasks that you want to configure.

TaskN

The name of the individual post-processing task that you want to configure.

[MyPostProcessingTask]
Entities

A list of the extracted entities that you want to use your post-processing script to modify.

This parameter supports wildcard expressions.

ProcessEnMasse

Set ProcessEnMasse to True to set up an en-masse post-processing task.

Script

The path to the script to use for your post-processing task.
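A minimal sketch of a single post-processing task, assuming that TaskN is numbered from zero like EntityN. The task name matches the [MyPostProcessingTask] section name used above; the script path is hypothetical.

```ini
[PostProcessingTasks]
NumTasks=1
// Assumed zero-based numbering, as for EntityN
Task0=MyPostProcessingTask

[MyPostProcessingTask]
// Apply the script to all entities whose names start with person/
Entities=person/*
// Hypothetical path to the post-processing script
Script=scripts/my_postprocessing.lua
// Leave en-masse processing turned off
ProcessEnMasse=False
```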

Example Configuration File

[License]
LicenseServerHost=127.0.0.1
LicenseServerACIPort=20000

[Server]
Port=7075
Threads=1

[Eduction]
ResourceFiles=person_name_engus.ecr
Entity0=person/femalefirstname/engus
Entity1=person/malefirstname/engus
Entity2=person/lastname/engus
//MinScore=0.5
//MaxEntityLength=12
//MatchCase=0
//CaseNormalization=Lower
//AllowOverlaps=1
//EnableComponents=1
//MatchWholeWord=0
//RedactionOutputString=[censored]

[Logging]
LogLevel=full
0=ApplicationLogStream

[ApplicationLogStream]
LogFile=application.log
