The Eduction ACI Server requires a configuration file, which identifies the settings that the server needs to run. The default configuration file is eductionserver.cfg. You can specify a different configuration file by using command-line options. See Command-Line Options.
See Eduction Parameters for more information.
Some configuration settings affect extraction speed, for example MatchCase, MatchWholeWord, AllowOverlaps, and NonGreedyMatch. For more information on factors that can affect Eduction performance, refer to IDOL Expert.
Eduction ACI Server Configuration Parameters | |
---|---|
Required Parameters | |
[License] | |
LicenseServerHost=IP address
|
The IP address where the License server is running. If you use LicenseServerHost, you must also specify LicenseServerACIPort. If you do not specify these parameters, you must use Key and Holder. |
LicenseServerACIPort=port
|
The port on the machine where the IDOL License server is running. If you use |
Key=LicenseKey
|
The license key string. If you use Key, you must also set Holder. If you do not specify these parameters, you must use LicenseServerHost and LicenseServerACIPort. |
Holder=LicenseKeyHolder
|
The holder of the license key string. This value is a multipart string, which describes a group, machine, and MAC address. If you use |
[Server] | |
Port=Port
|
The port on which the Eduction ACI Server listens for requests. |
[Eduction] | |
ResourceFiles=GrammarFile[,GrammarFile2]...
|
The name of one or more Eduction grammar files to load. The grammar files contain the entities to use to match text. You can specify multiple grammar files as a comma-separated list, by using wildcard expressions, or as a comma-separated list of wildcard expressions. There must be no space before or after a comma.
NOTE: Because all entities loaded to an engine must have unique names, you must not load the same grammar file more than once. |
EntityN=entity
|
The name of an Eduction entity, contained in the loaded grammar files. When you pass text to Eduction, it matches on the entity.
This parameter supports wildcard expressions.
NOTE: All entities must have unique names. |
Recommended Parameters | |
[Logging] | |
LogLevel=LogLevel
|
The level of details for logging. The options are |
[Server] | |
Threads=ACIThreads
|
The number of threads that can accept concurrent Eduction ACI server requests. The default value is |
[Eduction] | |
AllowDuplicates
|
A list of IDOL fields in which to allow duplicate entities. You can enter multiple fields separated by a comma. |
AllowMultipleResults
|
Set this parameter to Set this parameter to Set this parameter to |
AllowOverlaps=[False|True]
|
Whether to allow Eduction to return more than one match when multiple matches involve overlapping text. |
CaseNormalization=[Upper|Lower]
|
Converts the case of incoming text to all uppercase or all lowercase. By default, Eduction does not perform any case conversions. |
CaseSensitiveFieldName
|
Whether to preserve the case sensitivity of configured field names. By default, the Eduction module converts all field names to uppercase when it produces matches. Set this parameter to True to preserve the case of the field names. This option makes field names case sensitive. |
CJKNormalization
|
Whether to normalize Chinese, Japanese, and Korean data before extraction. Set this parameter to one of the following:
See CJKNormalization for more information. |
Databases
|
The names of the databases to which a document belongs. Eduction runs only on documents that belong to the comma-separated list of databases. If you do not list databases, Eduction is run on documents from all databases. |
EnableComponents=[False|True]
|
Whether to output component details in the match results, when grammars define components in an entity. This parameter is used by |
EnableUniqueMatches
|
Whether to return only unique matches in each document. If you set this parameter to |
EntityAdvancedFieldN
|
A comma-separated list of advanced fields to return. To use this option you must:
|
EntityComponentFieldN
|
A comma-separated list of entity components that you want to return as fields. To use this option you must:
|
EntityMatchRangeN
|
A range of matching instances of the entity that are returned. The entity match range number
|
EntityMinScoreN
|
Matches only items with scores equal to or exceeding the threshold. The entity minimum score number N must match the corresponding EntityN number. |
EntitySearchFieldsN
|
Specifies the IDX fields to use for an entity. Matches for an entity are returned only if they occur in one of the specified fields. |
MatchCase=[False|True]
|
Whether matching is case-sensitive. The default value is True. |
LanguageDirectory
|
Enables tokenization of some languages using sentence breaking libraries. Set |
Locale
|
Enables tokenization of some languages using sentence breaking libraries. Set NOTE:
The standard grammar files are developed without this setting; HPE recommends that you use this parameter only when you are using custom grammar files that have been developed with the specific tokenization. |
MatchWholeWord=[False|True]
|
Whether to allow matching on parts of a word. The default value is False, which means that matching is performed only on whole words. For example, part does not match within partake. |
MaxEntityLength=Length
|
The maximum number of bytes that a match can have. The default value is |
MaxMatchesPerDoc
|
The maximum number of matches to allow in each document. |
NonGreedyMatch
|
Whether to return the shortest match. Set Setting this parameter to |
OutputSimpleMatchInfo
|
Setting If This parameter is used by |
RedactionOutputString
|
A string to include in the output in place of redacted text.
If neither RedactionOutputString nor RedactionReplacementCharacter is set, Eduction uses the default value of [redacted]. |
RedactionReplacementCharacter
|
A character to include in the output in place of each character in a passage of redacted text. |
SearchFields
|
A comma-separated list of IDOL fields to search for entities. You can search the following IDOL fields:
You can also add any customized fields present in the IDOL database to this list. You must specify at least one field to search, or no results return. |
SuppressMatchLogging
|
Set this parameter to When logging is set to You can also set this parameter in Eduction Server. If you set logging to |
TangibleCharacters
|
A list of non-alphanumeric characters to make searchable. |
TokenWithPunctuation
|
Whether to treat all punctuation characters as part of a word token, rather than treating them as word boundaries. Setting this parameter to True is equivalent to setting the TangibleCharacters parameter to all punctuation characters. |
[PostProcessingTasks] | |
NumTasks
|
The number of post-processing tasks that you want to configure. |
|
The name of the individual post-processing task that you want to configure. |
[MyPostProcessingTask] | |
Entities
|
A list of the extracted entities that you want to use your post-processing script to modify. This parameter supports wildcard expressions. |
ProcessEnMasse
|
Set ProcessEnMasse to True to set up an en-masse post-processing task. |
Script
|
The path to the script to use for your post-processing task. |
[License]
LicenseServerHost=127.0.0.1
LicenseServerACIPort=20000

[Server]
Port=7075
Threads=1

[Eduction]
ResourceFiles=person_name_engus.ecr
Entity0=person/femalefirstname/engus
Entity1=person/malefirstname/engus
Entity2=person/lastname/engus
//MinScore=0.5
//MaxEntityLength=12
//MatchCase=0
//CaseNormalization=Lower
//AllowOverlaps=1
//EnableComponents=1
//MatchWholeWord=0
//RedactionOutputString=[censored]

[Logging]
LogLevel=full
0=ApplicationLogStream

[ApplicationLogStream]
LogFile=application.log
|
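
The required parameters listed in the table can be checked before you start the server. The following Python sketch is not part of the Eduction ACI Server. It is an illustrative helper, and the function name check_required is hypothetical. It assumes that a line beginning with // is a comment (as in the sample configuration) and that parameter names can be compared without regard to case. It checks only the rules stated above: either LicenseServerHost with LicenseServerACIPort or Key with Holder, a Port in [Server], and in [Eduction] a ResourceFiles value with no spaces around commas plus at least one EntityN.

```python
import configparser
import re

# A trimmed copy of the sample configuration, used for a quick check.
SAMPLE = """\
[License]
LicenseServerHost=127.0.0.1
LicenseServerACIPort=20000

[Server]
Port=7075

[Eduction]
ResourceFiles=person_name_engus.ecr
Entity0=person/lastname/engus
//MatchCase=0
"""


def check_required(cfg_text):
    """Return a list of problems with the required parameters (empty if none)."""
    parser = configparser.ConfigParser(
        comment_prefixes=("//", "#", ";"),  # assumption: '//' starts a comment line
        interpolation=None,                 # treat '%' in values literally
    )
    parser.read_string(cfg_text)
    problems = []

    # [License]: one complete pair is required.
    lic = parser["License"] if parser.has_section("License") else {}
    server_pair = "LicenseServerHost" in lic and "LicenseServerACIPort" in lic
    key_pair = "Key" in lic and "Holder" in lic
    if not (server_pair or key_pair):
        problems.append(
            "[License] needs LicenseServerHost and LicenseServerACIPort, "
            "or Key and Holder"
        )

    # [Server]: Port must be present and numeric.
    server = parser["Server"] if parser.has_section("Server") else {}
    if not server.get("Port", "").strip().isdigit():
        problems.append("[Server] needs a numeric Port")

    # [Eduction]: ResourceFiles and at least one EntityN.
    edu = parser["Eduction"] if parser.has_section("Eduction") else {}
    resources = edu.get("ResourceFiles", "")
    if not resources:
        problems.append("[Eduction] needs ResourceFiles")
    elif re.search(r"\s,|,\s", resources):
        problems.append("ResourceFiles must not have spaces around commas")
    # configparser lowercases option names by default, so match 'entityN'.
    if not any(re.fullmatch(r"entity\d+", name) for name in edu):
        problems.append("[Eduction] needs at least one EntityN")

    return problems


if __name__ == "__main__":
    print(check_required(SAMPLE))
```

The helper returns a list of messages rather than raising an exception, so that all problems in a file can be reported in one pass.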