Use a Compilation Configuration File
When you compile a grammar by using edktool, you can add an optional JSON configuration file to specify additional options for compilation.
Configure Character Expansions
You can configure character expansions, which make Eduction treat certain characters as if they were a different character. For example, you can map different varieties of a punctuation character to the standard form that you use in your grammar files.
To use character expansions, specify an expansions array that contains a list of your expansions. Each array item has a src and a dest element:
- src. The source character. Eduction detects the destination characters as if they are this source character. In the output text, Eduction normalizes all the destination characters to the source character.
- dest. An array of destination characters that you want to detect as the source character.
The following example configuration matches the letters b and c as if they are the letter a:
{
  "expansions": [
    { "src": "a", "dest": ["b", "c"] }
  ]
}
When your grammar includes the following pattern:
<pattern>ade</pattern>
And your input contains the following text:
ade bde cde dde
Eduction matches ade, bde, and cde as if they are ade, and produces the following normalized matches:
ade ade ade
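The behavior in this example can be approximated with a short Python sketch. This is a simplified model for illustration only, not how Eduction is implemented: it normalizes the whole input before searching, and the helper names build_expansion_table and find_normalized_matches are hypothetical.

```python
import re


def build_expansion_table(expansions):
    """Map every destination character to its source character."""
    table = {}
    for item in expansions:
        for ch in item["dest"]:
            table[ord(ch)] = item["src"]
    return table


def find_normalized_matches(pattern, text, expansions):
    """Replace destination characters with their source, then find the literal pattern."""
    normalized = text.translate(build_expansion_table(expansions))
    return re.findall(re.escape(pattern), normalized)


# Same configuration and input as the example above.
config = {"expansions": [{"src": "a", "dest": ["b", "c"]}]}
matches = find_normalized_matches("ade", "ade bde cde dde", config["expansions"])
print(matches)  # ['ade', 'ade', 'ade']
```

The input dde produces no match, because d is not listed as a destination character for a.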
Optimize Case-Insensitive Matching
When you have a grammar file that contains case-sensitive entities, but you want to find matches regardless of case, you can run Eduction with the parameter MatchCase=False. When Eduction loads a grammar file and MatchCase=False, it optimizes the entities for case-insensitive matching to improve run-time performance. However, this can increase the time required to initialize Eduction, so if you regularly use a grammar file with MatchCase=False, you can optimize the entities for case-insensitive matching at compile time instead.
To optimize entities for case-insensitive matching, set the option alternativeCaseArcs to true in your compilation configuration file:
{ "alternativeCaseArcs": true }
After compiling a grammar file with this option, you can still use the MatchCase parameter to choose whether matching is case-sensitive.
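If you generate configuration files from a script, a small helper can keep the JSON well-formed. The following Python sketch uses only the option names described on this page (expansions, src, dest, and alternativeCaseArcs). The function name build_compile_config is hypothetical, and the single-character check is an assumption based on the descriptions of src and dest above.

```python
import json


def build_compile_config(expansions=None, alternative_case_arcs=None):
    """Build a compilation configuration as a dict, including only the options given.

    expansions maps a source character to an iterable of destination characters.
    """
    config = {}
    if expansions is not None:
        items = []
        for src, dest in expansions.items():
            chars = list(dest)
            # Assumption: src and every dest entry is exactly one character.
            if len(src) != 1 or any(len(ch) != 1 for ch in chars):
                raise ValueError("src and each dest entry must be a single character")
            items.append({"src": src, "dest": chars})
        config["expansions"] = items
    if alternative_case_arcs is not None:
        config["alternativeCaseArcs"] = bool(alternative_case_arcs)
    return config


config = build_compile_config(alternative_case_arcs=True)
print(json.dumps(config))  # {"alternativeCaseArcs": true}
```

You can write the result to a file with json.dump and then supply that file when you compile.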
To obtain the best case-insensitive performance, write your grammar file using only upper case or only lower case, and then normalize the input by setting the CaseNormalization parameter. For more information, see Case Normalization. Compiling a grammar file with alternativeCaseArcs set to true is useful if you cannot easily modify your grammar files, but it only reduces the time required to initialize Eduction. It does not reduce the time required for matching.
To recompile an existing grammar file with alternativeCaseArcs set to true, you can include the existing grammar in a new grammar file, as shown in the following example, and then compile the new grammar by using edktool.
<?xml version="1.0" encoding="UTF-8"?>
<grammars version="4.0">
  <include path="published/grammar.ecr" type="public"/>
</grammars>
Use the Configuration File
You add a configuration file to your compilation by setting the -c command-line option in the compile command. For more information, see Compile.
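As a rough sketch, a build script might assemble the compile command as follows. Only the -c option is taken from this page. The spelling of the compile subcommand, the -i and -o flags for the input grammar and output file, and all file names are assumptions for illustration, so check them against the edktool reference before use. The function name build_compile_command is hypothetical.

```python
import shlex


def build_compile_command(grammar_path, output_path, config_path=None, tool="edktool"):
    """Return the argument list for a compile run, adding -c only when a config is given."""
    # Assumption: "compile", -i, and -o are the subcommand and flags for input and output.
    command = [tool, "compile", "-i", grammar_path, "-o", output_path]
    if config_path is not None:
        # -c is the configuration file option described above.
        command += ["-c", config_path]
    return command


command = build_compile_command("grammar.xml", "grammar.ecr", "config.json")
print(shlex.join(command))  # edktool compile -i grammar.xml -o grammar.ecr -c config.json
```

Passing the returned list to subprocess.run(command, check=True) would run the compilation and raise an error if the tool reports a failure.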
When you compile a grammar by using the Eduction SDK, you can specify the path to a compilation configuration file by using one of the following options:
- C API: the EdkLoadResourceFileWithCompileConfig and EdkLoadResourceBufferWithCompileConfig functions.
- Java API: the loadResourceFile, loadResourceFiles, and loadResourceBuffer methods in the TextExtractionEngine interface.
- .NET API: the GetCompiler method on the EDKFactory class.
For more information, refer to the API documentation.