Use a Compilation Configuration File

When you compile a grammar by using edktool, you can add an optional JSON configuration file to specify additional options for compilation.

Configure Character Expansions

You can configure character expansions, which make Eduction treat certain characters as if they were a different character. For example, you can make different varieties of a punctuation character match the standard form that you use in your grammar files.

To use character expansions, specify an expansions array that contains a list of your expansions. Each array item has a src element and a dest element. Treat the source and destination characters together as a single list, where any character in the list is expanded to any other. The choice of "src" character matters only for normalization: it is used in normalized matches in place of any "dest" character.

Consider the following example configuration:

{
   "expansions": [
      { "src": "a", "dest": ["b", "c"] }
   ]
}

If your grammar contains only the following pattern:

<pattern>ade</pattern>

Eduction expands the pattern to:

<pattern>[abc]de</pattern>

So if your input contains the following text:

ade bde cde dde

Eduction matches ade, bde, and cde, and produces the normalized matches ade, ade, ade. It does not match dde, because d is not part of the expansion.

If your grammar contains only the following pattern:

<pattern>bde</pattern>

Eduction expands the pattern to:

<pattern>[abc]de</pattern>

This produces the same matches (ade, bde, and cde) and the same normalized matches (ade, ade, ade) as before. The normalized matches use a rather than b because a is the "src" character.

For example, character expansions are useful if you have written a grammar file where the patterns contain space characters, but you also want to match non-breaking spaces or other Unicode space characters.
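
As an illustration of the space-character case, the following configuration is a minimal sketch rather than a tested example. It assumes that the configuration file accepts standard JSON \u escapes. The characters chosen here (U+00A0 no-break space, U+2009 thin space, U+3000 ideographic space, and the curly apostrophes U+2018 and U+2019) are examples only.

```json
{
   "expansions": [
      { "src": " ", "dest": ["\u00a0", "\u2009", "\u3000"] },
      { "src": "'", "dest": ["\u2018", "\u2019"] }
   ]
}
```

Under these assumptions, a pattern written with an ordinary space or a straight apostrophe would also match the listed variants, and the normalized matches would contain the "src" character in each case.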

Optimize Case-Insensitive Matching

When you have a grammar file that contains case-sensitive entities, but you want to find matches regardless of case, you can run Eduction with the parameter MatchCase=False. When Eduction loads a grammar file with MatchCase=False, it optimizes the entities for case-insensitive matching to improve run-time performance. However, this optimization can increase the time required to initialize Eduction. If you regularly use a grammar file with MatchCase=False, you can perform the optimization at compile time instead.

To optimize entities for case-insensitive matching, set the option alternativeCaseArcs to true in your compilation configuration file:

{ "alternativeCaseArcs": true }

After compiling a grammar file with this option, you can still use the MatchCase parameter to choose whether matching is case-sensitive.
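
If you also use character expansions, a single configuration file might carry both settings. The following sketch assumes that the two options can be combined as top-level keys of the same JSON object, which the examples in this section do not show explicitly.

```json
{
   "alternativeCaseArcs": true,
   "expansions": [
      { "src": "a", "dest": ["b", "c"] }
   ]
}
```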

To obtain the best case-insensitive performance, write your grammar file using only upper case or only lower case, and then normalize the input by setting the CaseNormalization parameter. For more information, see Case Normalization. Compiling a grammar file with alternativeCaseArcs set to true is useful if you cannot easily modify your grammar files, but it reduces only the time required to initialize Eduction. It does not reduce the time required for matching.

To recompile an existing grammar file with alternativeCaseArcs set to true, you can include the existing grammar in a new grammar file, as shown in the following example, and then compile the new grammar by using edktool.

<?xml version="1.0" encoding="UTF-8"?>
<grammars version="4.0">
  <include path="published/grammar.ecr" type="public"/>
</grammars>

Use the Configuration File

To add a configuration file when you compile a grammar by using edktool, set the -c command-line option in the compile command. For more information, see Compile.

When you compile a grammar by using the Eduction SDK, you can specify the path to a compilation configuration file by using one of the following options:

  • C API: the EdkLoadResourceFileWithCompileConfig and EdkLoadResourceBufferWithCompileConfig functions.

  • Java API: the loadResourceFile, loadResourceFiles, and loadResourceBuffer methods in the TextExtractionEngine interface.

  • .NET API: the GetCompiler method on the EDKFactory class.

For more information, refer to the API documentation.