Extend Grammars

The standard grammars that ship with Eduction provide good coverage of the common types of information that you typically want to extract from your data. They are designed so that you can easily reference them in any custom grammars that you create.

For some data, this coverage might not be sufficient. In this case, you can extend the provided entities with new entries to improve the recall of the extraction (the percentage of the matches that should be returned in theory that Eduction actually returns).
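Written as a formula, recall compares the correct matches that Eduction returns with all the matches it should return. The worked numbers below are hypothetical and only illustrate the arithmetic.

```latex
\text{recall} = \frac{\text{correct matches returned}}{\text{matches that should be returned}} \times 100\%
\qquad\text{e.g.}\qquad \frac{150}{200} \times 100\% = 75\%
```

Adding the missing items as new entries raises the numerator while the denominator stays fixed, which is why extending an entity improves recall.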

You cannot edit the standard grammars in place, because they are provided in compiled .ECR format. You can, however, add more entries to an existing entity in an .ECR grammar file by extending that entity in a custom grammar file in XML format.

For more general information about how to extend grammars, refer to the Eduction User and Programming Guide.

For a detailed tutorial that describes how to create and extend a grammar, see the Eduction Grammar Tutorial.

When to Extend a Grammar

Consider extending a grammar if the recall of the existing grammar is low. Identify which items the existing grammar does not match, and add them as new entries in the appropriate entities in your custom grammar.
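One way to find the items that are not being matched is to compare the matches that Eduction returns against a hand-checked list of the matches it should return. The following Python sketch assumes that you have already collected both lists as plain strings (it does not call Eduction itself); the function names and sample data are hypothetical, for illustration only.

```python
def recall(expected: set[str], returned: set[str]) -> float:
    """Fraction of the expected matches that were actually returned.

    Returned items that are not expected (false positives) do not count.
    If nothing is expected, recall is defined as 1.0 to avoid dividing by zero.
    """
    if not expected:
        return 1.0
    return len(expected & returned) / len(expected)


def missed_items(expected: set[str], returned: set[str]) -> list[str]:
    """Expected matches that the grammar did not return: candidate new entries."""
    return sorted(expected - returned)


if __name__ == "__main__":
    # Hypothetical hand-checked matches that a test document set should produce.
    expected = {"Cambridge", "Sheffield", "Milton Keynes", "Stoke-on-Trent"}
    # Hypothetical matches returned by the existing grammar for the same documents.
    returned = {"Cambridge", "Sheffield"}

    print(f"recall: {recall(expected, returned):.0%}")
    for item in missed_items(expected, returned):
        print("candidate entry:", item)
```

With the sample data, the sketch reports a recall of 50% and lists "Milton Keynes" and "Stoke-on-Trent" as candidates to add to the appropriate entity in your custom grammar.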

You can compile the custom grammar with edktool before you use it, so that Eduction loads it more quickly. You can then replace the original grammar file with the new grammar file.

Extend the Sentiment Grammars

Grammar extension is particularly useful when you use Eduction for sentiment analysis.

There are two main reasons to extend the sentiment grammar file:

  • You want Eduction to find matches that it currently misses because some of the positive or negative adjectives and adverbs in your data are not included in the compiled grammar. In this case, extend the appropriate entities with the new entries.

  • You want to change the sentiment for some objects. This option is currently available only for the English sentiment grammar.

    For example, the phrase "Company A is much better than Company B" might be positive or negative, depending on whether you are with Company A or Company B. If you are with Company A, you can make Eduction return a match with a positive sentiment from this sentence by adding Company A to an entity that lists the entries that you consider good.