Extend Grammars
The Eduction standard grammars provide good coverage for common pieces of information that you would normally want to extract from your data. They are designed so that you can easily reference them in any custom grammars that you create.
For some data, the coverage provided might not be sufficient. In this case, you can extend the provided entities with new entries to improve the recall of the extraction (the proportion of the true matches in your data that Eduction actually returns). For more information about recall, see Results Relevance.
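As a short worked illustration of recall, write TP for the true matches that are returned and FN for the true matches that are missed. These symbols and the numbers that follow are introduced here only for illustration.

```latex
\[
\text{recall} = \frac{TP}{TP + FN}
\]
```

For example, if your data contains 200 true matches and the grammar returns 150 of them, recall is 150 / 200 = 75%. If you extend an entity so that 30 of the 50 missed items are also matched, recall rises to 180 / 200 = 90%.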
You cannot edit the standard grammars in place because they are provided in .ECR format. You can, however, add more entries to an existing entity in an .ECR grammar file by extending it in a custom grammar file in XML format.
When to Extend a Grammar
You might consider extending a grammar if the recall of the existing grammar is low. In this case, you can work out what items the existing grammar does not match, and add these as new entries in the appropriate entities in your custom grammar.
You then compile the custom grammar (using edktool) before you use it, so that Eduction can load it more quickly. You can then use the compiled grammar file in place of the original grammar file.
TIP: If you need to detect entities that are not supported by the Eduction grammars available from Micro Focus, raise the issue with your support contact. The entity you want to detect might be supported in an upcoming release. Alternatively, Micro Focus might be able to add support in future, if other customers want it too. Using an official grammar means that you do not have to maintain it.
Create a Reference to an Existing Entity
You can build custom grammars from scratch. However, the standard grammars provide many basic entities that you can reference in your grammars, which allows you to create new custom grammars quickly.
If you reference other entities in an entity that you create, you can use one of the following reference extensions:
- (?A^Entity) During compilation, create a link to the referenced entity from your entity.
- (?A:>Entity) During compilation, copy the compiled version of the referenced entity to your entity.
With the first option, compilation is quicker and the resulting grammar file is much smaller. The second option can provide a small performance gain during extraction. Micro Focus recommends that you use the first option in most cases, unless extraction performance is critical (see Reference or Copy Entities).
For more information about these options, see Regular Expressions. For a tutorial that describes in more detail how to create a new grammar to extend existing entities, refer to IDOL Expert.
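The following is a minimal sketch of a custom grammar that links to an existing entity by using the (?A^Entity) form. The file name standard_grammar.ecr, the grammar names, and the entity names are placeholders, not names from a shipped grammar. The element and attribute names (grammars, include, grammar, entity, pattern) are assumed from the usual Eduction XML grammar layout and should be checked against the grammar format reference for your version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammars version="4.0">
   <!-- Make the entities in the compiled standard grammar available.
        "standard_grammar.ecr" is a placeholder file name. -->
   <include path="standard_grammar.ecr"/>
   <grammar name="custom">
      <entity name="labelled_item" type="public">
         <!-- (?A^...) links to the existing entity at compile time.
              "example/item" is a placeholder entity name. This pattern
              matches the literal text "Item: " followed by any match of
              the referenced entity. -->
         <pattern>Item: (?A^example/item)</pattern>
      </entity>
   </grammar>
</grammars>
```

To copy the compiled entity into your own entity instead, you would use the second reference form from the list above in place of (?A^...).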
Add More Entries to an Entity
To add more entries to an entity, create a new XML grammar file. In the new grammar file, include the .ECR file that contains the entity that you want to extend.
Ensure that your grammar file defines the same grammar and entity as the included grammar file. The full entity name, including the grammar prefix, must match for the grammar extension to work. Set the extend mode of the entity in your new grammar to Append, and add the extra entries in the entity.
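A grammar file that appends entries might look like the following sketch. All names are placeholders: standard_grammar.ecr stands for the .ECR file that you include, and example/item stands for the full name of the entity that you extend. The extend attribute and its lowercase value are assumptions about how the Append mode is written in XML, so confirm the exact spelling in the grammar format reference.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammars version="4.0">
   <!-- Include the compiled grammar that contains the entity to extend.
        "standard_grammar.ecr" is a placeholder file name. -->
   <include path="standard_grammar.ecr"/>
   <!-- The grammar name and the entity name together must equal the full
        name of the existing entity (here, the placeholder "example/item"). -->
   <grammar name="example">
      <entity name="item" type="public" extend="append">
         <!-- New entries, added to those already in the included entity. -->
         <entry headword="first new entry"/>
         <entry headword="second new entry"/>
      </entity>
   </grammar>
</grammars>
```

After you compile this file with edktool, the resulting grammar contains both the original entries and the two new ones under the same entity name.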
Replace the Current Entities
Although in most cases you add new entries when you extend a grammar, you can instead replace the existing entries of an entity entirely. To do this, set the extend mode of the entity in your new grammar file to Replace.
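As a sketch of the Replace mode, the following grammar file discards the entries of an included entity and defines its own. The file name, grammar name, and entity name are placeholders, and the extend attribute with a lowercase value is an assumption about the XML spelling of the Replace mode that you should verify against the grammar format reference.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammars version="4.0">
   <!-- "standard_grammar.ecr" is a placeholder for the included .ECR file. -->
   <include path="standard_grammar.ecr"/>
   <grammar name="example">
      <!-- The full name "example/item" (a placeholder) must match the
           existing entity. Its original entries are dropped, and only the
           entries listed here remain. -->
      <entity name="item" type="public" extend="replace">
         <entry headword="only entry one"/>
         <entry headword="only entry two"/>
      </entity>
   </grammar>
</grammars>
```

Because replacing removes every original entry, this mode suits cases where the standard entries produce unwanted matches, not cases where they are only incomplete.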
Extend the Sentiment Grammars
Grammar extension is particularly useful when you use Eduction for sentiment analysis.
There are two main reasons why you might extend the sentiment grammar file.
- You want Eduction to find some of the matches it misses because the compiled grammar does not include some of the positive or negative adjectives and adverbs in your data. To do this, you simply extend the appropriate entities with the new entries.
- You want to change the sentiment for some objects. This option is currently available only for the English sentiment grammar.
For example, the phrase "Company A is much better than Company B" might be positive or negative depending on whether you are with Company A or Company B. If you are with Company A, you can make Eduction return a match from the sentence with a positive sentiment by adding Company A to an entity that lists entries that you consider good.