PHI Grammar Customization

If the PHI grammars miss particular matches in your input, you can customize them. This section describes the possible customizations.

The following grammars support customization:

  • address.ecr

  • name.ecr

  • name_cjkvt.ecr

NOTE: It is technically possible to extend any public entity in a PHI grammar, but doing so can involve a lot of work. If you want to extend an entity that is not in the list below, see Modify Other Grammars and Entities.

For each grammar that supports customization, you can customize the following entities:

  • address

    • phi/address/knowncity_headwords/us

    • phi/address/knownstreet/us

  • name

    • phi/name/surname/nocontext/us

    • phi/name/given_name/nocontext/us

  • name_cjkvt

    • phi/name/surname/nocontext/latin/us

    • phi/name/surname/nocontext/cjkvt/us

    • phi/name/surname/nocontext/cjkvt_spaced/us

    • phi/name/given_name/nocontext/latin/us

    • phi/name/given_name/nocontext/cjkvt/us

    • phi/name/given_name/nocontext/cjkvt_spaced/us

You can use customizations to add entries that the existing entities do not match (such as unusual names). You might also use them if your data contains unusual separators or punctuation. The following sections provide examples of these changes.

TIP: When you customize an entity, you can either replace or extend the definition. For PHI grammars, Micro Focus recommends that you only extend the entity definitions.

If you replace an entity, you are likely to miss matches or reduce performance. In addition, existing definitions cover many match cases that you might not consider, so there is a lot of value in using these definitions as a base.

Example 1: New Street Address

The following grammar definition shows an example of extending address.ecr.

address_extended.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE grammars SYSTEM "../published/edk.dtd">
<grammars version="4.0">
   <include path="address.ecr"/>
   <grammar name="phi/address">

      <entity name="suffixes/us" type="private">
         <entry headword="Cury"/>
         <entry headword="CURY"/>
      </entity>
  
      <entity name="knownstreet/us" extend="append" type="private">
         <pattern>[A-Z][a-z]+ (?A:suffixes/us)</pattern>
      </entity>

      <entity name="streetlocation/nocontext/us" extend="append">
         <pattern score="0.75">(?A=STREET:(?A:knownstreet/us))</pattern>
      </entity>

   </grammar>
</grammars>

This definition extends the knownstreet/us and streetlocation/nocontext/us entities in the PHI address grammar:

  • It adds Cury as a known street suffix.
  • It extends the knownstreet/us entity to accept any two-word street name that ends with the new Cury suffix.
  • It extends the streetlocation/nocontext/us entity to use the extended knownstreet entity, so that these changes take effect.

The result of these changes is that Petty Cury matches as a street location with a score of 0.75. Previously, it would not have matched at all.

TIP: You do not need to redeclare the full address entity to use the extended knownstreet entity.

For example, with these changes 123 Petty Cury, Cambridge MA 02140 now matches phi/address/us with a score of 1. Previously, this address would have matched, but with a lower score.

Adding known street names or patterns for your country of interest improves the scores of matches that contain them.
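If you know the exact street name in advance, a literal entry might be simpler than a pattern. The following is a minimal sketch that follows the structure of Example 1. The street name Kings Parade is a hypothetical value chosen for illustration, and the score of 0.75 is copied from Example 1 rather than being a recommended value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE grammars SYSTEM "../published/edk.dtd">
<grammars version="4.0">
   <include path="address.ecr"/>
   <grammar name="phi/address">

      <!-- Hypothetical literal street name, added as a headword entry -->
      <entity name="knownstreet/us" extend="append" type="private">
         <entry headword="Kings Parade"/>
      </entity>

      <!-- Same wiring as Example 1; the score is an assumption taken from that example -->
      <entity name="streetlocation/nocontext/us" extend="append">
         <pattern score="0.75">(?A=STREET:(?A:knownstreet/us))</pattern>
      </entity>

   </grammar>
</grammars>
```

As in Example 1, the streetlocation/nocontext/us extension is included so that the new entry takes effect.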

Example 2: New Known City

The following grammar definition adds more known cities to address.ecr.

address_extended.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE grammars SYSTEM "../published/edk.dtd">
<grammars version="4.0">
   <include path="address.ecr"/>
   <grammar name="phi/address">
      <entity name="knowncity_headwords/us" extend="append" type="private">
         <entry headword="Chesterton"/>
      </entity>

      <entity name="city/nocontext/us" extend="append">
         <pattern>(?A=CITY:(?A^knowncity_headwords/us))</pattern>
      </entity>

   </grammar>
</grammars>

This example definition:

  • adds Chesterton to the knowncity_headwords/us entity.
  • extends the city/nocontext/us entity to use the extended knowncity_headwords/us entity, so that the change takes effect.

The result of these changes is that Chesterton matches as a city with a score of 1. Previously, it would have matched as a speculative city name, with a lower score.

Again, you do not need to change the full address entity to pick up this new declaration. For example, 123 Main Street, Chesterton MA 02140 now matches phi/address/us with a score of 1. Previously, it would have matched with a lower score, because the city was a speculative match.

TIP: The definition for city/nocontext/us refers to knowncity_headwords/us with the dynamic reference syntax, (?A^ rather than (?A:. Micro Focus recommends this syntax for performance reasons when you refer to that entity, because the version of this entity for each country often contains several thousand entries.

To make both sets of changes for known streets and cities, merge the declarations in examples 1 and 2 into a single XML file.
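As a sketch of what the merged file might look like, the following combines the declarations from Examples 1 and 2 under a single grammar element. The XML comments are added here for orientation only, and the file name address_extended.xml is the one that both examples already use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE grammars SYSTEM "../published/edk.dtd">
<grammars version="4.0">
   <include path="address.ecr"/>
   <grammar name="phi/address">

      <!-- From Example 1: new street suffix and known street pattern -->
      <entity name="suffixes/us" type="private">
         <entry headword="Cury"/>
         <entry headword="CURY"/>
      </entity>

      <entity name="knownstreet/us" extend="append" type="private">
         <pattern>[A-Z][a-z]+ (?A:suffixes/us)</pattern>
      </entity>

      <entity name="streetlocation/nocontext/us" extend="append">
         <pattern score="0.75">(?A=STREET:(?A:knownstreet/us))</pattern>
      </entity>

      <!-- From Example 2: new known city -->
      <entity name="knowncity_headwords/us" extend="append" type="private">
         <entry headword="Chesterton"/>
      </entity>

      <entity name="city/nocontext/us" extend="append">
         <pattern>(?A=CITY:(?A^knowncity_headwords/us))</pattern>
      </entity>

   </grammar>
</grammars>
```

Only one include element and one grammar element are needed, because both examples extend the same phi/address grammar.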

Example 3: New Name and Custom Separator

Another way to use entity customizations is to declare patterns with custom separators. For example, if your input data contains unusual spacing or characters between entities, you can declare these in your entity extensions.

The following grammar definition extends name.ecr.

name_extended.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE grammars SYSTEM "../published/edk.dtd">
<grammars version="4.0">
   <include path="name.ecr"/>
   <grammar name="phi/name">

      <entity name="given_name/nocontext/us" extend="append" case="insensitive">
         <entry headword="Fobo" score="2"/>
      </entity>

      <entity name="surname/nocontext/us" extend="append" case="insensitive">
         <entry headword="Jobo" score="2"/>
      </entity>

      <entity name="us" extend="append">
         <pattern>(?A=SURNAME:(?A:surname/nocontext/us))@@(?A=FORENAME:(?A:given_name/nocontext/us))</pattern>
      </entity>

   </grammar>
</grammars>

This declaration makes two changes:

  • It adds new entries for given_name and surname. This change allows Fobo Jobo to match as a name.

  • It declares a new pattern for the us entity, to match a name in reverse order, with the elements separated by a custom separator (two @ symbols). This change allows Jobo@@Fobo to match as a name.

TIP: The grammar already handles hyphenated known names. For example, after this definition change, Eduction matches Fobo-Fobo Jobo with a score of 1, with no further changes required. You do not need to add hyphenated entries to the given_name/nocontext or surname/nocontext entities.
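The same approach might be adapted to other separators or to optional spacing. The following fragment is a sketch only. It assumes hypothetical input in which the surname and given name are separated by a semicolon, with an optional space on either side (for example, Jobo; Fobo). The separator is an assumption chosen for illustration, and the fragment is intended to be placed inside the phi/name grammar element of name_extended.xml, alongside the entries shown above.

```xml
<!-- Hypothetical separator: a semicolon with an optional space on each side -->
<entity name="us" extend="append">
   <pattern>(?A=SURNAME:(?A:surname/nocontext/us)) ?; ?(?A=FORENAME:(?A:given_name/nocontext/us))</pattern>
</entity>
```

Keeping the separator handling in its own pattern leaves the existing name patterns untouched, which is consistent with the recommendation to extend rather than replace entity definitions.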

Compile Custom Grammars

As with any Eduction grammar, Micro Focus recommends that you compile your grammar extensions before using them. You can use the edktool command-line tool to compile the XML file that contains your extension declarations into an ECR file.

For more information about compiling custom grammars, refer to the Eduction User and Programming Guide.

Modify Other Grammars and Entities

It is possible to extend any public entity in a PHI grammar. However, you cannot use the various private entities that the public ones use in their definitions.

For entities in the simpler grammars, such as driving or national ID, this might be less of a problem, as long as you know the format for the data portion of the entity. For example, you might want to add new landmarks to these entities.

However, be aware that existing definitions account for factors such as varying spaces, and additional words between the landmark and the data. In this case, you must emulate this behavior in your extensions, which might take a lot of work.

In practice, Micro Focus recommends that you make a support request to have these changes made to the official PHI grammars, unless you need to add support in a very short time frame. The existing definitions provide a lot of value because they cover so many match cases, and you might miss these cases when you extend public entities whose underlying private definitions are not available to you.