Standard Grammar – Source

Eduction includes standard grammar files in source form (XML) and their compiled equivalents (ECR). The source files import compiled Eduction standard grammar files and illustrate sample usage. You can modify these XML source files and recompile them to customize a grammar for the needs of your Eduction application.

The following table lists public entities defined in the XML source files. It excludes the public entities that are republished from the imported Eduction ECR grammar files.

File Entity Description
measure.xml measure/all/eng An editable collection of patterns that match length, area, volume, and mass.
money.xml money/all All currency amounts.

NOTE: This grammar file supports some English alphabetic numbers, for example, seven cents, $12 million, one hundred dollars, £5m.

pci_dss.xml pci_dss/person_name/engus Person names.
pci_dss/date/engus Dates.
pci_dss/credit_card/engus Credit and debit card numbers.
pci_dss/bank_names/engus Bank names.

pii.xml pii/person_name/engus Personal names.
pii/phone_number/engus Phone numbers.
pii/email_address/engus Email addresses.
pii/ip_address/engus IP addresses.
pii/social_security/engus Social Security numbers.
pii/car_numberplate/engus Car license plate numbers.
pii/driver_license/engus Driver’s license numbers.
pii/credit_card/engus Credit and debit card numbers.
pii/date/engus Dates.
pii/country Countries.
pii/state/engus U.S. states or possessions.
pii/county/engus U.S. counties.
pii/city/engus U.S. cities.
pii/address/engus Geographical addresses.
pii/zipcode/engus U.S. zipcodes.
pii/age/engus Age.
pii/gender/engus Gender.
pii/race/engus Race.
pii/job_title/engus Job title.
pii/disease_and_condition/engus Disease or medical condition.
pii/account_number/engus Generic account number with 6-8 digits in a predictable context.
pii/license_number/engus Generic license number with specific alphanumeric format.
pii/facebook_url/engus Example URL for a personal Web page (Facebook).

place_europe.xml place/country/europe European country in English (and some local languages).
place/country_uppercase/europe European country in English and local languages (uppercase).
place/city1/europe European settlement with over 100,000 inhabitants, in local language.
place/city1_uppercase/europe European settlement with over 100,000 inhabitants, in local language (uppercase).
place/city2/europe European settlement with between 10,000 and 100,000 inhabitants, in local language.
place/city2_uppercase/europe European settlement with between 10,000 and 100,000 inhabitants, in local language (uppercase).
place/region/europe High-level administrative division, in local language.
place/region_uppercase/europe High-level administrative division, in local language (uppercase).
place_south_america.xml place/country/south_america South American country in English, Spanish, or Portuguese.
place/country_uppercase/south_america South American country in English, Spanish, or Portuguese (uppercase).
place/city1/south_america South American settlement with over 100,000 inhabitants, in local language.
place/city1_uppercase/south_america South American settlement with over 100,000 inhabitants, in local language (uppercase).
place/city2/south_america South American settlement with between 10,000 and 100,000 inhabitants, in local language.
place/city2_uppercase/south_america South American settlement with between 10,000 and 100,000 inhabitants, in local language (uppercase).
place/island/south_america South American island, in local language.
place/island_uppercase/south_america South American island, in local language (uppercase).
place/region/south_america High-level administrative division, in local language.
place/region_uppercase/south_america High-level administrative division, in local language (uppercase).

retention.xml retention/admission_date Admission date.
retention/discharge_date Discharge date.
retention/birth_date Birth date.
retention/age/eng Age.

sample.xml sample/solar_system A simple entity for planets of the solar system.

sentiment_user_chi.xml
sentiment/user_client_name
sentiment/user_client_brand
sentiment/user_client_rv1_name
sentiment/user_client_rv1_brand
sentiment/user_third_party_company_name
sentiment/user_third_party_company_brand
sentiment/user_positive_adjective
sentiment/user_negative_adjective
sentiment/user_positive_noun
sentiment/user_negative_noun
sentiment/user_neutral_noun
sentiment/user_positive_verb
sentiment/user_negative_verb
sentiment/user_neutral_verb
sentiment/user_positive_idiom
sentiment/user_negative_idiom

sentiment_user_ara.xml, sentiment_user_cze.xml, sentiment_user_dutch.xml, sentiment_user_eng.xml, sentiment_user_fre.xml, sentiment_user_ger.xml, sentiment_user_ita.xml, sentiment_user_pol.xml, sentiment_user_por.xml, sentiment_user_rus.xml, sentiment_user_spa.xml, sentiment_user_tur.xml
sentiment/user_positive_adjective
sentiment/user_negative_adjective
sentiment/user_neutral_adjective
sentiment/user_positive_adverb
sentiment/user_negative_adverb
sentiment/user_neutral_adverb
sentiment/user_positive_noun
sentiment/user_negative_noun
sentiment/user_neutral_noun
sentiment/user_positive_verb
sentiment/user_negative_verb
sentiment/user_neutral_verb
sentiment/user_positive_match
sentiment/user_negative_match
sentiment/user_good_noun (English only)

You can use these sentiment_user files to modify the sentiment analysis grammar files for the relevant languages, giving them access to extra domain-specific vocabulary.


The entities in this table combine the compiled Eduction entities with Eduction XML grammar to create additional entities, and the XML illustrates how to use the compiled entities. You can modify these XML files and compile them into Eduction ECR files for specific applications.

The Eduction grammar files have three advantages:

  • They allow fine-grained access to the basic entities that more complex entities are built from. You can then customize the complex entities to increase the precision and recall of the matching process.
  • They are provided both as compiled ECR grammar files and as source-form XML grammar files that reference them.
  • They are split into separate ECR files, which reduces the memory footprint and file size.