Standard Grammar – Source
Eduction includes standard grammar files in source form (XML) and their compiled equivalents (ECR). The source files import compiled Eduction standard grammar files and illustrate sample usage. You can modify these XML source files and recompile them to customize a grammar for the needs of your Eduction application.
The following table lists public entities defined in the XML source files. It excludes the public entities that are republished from the imported Eduction ECR grammar files.
File | Entity | Description |
---|---|---|
measure.xml | measure/all/eng | An editable collection of patterns that match length, area, volume, and mass. |
money.xml | money/all | All currency amounts. NOTE: This grammar file supports some English alphabetic numbers, for example, seven cents, $12 million, one hundred dollars, £5m. |
pci_dss.xml | pci_dss/person_name/engus | Person names. |
pci_dss.xml | pci_dss/date/engus | Dates. |
pci_dss.xml | pci_dss/credit_card/engus | Credit and debit card numbers. |
pci_dss.xml | pci_dss/bank_names/engus | Bank names. |
pii.xml | pii/person_name/engus | Personal names. |
pii.xml | pii/phone_number/engus | Phone numbers. |
pii.xml | pii/email_address/engus | Email addresses. |
pii.xml | pii/ip_address/engus | IP addresses. |
pii.xml | pii/social_security/engus | Social Security numbers. |
pii.xml | pii/car_numberplate/engus | Car license plate numbers. |
pii.xml | pii/driver_license/engus | Driver’s license numbers. |
pii.xml | pii/credit_card/engus | Credit and debit card numbers. |
pii.xml | pii/date/engus | Dates. |
pii.xml | pii/country | Countries. |
pii.xml | pii/state/engus | U.S. states or possessions. |
pii.xml | pii/county/engus | U.S. counties. |
pii.xml | pii/city/engus | U.S. cities. |
pii.xml | pii/address/engus | Geographical addresses. |
pii.xml | pii/zipcode/engus | U.S. zipcodes. |
pii.xml | pii/age/engus | Age. |
pii.xml | pii/gender/engus | Gender. |
pii.xml | pii/race/engus | Race. |
pii.xml | pii/job_title/engus | Job title. |
pii.xml | pii/disease_and_condition/engus | Disease or medical condition. |
pii.xml | pii/account_number/engus | Generic account number with 6-8 digits in a predictable context. |
pii.xml | pii/license_number/engus | Generic license number with specific alphanumeric format. |
pii.xml | pii/facebook_url/engus | Example URL for a personal Web page (Facebook). |
place_europe.xml | place/country/europe | European country in English (and some local languages). |
place_europe.xml | place/country_uppercase/europe | European country in English and local languages (uppercase). |
place_europe.xml | place/city1/europe | European settlement with over 100,000 inhabitants, in local language. |
place_europe.xml | place/city1_uppercase/europe | European settlement with over 100,000 inhabitants, in local language (uppercase). |
place_europe.xml | place/city2/europe | European settlement with between 10,000 and 100,000 inhabitants, in local language. |
place_europe.xml | place/city2_uppercase/europe | European settlement with between 10,000 and 100,000 inhabitants, in local language (uppercase). |
place_europe.xml | place/region/Europe | High-level administrative division, in local language. |
place_europe.xml | place/region_uppercase/Europe | High-level administrative division, in local language (uppercase). |
place_south_america.xml | place/country/south_america | South American country in English, Spanish, or Portuguese. |
place_south_america.xml | place/country_uppercase/south_america | South American country in English, Spanish, or Portuguese (uppercase). |
place_south_america.xml | place/city1/south_america | South American settlement with over 100,000 inhabitants, in local language. |
place_south_america.xml | place/city1_uppercase/south_america | South American settlement with over 100,000 inhabitants, in local language (uppercase). |
place_south_america.xml | place/city2/south_america | South American settlement with between 10,000 and 100,000 inhabitants, in local language. |
place_south_america.xml | place/city2_uppercase/south_america | South American settlement with between 10,000 and 100,000 inhabitants, in local language (uppercase). |
place_south_america.xml | place/island/south_america | South American island, in local language. |
place_south_america.xml | place/island_uppercase/south_america | South American island, in local language (uppercase). |
place_south_america.xml | place/region/south_america | High-level administrative division, in local language. |
place_south_america.xml | place/region_uppercase/south_america | High-level administrative division, in local language (uppercase). |
retention.xml | retention/admission_date | Admission date. |
retention.xml | retention/discharge_date | Discharge date. |
retention.xml | retention/birth_date | Birth date. |
retention.xml | retention/age/eng | Age. |
sample.xml | sample/solar_system | A simple entity for planets of the solar system. |
sentiment_user_chi.xml | sentiment/user_client_name, sentiment/user_client_brand, sentiment/user_client_rv1_name, sentiment/user_client_rv1_brand, sentiment/user_third_party_company_name, sentiment/user_third_party_company_brand, sentiment/user_positive_adjective, sentiment/user_negative_adjective, sentiment/user_positive_noun, sentiment/user_negative_noun, sentiment/user_neutral_noun, sentiment/user_positive_verb, sentiment/user_negative_verb, sentiment/user_neutral_verb, sentiment/user_positive_idiom, sentiment/user_negative_idiom | You can use these files to modify the sentiment analysis grammar files for the relevant languages to give access to extra domain-specific vocabulary. |
sentiment_user_ara.xml, sentiment_user_cze.xml, sentiment_user_dutch.xml, sentiment_user_eng.xml, sentiment_user_fre.xml, sentiment_user_ger.xml, sentiment_user_ita.xml, sentiment_user_pol.xml, sentiment_user_por.xml, sentiment_user_rus.xml, sentiment_user_spa.xml, sentiment_user_tur.xml | sentiment/user_positive_adjective, sentiment/user_negative_adjective, sentiment/user_neutral_adjective, sentiment/user_positive_adverb, sentiment/user_negative_adverb, sentiment/user_neutral_adverb, sentiment/user_positive_noun, sentiment/user_negative_noun, sentiment/user_neutral_noun, sentiment/user_positive_verb, sentiment/user_negative_verb, sentiment/user_neutral_verb, sentiment/user_positive_match, sentiment/user_negative_match, sentiment/user_good_noun (English only) | You can use these files to modify the sentiment analysis grammar files for the relevant languages to give access to extra domain-specific vocabulary. |
The entities in this table combine compiled Eduction entities with Eduction XML grammar to create additional entities. The XML source shows how to use the compiled entities. You can modify these XML files and compile them into Eduction ECR files for use in specific applications.
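Before you recompile a modified source file, it can help to check which public entities it defines and which compiled files it imports. The following Python sketch shows one way to do that with the standard library. It is illustrative only: the embedded grammar is a hypothetical, simplified stand-in for a file such as sample.xml, and the element and attribute names it uses (`grammars`, `include`, `grammar`, `entity`, `entry`, `type="public"`) and the file name `example_base.ecr` are assumptions that you should check against the XML source files shipped with your Eduction installation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified grammar source. Element names, attribute names,
# and the imported file name are assumptions made for this sketch; compare
# them with the real XML source files before relying on this script.
SAMPLE_SOURCE = """\
<grammars>
  <include path="example_base.ecr" type="private"/>
  <grammar name="sample">
    <entity name="solar_system" type="public">
      <entry headword="Mercury"/>
      <entry headword="Venus"/>
    </entity>
    <entity name="helper" type="private">
      <entry headword="Pluto"/>
    </entity>
  </grammar>
</grammars>
"""


def list_public_entities(xml_text):
    """Return 'grammar/entity' names for entities marked type="public"."""
    root = ET.fromstring(xml_text)
    names = []
    for grammar in root.iter("grammar"):
        for entity in grammar.iter("entity"):
            if entity.get("type") == "public":
                names.append(f"{grammar.get('name')}/{entity.get('name')}")
    return names


def list_includes(xml_text):
    """Return the paths of the compiled files that the source imports."""
    root = ET.fromstring(xml_text)
    return [inc.get("path") for inc in root.iter("include")]


if __name__ == "__main__":
    print(list_public_entities(SAMPLE_SOURCE))  # ['sample/solar_system']
    print(list_includes(SAMPLE_SOURCE))         # ['example_base.ecr']
```

To inspect a real file, read its contents and pass the text to the same functions. Entities republished from imported ECR files would not appear in this output, which matches the scope of the table above.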
The Eduction grammar files have three advantages:
- They allow fine-grained access to the basic entities that make up more complex entities. You can then customize the complex entities to increase the precision and recall of the matching process.
- They are provided both as compiled ECR grammar files and as source-form XML grammar files that reference them.
- They are split into separate ECR files, which reduces the memory footprint and file size.