Combined Entities

In addition to the entities described in the Eduction Grammar Reference, the IDOL PCI Package includes grammar files that contain "combined" entities. These files are named combined_*.ecr (or combined_*_cjkvt.ecr for CJKVT countries), and each combined entity matches names from multiple countries.

  • The entities that end in /all match data for any supported non-CJKVT country or language.
  • The entities that end in /all_cjkvt match data for any supported CJKVT country.

For example:

  • Using pci/name/all from combined_name.ecr matches a name from any non-CJKVT country. This is similar to using the name.ecr grammar file and extracting pci/name/??, where the ?? wildcard selects each of the individual two-character entities.
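The difference between the wildcard pattern and the combined entity can be illustrated with a short Python sketch. It is only an analogy: it assumes that the ? wildcard in an entity name matches exactly one character, in the same way as a shell-style pattern, and the entity names in the list (other than pci/name/all and pci/name/all_cjkvt) are hypothetical examples, not a list taken from the grammar files.

```python
from fnmatch import fnmatchcase

# Hypothetical entity names, for illustration only. The real set depends on
# the grammar files in your installation.
entities = [
    "pci/name/gb",
    "pci/name/fr",
    "pci/name/us",
    "pci/name/all",
    "pci/name/all_cjkvt",
]


def select(pattern, names):
    """Return the names that match a shell-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# The wildcard selects every two-character entity individually.
per_country = select("pci/name/??", entities)

# The combined entity is a single name that covers the same countries.
combined = select("pci/name/all", entities)

print(per_country)
print(combined)
```

In this sketch the wildcard expands to three separate entities, whereas the combined form is one entity, which is consistent with the single-pass behavior described for the combined grammar files.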

The combined (/all and /all_cjkvt) entities provide a significant improvement in processing speed when you extract matches for all countries or languages.

The combined grammar files might produce fewer matches, because (by default) only a single match is returned in cases where the same characters in the input text would match multiple countries or languages.

TIP: If you need all matches, you can turn on the AllowMultipleResults configuration option. This option slows down the matching process because it does not stop after a single match, but is generally still faster than using the individual grammars.
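As a rough illustration of the trade-off, the following Python sketch models the two behaviors on a toy list of candidate matches. It is not how Eduction is implemented: the Match type, the combine function, the allow_multiple flag, and the pci/name/it and pci/name/es entity names are all hypothetical, and the sketch simply keeps the first candidate for each span because the text above does not say which match is returned.

```python
from collections import namedtuple

# Hypothetical record for a candidate match: character offsets, entity, text.
Match = namedtuple("Match", "start end entity text")


def combine(matches, allow_multiple=False):
    """Toy model: keep one match per (start, end) span unless allow_multiple."""
    if allow_multiple:
        return list(matches)
    seen = set()
    kept = []
    for match in matches:
        span = (match.start, match.end)
        if span not in seen:
            # First candidate for this span wins; later ones are dropped.
            seen.add(span)
            kept.append(match)
    return kept


text = "Contact Maria Rossi today"

# The same characters match the name entities of two countries.
candidates = [
    Match(8, 19, "pci/name/it", text[8:19]),
    Match(8, 19, "pci/name/es", text[8:19]),
]

single = combine(candidates)
every = combine(candidates, allow_multiple=True)

print(len(single), len(every))
```

With the default behavior the toy model returns one match for the span; with allow_multiple set it returns both, which mirrors why the combined grammar files can produce fewer matches than the individual grammars.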

File                       Entity
combined_name.ecr          pci/name/all
combined_name_cjkvt.ecr    pci/name/all_cjkvt
                           pci/name/latin/all_cjkvt
                           pci/name/cjkvt/all_cjkvt