Combined Entities
In addition to the entities described in the Eduction Grammar Reference, the IDOL PCI Package includes grammar files that contain "combined" entities. These files are named `combined_*.ecr` (or `combined_*_cjkvt.ecr` for Japan), and their entities match names from multiple countries.
- The entities that end in `/all` match data for any supported non-CJKVT country or language.
- The entities that end in `/all_cjkvt` match data for any supported CJKVT country.
For example:
- Using `pci/name/all` from `combined_name.ecr` matches a name from any non-CJKVT country. This is similar to using the `name.ecr` grammar file and extracting `pci/name/??`.
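The two approaches in this example can be sketched as configuration fragments. This is a minimal sketch, not a definitive configuration: the file and entity names come from the table in this section, while the `[Eduction]` section name, the `ResourceFiles` and `Entities` parameters, and the `grammars/pci/` directory are assumptions about your setup. Check them against the Eduction configuration reference for your version.

```ini
// Option 1 (assumed layout): individual grammar file, one entity per country.
// The ?? wildcard selects every two-character country entity.
[Eduction]
ResourceFiles=grammars/pci/name.ecr
Entities=pci/name/??

// Option 2 (assumed layout): combined grammar file, one entity for all
// supported non-CJKVT countries. Use this in place of Option 1, not alongside it.
[Eduction]
ResourceFiles=grammars/pci/combined_name.ecr
Entities=pci/name/all
```

The two fragments are alternatives. A real configuration file would contain only one `[Eduction]` section.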
The combined (`/all` and `/all_cjkvt`) entities provide a significant improvement in processing speed when you extract matches for all countries or languages.
The combined grammar files might produce fewer matches, because (by default) only a single match is returned in cases where the same characters in the input text would match multiple countries or languages.
TIP: If you need all matches, you can turn on the `AllowMultipleResults` configuration option. This option slows down the matching process because matching does not stop after a single match, but it is generally still faster than using the individual grammars.
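As an illustration of the tip above, the following fragment turns on multiple results for a combined entity. It is a sketch under stated assumptions: the `[Eduction]` section name, the `ResourceFiles` and `Entities` parameters, the `grammars/pci/` directory, and the value `All` are not taken from this section. The accepted values for `AllowMultipleResults` can differ between Eduction versions, so confirm them in the parameter reference before use.

```ini
[Eduction]
// Assumed path to the combined grammar file listed in this section.
ResourceFiles=grammars/pci/combined_name.ecr
Entities=pci/name/all
// Assumed value: return every match at a location instead of only one.
AllowMultipleResults=All
```

With this option on, input text that matches the name patterns of several countries should produce one match per pattern rather than a single match.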
File | Entity |
---|---|
combined_name.ecr | pci/name/all |
combined_name_cjkvt.ecr | pci/name/all_cjkvt |
combined_name_cjkvt.ecr | pci/name/latin/all_cjkvt |
combined_name_cjkvt.ecr | pci/name/cjkvt/all_cjkvt |