Ambiguous Entities

For some entities, the IDOL PII Package cannot always determine which country a value comes from. In some of these cases, it can return an ambiguous result that lists every country the value might belong to.

To return ambiguous entities, you must set the ambiguous_entities parameter to true in the pii_postprocessing.lua script. By default, each entity returns only one country, which is more efficient.

Cross-Language Passport Landmarks

The IDOL PII Package supports cross-language passport landmarks, so it can detect a passport number even when the surrounding text is in a language that is not used in the country that issued the passport.

For example, the text "Oma passi on P 4366918" contains passi, which is Finnish for passport, and the number P 4366918, which is an Austrian passport number. The PII grammar identifies this as an Austrian passport number and returns the entity pii/passport/context/at.

In some cases, the country of origin is ambiguous. In these cases, the grammar attempts to identify all possible countries and returns an entity with the label ambiguous.

Example 1

"Mon passeport est LA080402"

In this example, both the landmark text passeport and the passport number could be from Belgium, Canada, or Luxembourg. This example returns the entity pii/passport/context/ambiguous/be_ca_lu to represent all three possibilities.
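The country codes can be read back out of an entity name such as the one above. The following is a minimal Python sketch, assuming only the naming pattern shown in this section (country codes joined by underscores in the final path segment). The helper names candidate_countries and is_ambiguous are hypothetical and are not part of the IDOL PII Package.

```python
def candidate_countries(entity: str) -> list[str]:
    """Return the country codes encoded in the final segment of an entity name.

    Assumes the pattern shown in this section: an ambiguous entity ends in
    codes joined by underscores, and a single-country entity ends in one code.
    """
    last_segment = entity.strip("/").split("/")[-1]
    return last_segment.split("_")


def is_ambiguous(entity: str) -> bool:
    """Return True if the entity path contains the 'ambiguous' label."""
    return "ambiguous" in entity.strip("/").split("/")


entity = "pii/passport/context/ambiguous/be_ca_lu"
print(is_ambiguous(entity))         # True
print(candidate_countries(entity))  # ['be', 'ca', 'lu']
```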

Example 2

"Vegabref mitt er AA5275702"

In this example, Vegabref is Icelandic, but AA5275702 could be a passport number for several countries, not including Iceland. This example returns the entity pii/passport/context/ambiguous/au_ee_fi_gr_hu_ie_it_jp_lv_nl_pl_si_sk_us to represent all the possibilities.

TIP: In this example, if the landmark text were in a language used in one of the possible countries, the passport number would not be considered ambiguous. For example, "Oma passi on AA5275702" (where passi is Finnish for passport) returns the entity pii/passport/fi, because passi applies only to Finnish, and not to any of the other countries where the passport number is valid.
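One way to reason about the examples in this section is as a set intersection between the countries that the landmark language points to and the countries where the number format is valid. The Python sketch below is an illustrative model only, not the grammar's actual implementation. The function resolve_countries is hypothetical, and the country lists are taken from the examples above.

```python
def resolve_countries(landmark_countries, number_countries):
    """Illustrative model: narrow the number's countries by the landmark's.

    If the landmark language is used in at least one country where the
    number is valid, keep only that overlap. Otherwise the landmark does
    not help, and every country where the number is valid remains.
    """
    overlap = sorted(set(landmark_countries) & set(number_countries))
    return overlap if overlap else sorted(number_countries)


# Countries where AA5275702 is a valid passport number (from Example 2).
AA_COUNTRIES = ["au", "ee", "fi", "gr", "hu", "ie", "it",
                "jp", "lv", "nl", "pl", "si", "sk", "us"]

# Finnish landmark, number valid in Finland: a single country remains.
print(resolve_countries(["fi"], AA_COUNTRIES))  # ['fi']

# Icelandic landmark, number not valid in Iceland: all candidates remain.
print(len(resolve_countries(["is"], AA_COUNTRIES)))  # 14

# Finnish landmark, Austrian number: the number alone decides.
print(resolve_countries(["fi"], ["at"]))  # ['at']
```

A result with more than one country corresponds to an ambiguous entity. A result with exactly one country corresponds to a single-country entity.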

Ambiguous Driving License Matches

Some countries use overlapping formats for driving license numbers. Where those countries also share a language, it is not possible to identify which country a particular number comes from. In this case, the grammar attempts to identify all possible countries and returns an entity with the label ambiguous. For example:

pii/driving/context/ambiguous/au_ie_us
pii/driving/context/ambiguous/au_nz_us
pii/driving/context/ambiguous/au_us
pii/driving/context/ambiguous/mt_us
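When processing results, it can help to know which ambiguous entities might refer to a given country. The Python sketch below builds that lookup from the four example entity names above. It assumes only the underscore-separated naming pattern shown in this section, and the function entities_by_country is hypothetical.

```python
from collections import defaultdict

# The example ambiguous driving license entities listed above.
DRIVING_ENTITIES = [
    "pii/driving/context/ambiguous/au_ie_us",
    "pii/driving/context/ambiguous/au_nz_us",
    "pii/driving/context/ambiguous/au_us",
    "pii/driving/context/ambiguous/mt_us",
]


def entities_by_country(entities):
    """Map each country code to the entity names that include it."""
    index = defaultdict(list)
    for entity in entities:
        # The final path segment holds the underscore-separated codes.
        for code in entity.rsplit("/", 1)[-1].split("_"):
            index[code].append(entity)
    return dict(index)


index = entities_by_country(DRIVING_ENTITIES)
print(sorted(index))     # ['au', 'ie', 'mt', 'nz', 'us']
print(len(index["us"]))  # 4
print(index["nz"])       # ['pii/driving/context/ambiguous/au_nz_us']
```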