Enable Generic Transliteration
The default IDOL Content component configuration file uses generic transliteration. Micro Focus recommends that you use generic transliteration, because it is the best way to ensure that cross-lingual search works.
Generic transliteration applies the character conversions described in the following table.
Language or character type | Transliteration |
---|---|
Symbols | All dashes and hyphens to a hyphen character |
Latin | Accented characters to non-accented characters |
Spanish | Accented vowels áéíóúü to non-accented vowels |
Portuguese | Accented characters àáâãçéêíòóôõúü to non-accented characters |
Greek | Accented Greek characters to non-accented characters |
Cyrillic (including Serbian extensions) | All characters mapped to A–Z |
Arabic | Arabic character normalization |
Japanese | Half-width katakana to full-width katakana; full-width 0–9, A–Z, a–z to single-byte 0–9, A–Z, a–z |
Chinese | Full-width 0–9, A–Z, a–z to single-byte 0–9, A–Z, a–z |
For all other languages, transliteration does not apply, except for hyphen normalization.
NOTE: Languages with a sentence-breaking library might be transliterated as part of the sentence-breaking process.
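To make the table more concrete, the following minimal Python sketch approximates a subset of these rules. It is not IDOL's implementation, and the function name generic_transliterate is hypothetical. It rests on several assumptions. It treats every Unicode dash-punctuation (Pd) character as a "dash or hyphen". It strips every accent from Latin and Greek letters, which may be broader than the Spanish and Portuguese lists above. It does not attempt the Cyrillic or Arabic rows or the sentence-breaking behavior described in the note. Half-width katakana is converted with NFKC normalization applied to whole runs, so that voiced-sound marks such as ﾞ combine with the preceding kana.

```python
import re
import unicodedata

# Runs of half-width katakana (U+FF65-U+FF9F); normalized as a run so that
# voiced-sound marks compose with the preceding kana.
HALFWIDTH_KATAKANA_RUN = re.compile("[\uFF65-\uFF9F]+")


def _fold_char(ch: str) -> str:
    """Apply the per-character rules of this sketch (assumed approximations)."""
    cp = ord(ch)
    # Assumption: "dashes and hyphens" = Unicode category Pd; fold to "-".
    if unicodedata.category(ch) == "Pd":
        return "-"
    # Full-width 0-9, A-Z, a-z map to ASCII by a fixed offset of 0xFEE0.
    if 0xFF10 <= cp <= 0xFF19 or 0xFF21 <= cp <= 0xFF3A or 0xFF41 <= cp <= 0xFF5A:
        return chr(cp - 0xFEE0)
    # Latin and Greek letters: decompose and drop combining accent marks.
    decomposed = unicodedata.normalize("NFD", ch)
    if unicodedata.name(decomposed[0], "").startswith(("LATIN", "GREEK")):
        return "".join(c for c in decomposed if not unicodedata.combining(c))
    # Everything else (for example Cyrillic, CJK, Arabic) is left unchanged here.
    return ch


def generic_transliterate(text: str) -> str:
    """Hypothetical approximation of part of generic transliteration."""
    text = HALFWIDTH_KATAKANA_RUN.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), text
    )
    return "".join(_fold_char(ch) for ch in text)
```

For example, generic_transliterate("Ａｂｃ１２３ café–naïve") returns "Abc123 cafe-naive", and generic_transliterate("ｶﾞｲﾄﾞ") returns "ガイド".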
When you set GenericTransliteration to True, it applies to all languages, unless you specifically disable transliteration for a language.
You can disable transliteration for an individual language by setting the Transliteration parameter to False in the individual language configuration section. This option completely disables transliteration for that language.