Language Configuration

For each language that you use, create a [MyLanguage] section, where MyLanguage is the name of the language as listed below. In each section, set the parameters that determine how IDOL Category Component handles that language.

The individual language configuration parameters override any values that you set for these parameters in the [LanguageTypes] section. If you do not specify a parameter for the individual language, IDOL Category Component uses the value in the [LanguageTypes] section, or the internal default.
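
As a minimal sketch of this precedence, the fragment below sets IndexNumbers once in [LanguageTypes] and overrides it for one language. The values are illustrative assumptions rather than recommended settings, and the fragment leaves out any other entries that your [LanguageTypes] section needs.

```ini
[LanguageTypes]
// Illustrative value: shared setting for languages that do not override it.
IndexNumbers=0

[english]
Encodings=ASCII:englishASCII,UTF8:englishUTF8
// Overrides the [LanguageTypes] value for English only.
IndexNumbers=1

[afrikaans]
Encodings=ASCII:afrikaansASCII,UTF8:afrikaansUTF8
// No IndexNumbers here, so the [LanguageTypes] value (0) applies.
```

With this fragment, English would use IndexNumbers=1 and Afrikaans would fall back to 0. A parameter that is set in neither place takes the internal default.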

Note: If you have enabled automatic language detection, you can configure a General language to apply to any documents that are not in a specific language. IDOL Category Component assigns a document to the General language if it identifies the document encoding but not the language. Add the appropriate encodings to the [General] configuration section.

If the encoding of a document is not configured, IDOL Category Component assigns the document to the language type that DefaultLanguageType specifies.

For example:

[english]
Encodings=ASCII:englishASCII,UTF8:englishUTF8
Stoplist=english.dat
IndexNumbers=1

[afrikaans]
Encodings=ASCII:afrikaansASCII,UTF8:afrikaansUTF8
IndexNumbers=1
			
[albanian]
Encodings=ASCII:albanianASCII,UTF8:albanianUTF8
IndexNumbers=1
			
[arabic]
Encodings=ARABIC_ISO:arabicARABIC_ISO,ARABIC:arabicARABIC,UTF8:arabicUTF8
IndexNumbers=1
			
[chinese]
Encodings=CHINESESIMPLIFIED:chineseCHINESESIMPLIFIED,CHINESETRADITIONAL:chineseCHINESETRADITIONAL,UTF8:chineseUTF8
SentenceBreaking=chinesebreaking
IndexNumbers=1
			
[general]
Encodings=UTF8:generalUTF8,ASCII:generalASCII,CYRILLIC:generalCYRILLIC
IndexNumbers=1
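
The fallback behavior described in the note above might be configured along the following lines. This is a sketch under assumptions: it assumes that DefaultLanguageType is set in [LanguageTypes] and takes a language type name of the kind defined in Encodings (here englishUTF8). The setting that enables automatic language detection is not shown, because this section does not name it.

```ini
[LanguageTypes]
// Assumed placement and value: used when a document's encoding is not configured.
DefaultLanguageType=englishUTF8

[general]
// Documents whose encoding is identified but whose language is not are assigned here.
Encodings=UTF8:generalUTF8,ASCII:generalASCII,CYRILLIC:generalCYRILLIC
```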

Available language names:

Acehnese             Galician     Luxembourgish     Slovenian
Afrikaans            Georgian     Macedonian        Somali
Albanian             German       Malagasy          Sorbian
Amharic              Gilaki       Malay             Spanish
Arabic               Greek        Malayalam         Sranan
Armenian             Greenlandic  Maltese           Sundanese
Azeri                Guarani      Manipuri          Swahili
Basque               Gujarati     Maori             Swedish
Belorussian          Haitian      Marathi           Syriac
Bengali              Hausa        Mazandarani       Tagalog
Berber               Hawaiian     Mirandese         Tahitian
Bihari               Hebrew       Mongolian         Tajik
Bikol                Hindi        Nahuatl           Tamil
Bishnupriya          Hungarian    Navajo            Tatar
Bosnian              Icelandic    Ndebele           Telugu
Breton               Igbo         Nepali            Thai
Bulgarian            Ilokano      Newari            Tibetan
Burmese              Indonesian   Norwegian         Tokpisin
Catalan              Italian      Oriya             Tongan
Cebuano              Japanese     Ossetian          Tsonga
Cherokee             Javanese     Panjabi           Tswana
Chinese Traditional  Kalmyk       Papiamentu        Turkish
Chinese Simplified   Kannada      Persian           Turkmen
Chuvash              Kapampangan  Polish            Ukrainian
Croatian             Kazakh       Portuguese        Urdu
Czech                Khmer        Pushto            Uyghur
Danish               Kikongo      Quechua           Uzbek
Divehi               Kinyarwanda  Rhaeto-Romance    Valencian
Dutch                Kirundi      Romanian          Venda
English              Komi         Russian           Vietnamese
Erzya                Korean       Sakha             Waraywaray
Esperanto            Kurdish      Sami              Welsh
Estonian             Kyrgyz       Sanskrit          Wolof
Ethiopic             Lao          Serbian           Xhosa
Faroese              Lappish      Sesotho           Yiddish
Finnish              Latin        Sesotho sa Leboa  Yoruba
French               Latvian      Singhalese        Zulu
Frisian              Lingala      Siswant
Gaelic               Lithuanian   Slovak

 

AugmentSeparators

DecompositionFile

DiminishSeparators

Encodings

HyphenChars

IndexNumbers

NGram

NGramMultibyteOnly

NGramOrientalOnly

Normalise

NumberPunctuation

OCRLanguageFile

ProperNames

SentenceBreaking

SentenceBreakingOptions

SoftSeparators

Stemming

StemmingFile

Stoplist

TangibleCharacters

Transliteration

