The IDOL Content component can process documents in multiple languages and encodings. For each language that you want to use, you must define the language types in the IDOL Content component configuration file. You must also configure the IDOL Content component to classify documents, either by automatically detecting the language and encoding, or by reading the language type from a field.
To run the IDOL Content component in multiple languages, specify the language types you want the IDOL Content component to process. A language type is a combination of the language and encoding.
You must specify languages and language types before you index data into the IDOL Content component.
Open the IDOL Content component configuration file in a text editor.
Find the [LanguageTypes] section. List the languages that you want the IDOL Content component to process. You must use ASCII characters to specify the language names.
For example:
[LanguageTypes]
0=English
1=Afrikaans
2=General
For each language, create a configuration section that matches the name you defined in the [LanguageTypes] section.
In this section, specify appropriate settings that determine how the IDOL Content component handles this language. For details on the configuration parameters you can use, refer to the IDOL Server Reference.
For each section, set the Encodings parameter to a list of the encodings and corresponding language types used by the language. List each encoding and language type in the format encoding:languagetype. Separate multiple entries with commas.
For example:
[english]
Encodings=ASCII:englishASCII,UTF8:englishUTF8
Stoplist=english.dat
IndexNumbers=1

[afrikaans]
Encodings=ASCII:afrikaansASCII,UTF8:afrikaansUTF8
IndexNumbers=1

[general]
Encodings=UTF8:generalUTF8,ASCII:generalASCII,CYRILLIC:generalCYRILLIC
IndexNumbers=1
Save and close the configuration file.
Restart the IDOL Content component for your changes to take effect.
You can now configure the IDOL Content component to associate the language types you defined with documents.
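Before you restart, it can help to check that every language listed in [LanguageTypes] has a matching section with a well-formed Encodings value. The following Python sketch is one possible way to do that; it is not part of IDOL. It assumes that the configuration file can be read by Python's configparser module (no duplicate parameters within a section) and that section names match language names case-insensitively, as the example above suggests. The names parse_encodings and check_language_sections are hypothetical helpers.

```python
import configparser

# Sample text taken from the configuration example in this section.
SAMPLE_CONFIG = """
[LanguageTypes]
0=English
1=Afrikaans
2=General

[english]
Encodings=ASCII:englishASCII,UTF8:englishUTF8
Stoplist=english.dat
IndexNumbers=1

[afrikaans]
Encodings=ASCII:afrikaansASCII,UTF8:afrikaansUTF8
IndexNumbers=1

[general]
Encodings=UTF8:generalUTF8,ASCII:generalASCII,CYRILLIC:generalCYRILLIC
IndexNumbers=1
"""


def parse_encodings(value):
    """Split an Encodings value into an {encoding: languagetype} dict."""
    pairs = {}
    for item in value.split(","):
        encoding, sep, language_type = item.strip().partition(":")
        if not sep or not encoding or not language_type:
            raise ValueError(f"expected encoding:languagetype, got {item!r}")
        pairs[encoding] = language_type
    return pairs


def check_language_sections(config_text):
    """Return (language types per language, list of problems found)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep parameter names exactly as written
    parser.read_string(config_text)

    # Assumption: section names are compared case-insensitively.
    sections = {name.lower(): name for name in parser.sections()}
    language_types, problems = {}, []

    if not parser.has_section("LanguageTypes"):
        return language_types, ["no [LanguageTypes] section"]

    for language in parser["LanguageTypes"].values():
        if not language.isascii():
            problems.append(f"language name {language!r} is not ASCII")
        section = sections.get(language.lower())
        if section is None:
            problems.append(f"no [{language}] section")
            continue
        encodings = parser[section].get("Encodings")
        if encodings is None:
            problems.append(f"[{section}] has no Encodings parameter")
            continue
        language_types[language] = parse_encodings(encodings)
    return language_types, problems


if __name__ == "__main__":
    types, problems = check_language_sections(SAMPLE_CONFIG)
    for language, encodings in types.items():
        print(language, encodings)
    for problem in problems:
        print("PROBLEM:", problem)
```

To check a real file, read its text and pass it to check_language_sections. An empty problem list only means that the structure looks consistent; it does not confirm that the IDOL Content component accepts the settings.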
After you define all the language types you want the IDOL Content component to process, set up a field process that allows the IDOL Content component to associate these language types with documents.
The way you configure the field process depends on the documents that you want to index:
If all the documents contain a field that exactly specifies the language type, configure a field process to define this field as a LanguageType field. The language types that appear in this field must exactly match the language types that you define in the [LanguageTypes] configuration section. See Configure Fields.
If the documents contain a field that specifies the language, but does not exactly specify the language type, you can configure field processes to detect the language from this field data.
Open the IDOL Content component configuration file in a text editor.
In the [FieldProcessing] section, define a field process for each language that you want to detect.
For example:
[FieldProcessing]
0=DetectArabic
1=DetectEnglish
2=DetectFrench
Create a configuration section with the same name as each of the field processes you defined in the [FieldProcessing] section.
In this section:
Set Property to the name of the property for the specified language type.
Set PropertyFieldCSVs to a comma-separated list of fields that can contain the language data.
Set PropertyMatch to a comma-separated list of values that this field might contain to identify the specified language type.
For example:
[DetectArabic]
Property=SetArabicProperty
PropertyFieldCSVs=*/DRELANGUAGETYPE,*/LANG
PropertyMatch=arabic

[DetectEnglish]
Property=SetEnglishProperty
PropertyFieldCSVs=*/DRELANGUAGETYPE,*/LANG
PropertyMatch=*eng*,uk,*british

[DetectFrench]
Property=SetFrenchProperty
PropertyFieldCSVs=*/DRELANGUAGETYPE,*/LANG
PropertyMatch=*fre*,fran*
Create a configuration section for each property that you define in the field processing sections.
In the property configuration section, set the LanguageType parameter to the language type to assign to documents that match this property (that is, documents that contain a field with a matching value for the field process). This language type must match one of the language types you configure in the [LanguageTypes] configuration section.
For example:
[SetArabicProperty]
LanguageType=Arabic

[SetEnglishProperty]
LanguageType=English

[SetFrenchProperty]
LanguageType=French
Save and close the configuration file.
Restart the IDOL Content component for your changes to take effect.
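To get a feel for which field values the PropertyMatch patterns in the example would catch, you can approximate the matching outside IDOL. The following Python sketch uses the standard fnmatch module as a stand-in for the wildcard matching. It assumes that matching is case-insensitive and that the first matching field process wins; neither is confirmed by this section, so treat the results as a rough guide only. The function detect_language_type is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# (PropertyMatch patterns, LanguageType) pairs copied from the example
# field processes and property sections in this section.
FIELD_PROCESSES = [
    (["arabic"], "Arabic"),
    (["*eng*", "uk", "*british"], "English"),
    (["*fre*", "fran*"], "French"),
]


def detect_language_type(field_value, processes=FIELD_PROCESSES):
    """Return the LanguageType of the first process that matches, or None."""
    # Assumption: values and patterns are compared case-insensitively.
    value = field_value.strip().lower()
    for patterns, language_type in processes:
        if any(fnmatchcase(value, pattern.lower()) for pattern in patterns):
            return language_type
    return None


if __name__ == "__main__":
    for sample in ["English", "UK", "francais", "arabic", "german"]:
        print(sample, "->", detect_language_type(sample))
```

A value that returns None here, such as german, would need its own field process and property section, or a broader PropertyMatch list.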