Configure Languages

The IDOL Content component can process documents in multiple languages and encodings. For each language that you want to use, you must define the language types in the IDOL Content component configuration file. You must also configure the IDOL Content component to classify documents, either by automatically detecting the language and encoding, or by reading the language type from a field.

Define Language Types

To run the IDOL Content component in multiple languages, specify the language types that you want it to process. A language type is a combination of a language and an encoding (for example, English text encoded as UTF-8).

NOTE: You must specify languages and language types before you index data into the IDOL Content component.
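As an illustration, a minimal sketch of a language type definition might look like the following. The two languages, the encodings, and the language directory path are assumptions chosen for the example, not required values; check the parameter names against the reference for your IDOL version before you use them.

```
[LanguageTypes]
// Language type to assume when none can be determined (example value)
DefaultLanguageType=englishUTF8
// Directory that contains the language files (hypothetical path)
LanguageDirectory=./langfiles
// Numbered list of the languages to configure
0=English
1=French

[English]
// Each entry pairs an encoding with the name of the resulting language type
Encodings=ASCII:englishASCII,UTF8:englishUTF8

[French]
Encodings=ASCII:frenchASCII,UTF8:frenchUTF8
```

In this sketch, each Encodings entry maps an encoding to a language type name, so the configuration defines four language types: englishASCII, englishUTF8, frenchASCII, and frenchUTF8.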

After you define the language types, you can configure the IDOL Content component to associate them with documents.

Associate Language Types with Documents

After you define all the language types you want the IDOL Content component to process, set up a field process that allows the IDOL Content component to associate these language types with documents.

The way you configure the field process depends on the documents that you want to index:

  • If all the documents contain a field that exactly specifies the language type, configure a field process to define this field as a LanguageType field. The language types that appear in this field must exactly match the language types that you define in the [LanguageTypes] configuration section. See Configure Fields.

  • If the documents contain a field that specifies the language, but does not exactly specify the language type, you can configure field processes to detect the language from this field data.
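For the first case, a field process that marks a field as a LanguageType field might be sketched as follows. The process name SetLanguageType, the property name LanguageFields, and the field name DRELANGUAGETYPE are assumptions used for illustration; substitute the field that your documents actually contain.

```
[FieldProcessing]
// Numbered list of field processes (the process name is hypothetical)
0=SetLanguageType

[SetLanguageType]
// Apply the LanguageFields property to the matching field
Property=LanguageFields
PropertyFieldCSVs=*/DRELANGUAGETYPE

[LanguageFields]
// Treat the value of the field as the language type of the document
LanguageType=TRUE
```

With a configuration along these lines, a document whose DRELANGUAGETYPE field contains englishUTF8 would be associated with that language type, provided that englishUTF8 is defined in the [LanguageTypes] section.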