Configure Languages

The Content component can process documents in multiple languages and encodings. For each language that you want to use, you must define the language types in the Content component configuration file. You must also configure the Content component to classify documents, either by automatically detecting the language and encoding, or by reading the language type from a field.

Define Language Types

To run the Content component in multiple languages, specify the language types that you want the Content component to process. A language type is a combination of a language and an encoding.

NOTE: You must specify languages and language types before you index data into the Content component.
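
As an illustration, a language type definition might look like the following fragment. This is a sketch, not a verified configuration. The languages, encodings, and directory path are example values. The parameter names (DefaultLanguageType, DefaultEncoding, LanguageDirectory, Encodings) and the // comment syntax are assumed from typical Content component configuration files, so check them against the configuration reference for your version.

```ini
[LanguageTypes]
// Assumed example values; adjust them to your installation.
DefaultLanguageType=englishUTF8
DefaultEncoding=UTF8
LanguageDirectory=./langfiles
// Numbered list of the languages to process (numbering starts at 0).
0=English
1=French

[English]
// Each entry pairs an encoding with the name of the resulting language type.
Encodings=ASCII:englishASCII,UTF8:englishUTF8

[French]
Encodings=ASCII:frenchASCII,UTF8:frenchUTF8
```

In this sketch, englishASCII, englishUTF8, frenchASCII, and frenchUTF8 are the language types, each combining one language with one encoding.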

You can now configure the Content component to associate the language types you defined with documents.

Associate Language Types with Documents

After you define all the language types you want the Content component to process, set up a field process that allows the Content component to associate these language types with documents.

The way you configure the field process depends on the documents that you want to index:

  • If all the documents contain a field that exactly specifies the language type, configure a field process to define this field as a LanguageType field. The language types that appear in this field must exactly match the language types that you define in the [LanguageTypes] configuration section. See Configure Fields.

  • If the documents contain a field that specifies the language, but does not exactly specify the language type, you can configure field processes to detect the language from this field data.
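
For the first case, a field process that marks a field as a LanguageType field might be sketched as follows. Treat this as an assumption-laden example rather than a definitive configuration. The process name (SetLanguageType), the property name (LanguageTypeFields), and the field name (DRELANGUAGETYPE) are hypothetical choices for illustration. The Property, PropertyFieldCSVs, and LanguageType parameters follow the usual field-processing layout but should be confirmed in Configure Fields and the configuration reference.

```ini
[FieldProcessing]
// Hypothetical process name, chosen for this example.
0=SetLanguageType

[SetLanguageType]
// Apply the property below to every field that matches this pattern.
Property=LanguageTypeFields
PropertyFieldCSVs=*/DRELANGUAGETYPE

[LanguageTypeFields]
// Treat the matched field as the document's language type.
LanguageType=True
```

With a setup like this, a document whose DRELANGUAGETYPE field contains englishUTF8 would be associated with that language type, provided that englishUTF8 is defined in the [LanguageTypes] configuration section.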