If your IDOL Server license includes automatic language detection, IDOL Server can automatically identify the language and encoding of a document when it is indexed. To do so, IDOL Server analyzes a portion of the text in the document content fields (fields for which SourceType is set to True in the IDOL Server configuration file).
Open the IDOL Server configuration file in a text editor.
Find the [Server] section and add this setting:
AutoDetectLanguagesAtIndex=True
Set DiscardUnconfiguredLanguagesAtIndex to True if you do not want to index documents whose detected language does not have a configured language type.
Set DiscardUnknownLanguagesAtIndex to True if you do not want to index documents whose language IDOL Server cannot recognize. For example, IDOL Server might not recognize the language because the document contains no natural-language text, or because the document contains too little text to determine the language.
By default, IDOL Server indexes these documents using the default language type. It also logs a warning message in the index log, so that you can add an appropriate language type.
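Taken together, the settings described so far might look like the following fragment. This is a sketch only: the text above states that AutoDetectLanguagesAtIndex belongs in the [Server] section, but placing the two discard settings in the same section is an assumption, as is the use of // for comments. Check both against the IDOL Server reference before use.

```ini
[Server]
// Detect the language and encoding of each document at index time
AutoDetectLanguagesAtIndex=True
// Assumed placement: skip documents whose detected language has no configured language type
DiscardUnconfiguredLanguagesAtIndex=True
// Assumed placement: skip documents whose language cannot be recognized
DiscardUnknownLanguagesAtIndex=True
```

If you leave either discard setting out, the default behavior described above applies.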
You can change the amount of text that IDOL Server analyzes to detect the language of a document. By default, IDOL Server uses only a few sentences. In some situations, increasing the amount of text to analyze can give more accurate results, such as when significant amounts of a minor second language are present.
Add the MaxLanguageDetectTerms setting to the [Server] section, specifying the number of terms (words) that IDOL Server uses for detection. For example:
MaxLanguageDetectTerms=1000
By default, IDOL Server detects text that contains only 7-bit ASCII characters as UTF-8. If you instead want to group these documents with documents that use 8-bit ASCII, disable the LangDetectUTF8 parameter by setting it to False.
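As a sketch, disabling this behavior could look like the line below. The text above does not say which configuration section holds LangDetectUTF8, so the fragment deliberately omits a section header. Confirm the correct section in the parameter reference before adding the line.

```ini
// Section placement is not stated in this procedure; confirm it in the parameter reference
LangDetectUTF8=False
```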
Ensure that the encoding option you want is present in the language type configuration (see Define Language Types). If there are no compatible encodings configured for the detected language, IDOL assigns the default language type.
Save and close the configuration file. Restart IDOL Server for your changes to take effect.
Note: If you enable automatic language detection and set up a field process that reads the language of a document from one of its fields, IDOL Server uses the field process rather than autodetection to determine the document language and encoding.