Languages
When IDOL Server processes a document, it treats the text as a series of tokens (words), each of which is a unit of meaning. At a low level, this method is language independent. However, you can improve your query results by applying some language-dependent processing.
Language-dependent configuration allows you to:
- make sure that all your content is treated consistently, allowing cross-lingual search.
- filter your searches to content in a specific language.
This section describes the most important language concepts, and explains why you might use them.
- Language Types. The language and encoding of a document.
- Tokenization. The methods IDOL Server uses to split text into searchable tokens.
- Stemming. Processing that reduces groups of related words to a common stem.
- Stop Lists. Lists of words that do not convey meaning in documents.
- Cross-Lingual Search. Search across documents in multiple languages.
- Order of Language Processes. The order in which the language processing steps occur during indexing and querying.