Language Configuration
The Answer Server functionality uses language-dependent information to parse and classify questions and match them to answers.
Answer Bank Language Configuration
In answer bank systems, the language-dependent processing is managed by the answer bank Agentstore component. The Agentstore component stores the questions and answers, and processes the Ask actions as queries.
You can configure languages in the Agentstore in the same way as in the IDOL Content component. For more information, refer to the IDOL Server Reference and the IDOL Server Administration Guide.
Fact Bank and Passage Extractor Language Configuration
In fact bank and passage extractor systems, Answer Server can use stemming and stop lists to improve the question parsing and answer matching.
- Stemming is the process of reducing related words, such as plurals and verb forms, to a common linguistic root. For example, in English, "helping", "helped", and "helps" all derive from the common root "help".
Stemming rules are language-dependent. To get the best possible results, you must specify the language that you use in your questions.
- A stop list is a list of very common words that usually add little meaning to phrases. For example, in English, the words "the" and "and" can often be ignored without losing the sense of a sentence. IDOL uses stop lists to optimize matching.
In a fact bank system, Answer Server uses the stop list to match fact bank codes more broadly. Answer Server attempts to form pseudonym values in the code maps by taking the existing code phrases and removing stop words from the beginning and end of each phrase. Similarly, when Answer Server attempts to match a string to codes, it matches both the full phrase and the phrase with the stop words removed from its beginning and end. Answer Server does not attempt to remove stop words from the middle of phrases.
In a passage extractor system, Answer Server removes stop words from the classified questions to attempt to find the best match in the IDOL Content component that stores your data. Answer Server does not use the stop list for question classification, which often depends on common words, such as question words.
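The fact bank matching behavior described above can be illustrated with a minimal Python sketch. This is not Answer Server code: the function name match_candidates and the sample stop words are hypothetical, chosen only to show how stop words are trimmed from the ends of a phrase but never from the middle.

```python
def match_candidates(phrase, stop_words):
    """Return the phrase forms to try when matching: the full phrase,
    then the phrase with stop words trimmed from both ends.

    Illustrative only; the name and behavior details are assumptions,
    not the Answer Server implementation.
    """
    words = phrase.split()
    start, end = 0, len(words)
    # Trim stop words from the beginning of the phrase.
    while start < end and words[start].lower() in stop_words:
        start += 1
    # Trim stop words from the end of the phrase.
    while end > start and words[end - 1].lower() in stop_words:
        end -= 1
    full = " ".join(words)
    trimmed = " ".join(words[start:end])
    candidates = [full]
    # Stop words in the middle of the phrase are left untouched.
    if trimmed and trimmed != full:
        candidates.append(trimmed)
    return candidates


# Hypothetical stop list, for illustration only.
STOP_WORDS = {"the", "and", "of"}
print(match_candidates("the Bank of England", STOP_WORDS))
```

In this sketch, the stop word "of" survives because it sits in the middle of the phrase, while the leading "the" is dropped to produce a second, broader candidate.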
By default, Answer Server uses English stemming rules and does not use a stop list. If you use the default fact bank and passage extractor grammar files, this is usually appropriate, although you might want to add an English stop list by setting the StopList configuration parameter.
To use fact bank and passage extractor with different languages, you need a version of the grammar files in the appropriate language. These grammar files are available in English, French, German, Italian, Portuguese, and Spanish. If you are interested in using fact bank and passage extractor in other languages, contact your OpenText account manager.
If you are using fact bank and passage extractor in a language other than English, you can change the stemming language by modifying the Language configuration parameter. You can also add a stop list by setting the StopList configuration parameter.
You can configure the location where you store your language files (such as stop lists) by using the LanguageDirectory configuration parameter.
You can define language configuration in the [Server] section, to apply to all your systems. You can also set the language configuration parameters in your individual system configuration sections. If you set the parameters in both sections, the system configuration takes precedence.
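The precedence rule can be pictured with a small Python sketch that models each configuration section as a dictionary. The function name effective_language_config and all sample values (language names, file names, paths) are hypothetical and are not taken from the Answer Server Reference; only the three parameter names come from this guide.

```python
# Parameter names as given in this guide.
LANGUAGE_PARAMS = ("Language", "StopList", "LanguageDirectory")


def effective_language_config(server_section, system_section):
    """Resolve language parameters for one system.

    A value in the system section overrides the same parameter in the
    [Server] section. Illustrative model only, not Answer Server code.
    """
    effective = {}
    for name in LANGUAGE_PARAMS:
        if name in system_section:
            effective[name] = system_section[name]
        elif name in server_section:
            effective[name] = server_section[name]
    return effective


# Hypothetical values, for illustration only.
server = {"Language": "English", "LanguageDirectory": "./langfiles"}
system = {"Language": "French", "StopList": "french.dat"}
print(effective_language_config(server, system))
```

Here the system-level Language value overrides the [Server] value, while LanguageDirectory is inherited from [Server] because the system section does not set it.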
For more information about these parameters, refer to the Answer Server Reference.