You can override the default stemming rules for certain words in a particular language by creating a language-specific stemming file. This file is a list of words and their stems. If a stemming file exists, the IDOL Content component uses it to stem the terms that it contains. Terms that are not in the file stem according to the default stemming rules.
Micro Focus recommends use of a stemming file only for unusual or specialized terms where the default rules do not generate a stem. A stemming file is not intended to be a complete replacement for the IDOL stemming algorithms.
Create a text file.
Format the file as a stop word list. The first line is an encoding designation. Each subsequent line contains one word pair: a term followed by its stem. For example:
[UTF8]
mice mouse
mouse mouse
children child
The terms and stems can contain only alphanumeric characters.
NOTE: To ensure that two words stem to the same value, you must add both words to the stemming file, with the appropriate stem.
Save the file with a name of your choice (for example, english_stem.dat) in the directory installDir/common/langfiles.
Open the IDOL Content component configuration file in a text editor.
In the [MyLanguage] section for the stemming file language, set the StemmingFile configuration parameter to the name of your stemming file. For example:
[english]
Encodings=UTF8:englishUTF8
Stoplist=english.dat
StemmingFile=english_stem.dat
Ensure that this [MyLanguage] section does not set Stemming to False. The default value for Stemming in a language is True.
If you disable stemming for a language, but provide a stemming file, Content stems terms in the file, but does not stem other terms.
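If you maintain many term and stem pairs, you might prefer to generate the stemming file from a script instead of editing it by hand. The following is a minimal sketch in Python, not part of IDOL. The function name build_stemming_file and the pairs dictionary are hypothetical, and the alphanumeric check relies on Python's str.isalnum, which is an assumption about how strictly Content interprets "alphanumeric".

```python
def build_stemming_file(pairs, encoding_label="UTF8"):
    """Return stemming file text: an encoding line, then one 'term stem' pair per line.

    `pairs` maps each term to its stem. This helper is hypothetical and
    only mirrors the file layout described in the procedure above.
    """
    lines = [f"[{encoding_label}]"]
    for term, stem in pairs.items():
        for word in (term, stem):
            # Terms and stems can contain only alphanumeric characters.
            # str.isalnum() also rejects empty strings and embedded spaces.
            if not word.isalnum():
                raise ValueError(f"not alphanumeric: {word!r}")
        lines.append(f"{term} {stem}")
    return "\n".join(lines) + "\n"


# To make two words stem to the same value, list both words explicitly.
pairs = {"mice": "mouse", "mouse": "mouse", "children": "child"}
text = build_stemming_file(pairs)
print(text, end="")
```

You can then save the returned text under a name of your choice, such as english_stem.dat, in installDir/common/langfiles, and reference that name in the StemmingFile parameter.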