CustomStemming
The name of a custom stemming library to use for stemming in this language.
The custom stemming library is an external shared library that performs stemming. You can create a custom stemming library when you want to modify the default IDOL stemming and using a StemmingFile is not sufficiently flexible.
Details of the interface that you use to create a custom stemming library are provided on the Micro Focus IDOL public GitHub site, https://github.com/microfocus-idol/idol-custom-stemming-example/.
When you configure a custom stemming library for a language, IDOL Server attempts to load it and uses it each time it needs to stem terms in that language.
The custom stemming interface allows the stemming library to indicate that it cannot stem a term, in which case IDOL falls back on the built-in stemming for that language, unless you also set Stemming to False for the language. Setting Stemming to False does not disable the custom stemming library. You can also configure both a custom stemming library and a StemmingFile. In this case, stems in the file take precedence.
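The order of precedence described above can be sketched in Python. This is an illustrative model only, not IDOL code or the custom stemming interface: the names resolve_stem, toy_builtin, toy_custom, and toy_file are hypothetical, and the sketch assumes that a term is left unstemmed when the library declines it and Stemming is set to False.

```python
from typing import Callable, Dict, Optional


def resolve_stem(
    term: str,
    stemming_file: Dict[str, str],
    custom_stem: Callable[[str], Optional[str]],
    builtin_stem: Callable[[str], str],
    stemming_enabled: bool = True,
) -> str:
    """Return a stem for a term, following the precedence described above."""
    key = term.upper()
    # 1. A stem listed in the StemmingFile takes precedence over the library.
    if key in stemming_file:
        return stemming_file[key]
    # 2. The custom library is consulted even when Stemming is False.
    stem = custom_stem(key)
    if stem is not None:
        return stem
    # 3. The library declined: fall back on built-in stemming, if enabled.
    if stemming_enabled:
        return builtin_stem(key)
    # Assumption: with Stemming set to False, the term stays unstemmed.
    return key


# Hypothetical stand-ins for the three stemming sources.
def toy_builtin(term: str) -> str:
    # Strip a trailing S from longer words, as a crude built-in stemmer.
    return term[:-1] if term.endswith("S") and len(term) > 3 else term


def toy_custom(term: str) -> Optional[str]:
    # Returning None models "the library cannot stem this term".
    return {"MICE": "MOUSE"}.get(term)


toy_file = {"GEESE": "GOOSE"}

for word, enabled in [("geese", True), ("mice", True), ("cats", True),
                      ("cats", False), ("mice", False)]:
    stem = resolve_stem(word, toy_file, toy_custom, toy_builtin, enabled)
    print(f"{word} (Stemming={enabled}) -> {stem}")
```

In this model, mice still stems to MOUSE when Stemming is False, because the custom library remains active, whereas cats is left as CATS because only the built-in fallback is disabled.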
It is possible for a custom stemming library to return multiple stemmed forms for a single term. This behavior is not possible with either built-in stemming algorithms or a StemmingFile. When there are multiple stems:
- Documents that contain a term with multiple stems match any of the stems in query text.
- Query terms with multiple stems match documents that contain any of the stems.
For example, the English word wound might stem to both WOUND and WIND.
- Documents that contain wound match queries for wounded or winds.
- Queries for wound match documents that contain either wounded or winds.
NOTE: This example assumes there are also normal English stemming rules to strip suffixes such as -ed or -s.
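The matching rule in this example can be modeled as a set intersection: a document term and a query term match when they share at least one stem. The following Python sketch is a toy illustration under that assumption. The functions toy_stems and terms_match are hypothetical, and the suffix stripping is a deliberately crude stand-in for the normal English stemming rules.

```python
from typing import List


def toy_stems(term: str) -> List[str]:
    """Return all stems for a term; the first entry is the primary stem."""
    word = term.upper()
    # Hypothetical custom rule: "wound" has two stems.
    if word == "WOUND":
        return ["WOUND", "WIND"]
    # Crude stand-in for ordinary suffix stripping (-ed, -s).
    for suffix in ("ED", "S"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return [word[: -len(suffix)]]
    return [word]


def terms_match(document_term: str, query_term: str) -> bool:
    """Two terms match when they have at least one stem in common."""
    return bool(set(toy_stems(document_term)) & set(toy_stems(query_term)))


# A document containing "wound" matches queries for "wounded" or "winds".
print(terms_match("wound", "wounded"))
print(terms_match("wound", "winds"))
# A query for "wound" matches documents containing "wounded" or "winds".
print(terms_match("wounded", "wound"))
print(terms_match("winds", "wound"))
# "wounded" and "winds" share no stem, so they do not match each other.
print(terms_match("wounded", "winds"))
```

Under these assumptions, the first four checks succeed and the last one fails, because wounded and winds are related only through the two stems of wound.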
Multiple stems work as expected with Boolean, phrase and proximity search. However, some term actions are limited:
- TermGetInfo returns only the primary stem for a term by default (the first stem that the library returns).
- TermExpand with Expansion set to stem considers only the primary stems.
IDOL Server also applies only primary stems to hyphen chunks and proper name terms (when you enable HyphenChars or ProperNames, respectively, for the language).
Type: | String |
Default: | |
Required: | No |
Configuration Section: | MyLanguage |
Example: | [English]
CustomStemming=mycustomstemmer
In this example, IDOL Server attempts to load an external library called mycustomstemmer.dll (Windows) or libmycustomstemmer.so (non-Windows) from the configured LanguageDirectory, and uses it to stem English text. |
See Also: | Stemming
StemmingFile |
NOTE: If you change this setting after you have indexed content into IDOL Server, the new setting applies only to new content, and the server logs a warning. To clear the warning and ensure that your change applies to all your content, you must initialize your index and reindex the content.