CustomStemming

The name of a custom stemming library to use for stemming in this language.

The custom stemming library is an external shared library that performs stemming. You can create a custom stemming library when you want to modify the default IDOL stemming and using a StemmingFile is not sufficiently flexible.

Details of the interface for creating a custom stemming library are provided on the Micro Focus IDOL public GitHub site, https://github.com/microfocus-idol/idol-custom-stemming-example/.

When you configure a custom stemming library for a language, IDOL Server attempts to load it and uses it each time it needs to stem terms in that language.

The custom stemming interface allows the stemming library to indicate that it cannot stem a term, in which case IDOL falls back on the built-in stemming for that language, unless you also set Stemming to False for the language. Setting Stemming to False does not disable the custom stemming library. You can also configure both a custom stemming library and a StemmingFile. In this case, stems in the file take precedence.
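
The precedence described above can be pictured with a short Python sketch. This is an illustrative model only, not IDOL code: the function resolve_stems, its parameters, and the toy stemmers are hypothetical names. It assumes that a term left unstemmed is returned as-is when Stemming is False and the library cannot stem it, and that StemmingFile entries apply regardless of the Stemming setting. Neither assumption is confirmed by this page.

```python
from typing import Callable, Dict, List, Optional


def resolve_stems(
    term: str,
    stemming_file: Dict[str, str],
    custom_stem: Optional[Callable[[str], Optional[List[str]]]],
    builtin_stem: Callable[[str], str],
    stemming_enabled: bool = True,
) -> List[str]:
    """Model the documented precedence for choosing the stems of a term."""
    # 1. Stems listed in the StemmingFile take precedence.
    if term in stemming_file:
        return [stemming_file[term]]
    # 2. The custom library is consulted even when Stemming is False.
    #    Returning None (or an empty list) models "cannot stem this term".
    if custom_stem is not None:
        stems = custom_stem(term)
        if stems:
            return stems
    # 3. Fall back on built-in stemming unless Stemming is False.
    if stemming_enabled:
        return [builtin_stem(term)]
    # Assumption: with no applicable stemmer, the term is left unchanged.
    return [term]


def toy_custom(term: str) -> Optional[List[str]]:
    # Hypothetical library that only knows one irregular word.
    return {"wound": ["WOUND", "WIND"]}.get(term)


def toy_builtin(term: str) -> str:
    # Hypothetical stand-in for built-in stemming: strip one trailing "s".
    upper = term.upper()
    return upper[:-1] if upper.endswith("S") else upper
```

With these toy stemmers, resolve_stems("wound", {}, toy_custom, toy_builtin) yields both stems from the library, while resolve_stems("winds", {}, toy_custom, toy_builtin) falls through to the built-in stand-in because the library declines the term.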

It is possible for a custom stemming library to return multiple stemmed forms for a single term. This behavior is not possible with either built-in stemming algorithms or a StemmingFile. When there are multiple stems:

  • Documents that contain a term with multiple stems match any of the stems in query text.
  • Query terms with multiple stems match documents that contain any of the stems.

For example, the English word wound might stem to both WOUND and WIND.

  • Documents that contain wound match queries for wounded or winds.
  • Queries for wound match documents that contain either wounded or winds.

NOTE: This example assumes there are also normal English stemming rules to strip suffixes such as -ed or -s.
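
The wound example can be modeled as a set intersection: a query term matches a document if the two share at least one stem. The following Python sketch is a simplified illustration, not how IDOL Server implements matching. The stem table TOY_STEMS and the functions stems_of and matches are hypothetical, and the table hard-codes the stems that the example assumes.

```python
from typing import Dict, Iterable, Set

# Hypothetical stem table for the example: "wound" has two stems,
# and ordinary suffix stripping handles "wounded" and "winds".
TOY_STEMS: Dict[str, Set[str]] = {
    "wound": {"WOUND", "WIND"},
    "wounded": {"WOUND"},
    "winds": {"WIND"},
    "window": {"WINDOW"},
}


def stems_of(term: str) -> Set[str]:
    """Return every stem for a term; unknown terms stem to themselves."""
    return TOY_STEMS.get(term.lower(), {term.upper()})


def matches(query_term: str, document_terms: Iterable[str]) -> bool:
    """A query term matches when it shares a stem with any document term."""
    query_stems = stems_of(query_term)
    return any(query_stems & stems_of(t) for t in document_terms)
```

In this model, matches("wounded", ["wound"]) and matches("winds", ["wound"]) are both true, and so are matches("wound", ["wounded"]) and matches("wound", ["winds"]). A query for wounded does not match a document that contains only winds, because those two terms share no stem.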

Multiple stems work as expected with Boolean, phrase and proximity search. However, some term actions are limited:

  • TermGetInfo returns only the primary stem for a term by default (the first stem that the library returns).

    TIP: You can return multiple terms by setting Boolean to True. In this case, Micro Focus recommends that you also set Type to None to allow you to easily identify different stems for the same term, which have the same value for the startpos attribute.

  • TermExpand with Expansion set to stem considers only the primary stems.
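
As a rough illustration of the tip above, the following Python sketch builds a TermGetInfo request that sets Boolean to True and Type to None. The helper term_get_info_url is hypothetical. The host, the port 9000, the URL layout, and the use of a Text parameter to carry the term are assumptions about a typical deployment, so check them against your own server before use.

```python
from urllib.parse import urlencode


def term_get_info_url(text: str, host: str = "localhost", port: int = 9000) -> str:
    """Build a TermGetInfo request that asks for all stems of a term."""
    params = {
        "action": "TermGetInfo",
        "Text": text,       # assumption: the term is passed in Text
        "Boolean": "True",  # return multiple terms, as the tip describes
        "Type": "None",     # recommended alongside Boolean=True
    }
    # Assumption: actions are sent as http://host:port/action=...
    return f"http://{host}:{port}/{urlencode(params)}"
```

For example, term_get_info_url("wound") produces a URL ending in action=TermGetInfo&Text=wound&Boolean=True&Type=None.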

IDOL Server also applies only primary stems to hyphen chunks and proper name terms (when you enable HyphenChars or ProperNames, respectively, for the language).

Type: String
Default:  
Required: No
Configuration Section: MyLanguage
Example:
[English]
CustomStemming=mycustomstemmer

In this example, IDOL Server attempts to load an external library called mycustomstemmer.dll (Windows) or libmycustomstemmer.so (non-Windows) from the configured LanguageDirectory, and uses it for stemming in English text.
See Also: Stemming, StemmingFile
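
The file naming in the example (mycustomstemmer.dll on Windows, libmycustomstemmer.so elsewhere) can be expressed as a small Python helper. This is a sketch for checking that a library file is where the server expects it, not part of IDOL. The function library_path and the directory name langfiles are hypothetical, and treating any platform string that starts with "win" as Windows is an assumption.

```python
import os
import sys


def library_path(language_directory: str, name: str, platform: str = sys.platform) -> str:
    """Return the library file that a CustomStemming value refers to."""
    if platform.startswith("win"):
        filename = f"{name}.dll"     # Windows: <name>.dll
    else:
        filename = f"lib{name}.so"   # non-Windows: lib<name>.so
    return os.path.join(language_directory, filename)


if __name__ == "__main__":
    # "langfiles" is a placeholder for the configured LanguageDirectory.
    path = library_path("langfiles", "mycustomstemmer")
    print(path, "exists" if os.path.exists(path) else "is missing")
```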

NOTE: If you change this setting after you have indexed content into IDOL Server, the new setting applies only to new content, and the server logs a warning. To clear the warning and ensure that your change applies to all your content, you must initialize your index and reindex the content.