How Spelling Correction Works
To enable spell checking, set the parameters SpellCheckMaxCheckTerms, SpellCheckIncorrectMaxDocOccs, and UnstemmedMinDocOccs in the [Server] section of the configuration file before you index content. When you perform a query that includes Spellcheck=True, the IDOL Content component uses these settings in the spell checking process, as shown below:
- Content determines whether the query is eligible for spell checking.
  Content counts the terms in the query text, ignoring stop words, proper-name terms, and hyphenated terms. If this number does not exceed the value of SpellCheckMaxCheckTerms, the query is eligible for spell checking.
- Content determines which terms are misspelled.
  Content checks how many times each query term occurs in its data index. If a term occurs fewer times than the value of SpellCheckIncorrectMaxDocOccs, Content assumes that the term is misspelled.
- Content finds correct spellings and suggests them.
  Content uses a proprietary term-distancing algorithm to find the terms in its data index that are closest to the misspelled terms. It then checks how many times these terms occur. If a term occurs at least UnstemmedMinDocOccs times, Content uses it as a spelling suggestion. Content returns the corrected terms as a comma-separated list in an <autn:spelling> field. It also returns a corrected version of the query text in an <autn:spellingquery> field.
- When you shut down the IDOL Content component, it creates a spelling correction file.
  The spelling correction file stores the corrections that you make. You can add further corrections to the file or amend existing corrections.
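The first three steps above can be illustrated with a short Python sketch. It is not Content's implementation and makes several loud assumptions: the configuration fragment is parsed as standard INI with `configparser` (real IDOL configuration files may contain syntax this does not accept), the numeric parameter values are placeholders rather than documented defaults, an in-memory `doc_occs` dictionary stands in for the data index, `difflib.get_close_matches` stands in for the proprietary term-distancing algorithm, and the toy stop list and "capitalized word = proper name" test are simplifications. The function names `read_spelling_settings`, `checkable_terms`, and `spell_check` are hypothetical. The returned dictionary mimics the `<autn:spelling>` and `<autn:spellingquery>` fields.

```python
import configparser
import difflib

# Illustrative [Server] fragment: parameter names come from the text above,
# but the numeric values are hypothetical placeholders, not documented defaults.
EXAMPLE_CONFIG = """
[Server]
SpellCheckMaxCheckTerms=5
SpellCheckIncorrectMaxDocOccs=3
UnstemmedMinDocOccs=10
"""

# Toy stop list (assumption); Content's real stop lists are configured elsewhere.
STOP_WORDS = {"a", "and", "in", "of", "the"}


def read_spelling_settings(text):
    """Read the three spell-checking parameters, assuming INI-style syntax."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep CamelCase names instead of lowercasing them
    parser.read_string(text)
    # parser.getint raises NoSectionError/NoOptionError if a value is missing.
    return {
        name: parser.getint("Server", name)
        for name in ("SpellCheckMaxCheckTerms",
                     "SpellCheckIncorrectMaxDocOccs",
                     "UnstemmedMinDocOccs")
    }


def checkable_terms(query):
    """Terms that count toward SpellCheckMaxCheckTerms.

    Skips stop words and hyphenated terms; treating capitalized words as
    proper names is a crude heuristic used only in this sketch.
    """
    return [
        word for word in query.split()
        if word.lower() not in STOP_WORDS
        and "-" not in word
        and not word[:1].isupper()
    ]


def spell_check(query, doc_occs, settings):
    """Sketch of steps 1-3; doc_occs maps index terms to occurrence counts."""
    terms = checkable_terms(query)

    # Step 1: the query is eligible only if it has few enough checkable terms.
    if len(terms) > settings["SpellCheckMaxCheckTerms"]:
        return None

    # Step 2: a term occurring fewer than SpellCheckIncorrectMaxDocOccs times is
    # assumed misspelled. For simplicity, only the terms counted in step 1 are checked.
    misspelled = [t for t in terms
                  if doc_occs.get(t.lower(), 0) < settings["SpellCheckIncorrectMaxDocOccs"]]

    # Step 3: rank index terms by similarity (difflib is a stand-in for the
    # proprietary algorithm) and keep the closest one, other than the term
    # itself, that occurs at least UnstemmedMinDocOccs times.
    corrections = {}
    for term in misspelled:
        for candidate in difflib.get_close_matches(term.lower(), list(doc_occs), n=5):
            if (candidate != term.lower()
                    and doc_occs[candidate] >= settings["UnstemmedMinDocOccs"]):
                corrections[term] = candidate
                break

    corrected_query = " ".join(corrections.get(w, w) for w in query.split())
    return {
        "autn:spelling": ",".join(corrections.values()),
        "autn:spellingquery": corrected_query,
    }


if __name__ == "__main__":
    settings = read_spelling_settings(EXAMPLE_CONFIG)
    # Hypothetical index contents: term -> number of occurrences.
    index = {"spelling": 120, "correction": 85, "query": 240, "speling": 1}
    print(spell_check("speling corection", index, settings))
    # {'autn:spelling': 'spelling,correction', 'autn:spellingquery': 'spelling correction'}
```

In this sketch the occurrence threshold is applied after the similarity ranking, so a very close but rare index term (such as another misspelling already in the index) is skipped in favor of the nearest term that occurs often enough.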