Configure a Pre-Filter Task

For each pre-filter task that you want to configure, you set:

  • a regular expression that specifies how to find potential matches, or a resource file that provides a dictionary of terms to use for fast matching.

  • the amount of text on either side of each potential match that Eduction must search to find the more detailed match.

NOTE: Eduction runs all your configured pre-filtering tasks for all input text, so ensure that your pre-filter task applies to all your configured grammars and entities. Use a different configuration for any entities that you do not want to pre-filter.

To configure a pre-filter task

  1. In the [Eduction] section, add a PrefilterTaskN parameter, where N is a number starting from 0 for the first task. Set this parameter to the name of the configuration section where you define your pre-filter task.

  2. Create the new configuration section.

  3. Set one of the following parameters:

    • Regex to a regular expression value that finds potential matches in your text.

    • ResourceFile to the name of a DPF or JSON file that contains the dictionary of terms to use for pre-matching.

  4. Set WindowCharsBeforeMatch and WindowCharsAfterMatch to the number of characters before and after the potential match segment to use as the match window.

  5. Optionally, set other parameters, such as Exclusion, InvalidRegexAfterMatch, InvalidRegexBeforeMatch, and PrefilterMaxReturnedBytes, to exclude invalid values or to end processing early in certain conditions. For more information, see Eduction Parameter Reference.

  6. Save and close your configuration file.

For example:

[Eduction]
PrefilterTask0=AddressPrefilter

[AddressPrefilter]
Regex=\d{1,7}
WindowCharsBeforeMatch=100
WindowCharsAfterMatch=100
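
A dictionary-based task follows the same pattern, with ResourceFile in place of Regex. The following is a minimal sketch only: the section name NamePrefilter, the file name names_prefilter.json, and the window sizes are hypothetical placeholders, so substitute the values that suit your own grammars.

[Eduction]
PrefilterTask0=NamePrefilter

// Hypothetical section name, file name, and window sizes, for illustration only
[NamePrefilter]
ResourceFile=names_prefilter.json
WindowCharsBeforeMatch=50
WindowCharsAfterMatch=50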

For more details about these parameters, see Eduction Parameter Reference.

TIP: To use pre-filtering tasks through the C and Java Eduction APIs, you must create your Eduction engine from a configuration file. See Standalone API Usage (C) or Standalone API Usage (Java).