Configure a Pre-Filter Task

For each pre-filter task that you want to configure, you set:

  • a regular expression that specifies how to find potential matches, or a resource file that provides a dictionary of terms to use for fast matching.

  • the amount of text Eduction must use on either side of the potential match to find the more detailed match.

NOTE: Eduction runs all your configured pre-filtering tasks for all input text, so ensure that your pre-filter task applies to all your configured grammars and entities. Use a different configuration for any entities that you do not want to pre-filter.
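
To picture how the pattern and the window size interact, the following Python sketch finds candidate matches with a regular expression and expands each one by a fixed number of characters on either side. It is a conceptual illustration only and not Eduction's implementation. The function name match_windows and the merging of overlapping windows are assumptions made for this example.

```python
import re


def match_windows(text, pattern, chars_before, chars_after):
    """Return (start, end) spans of text surrounding each candidate match.

    Hypothetical helper for illustration. Merging overlapping spans is an
    assumption of this sketch, not documented Eduction behavior.
    """
    windows = []
    for m in re.finditer(pattern, text):
        # Expand the candidate match, clamped to the bounds of the text.
        start = max(0, m.start() - chars_before)
        end = min(len(text), m.end() + chars_after)
        if windows and start <= windows[-1][1]:
            # Touches or overlaps the previous window: extend that window.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


sample = "Ship to 42 Main Street. " + "x" * 50 + " Call extension 7 today."
for start, end in match_windows(sample, r"\d{1,7}", 10, 15):
    # Only these slices would be handed on for the more detailed match.
    print(repr(sample[start:end]))
```

In this picture, a larger window gives the detailed match more context but leaves more text to scan, so the window size needs to be at least as wide as the longest entity you expect around a candidate.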

To configure a pre-filter task

  1. In the [Eduction] section, add a PrefilterTaskN parameter, where N is a number starting from 0 for the first task. Set this parameter to the name of a configuration section where you define your pre-filter task.

  2. Create the new configuration section.

  3. Set one of the following parameters:

    • Regex to a regular expression value that finds potential matches in your text.

    • ResourceFile to the name of a DPF, JSON or EJR grammar file that contains the dictionary of terms to use for pre-matching.

      TIP: If you use an EJR grammar, you can also set the Entities parameter to restrict this option to a set of entities in a grammar. For example, you might want to use landmark entities only to find your match windows.

      For more information about available EJR grammar files, refer to the Eduction Grammars User Guide.

  4. Set WindowCharsBeforeMatch and WindowCharsAfterMatch to the number of characters before and after the potential match segment to use as the match window.

  5. Optionally, set other parameters, such as Exclusion, InvalidRegexAfterMatch, InvalidRegexBeforeMatch, and PrefilterMaxReturnedBytes, to exclude invalid values or to end processing early in certain conditions. For more information, see Eduction Parameter Reference.

  6. Save and close your configuration file.

For example:

[Eduction]
PrefilterTask0=AddressPrefilter
PrefilterTask1=DrivingLandmarkFilter

[AddressPrefilter]
Regex=\d{1,7}
WindowCharsBeforeMatch=100
WindowCharsAfterMatch=100

[DrivingLandmarkFilter]
ResourceFile=driving.ejr
Entities=pii/driving/landmark/??
WindowCharsBeforeMatch=100
WindowCharsAfterMatch=100

For more details about these parameters, see Eduction Parameter Reference.
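
Before you load a configuration, it can help to check that it follows the steps above. The following Python sketch does this for the example configuration. It is not part of Eduction: the function check_prefilter_config is hypothetical, and the checks (exactly one of Regex or ResourceFile, whole-number window sizes, case-sensitive section names) are assumptions drawn from the procedure on this page, not from the Eduction parser.

```python
import configparser

EXAMPLE = r"""
[Eduction]
PrefilterTask0=AddressPrefilter
PrefilterTask1=DrivingLandmarkFilter

[AddressPrefilter]
Regex=\d{1,7}
WindowCharsBeforeMatch=100
WindowCharsAfterMatch=100

[DrivingLandmarkFilter]
ResourceFile=driving.ejr
Entities=pii/driving/landmark/??
WindowCharsBeforeMatch=100
WindowCharsAfterMatch=100
"""


def check_prefilter_config(config_text):
    """Return a list of problems found in the pre-filter task sections."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep parameter names as written
    parser.read_string(config_text)
    if not parser.has_section("Eduction"):
        return ["no [Eduction] section"]
    problems = []
    for key, section in parser["Eduction"].items():
        if not key.lower().startswith("prefiltertask"):
            continue  # not a pre-filter task reference
        if not parser.has_section(section):
            problems.append(f"{key}: section [{section}] is not defined")
            continue
        names = {name.lower(): value for name, value in parser[section].items()}
        # Step 3: assume exactly one of Regex or ResourceFile must be set.
        if ("regex" in names) == ("resourcefile" in names):
            problems.append(f"[{section}]: set exactly one of Regex or ResourceFile")
        # Step 4: both window sizes should be whole numbers of characters.
        for window in ("windowcharsbeforematch", "windowcharsaftermatch"):
            if not names.get(window, "").isdigit():
                problems.append(f"[{section}]: {window} must be a whole number")
    return problems


print(check_prefilter_config(EXAMPLE) or "no problems found")
```

A check like this catches a task that points to a section you have not yet created, which is easy to miss when you add several tasks at once.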

TIP: To use pre-filtering tasks through the C and Java Eduction APIs, you must create your Eduction engine from a configuration file. See Standalone API Usage (C) or Standalone API Usage (Java).