Eduction includes the following example post-processing scripts.
The checksum_luhn.lua script verifies the checksum digit of each match using the Luhn algorithm, and reduces the score associated with the match if the checksum is wrong.
The checksum_luhn_enmasse.lua script performs checksum validation as an en masse processing task: it discards incorrect matches, and sets the score of each correct match to the proportion of matches that have the correct checksum digit.
You can use these scripts with the number_cc.ecr and number_sin_ca.ecr grammar files to validate most credit card numbers and Canadian Social Insurance Numbers.
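As an illustration of the check that these scripts perform, the following Python sketch implements the Luhn algorithm and an en masse pass over a batch of matches. It is not the code of the shipped Lua scripts: the function names luhn_valid and en_masse_filter are hypothetical, and the batch is assumed to be a plain list of matched strings.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Assumption: spaces and hyphens are separators and can be ignored.
    cleaned = number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, ch in enumerate(reversed(cleaned)):
        digit = int(ch)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


def en_masse_filter(matches):
    """Keep only valid matches and return them with the valid proportion.

    Hypothetical helper: the proportion plays the role of the new score
    that is assigned to every surviving match.
    """
    valid = [m for m in matches if luhn_valid(m)]
    proportion = len(valid) / len(matches) if matches else 0.0
    return valid, proportion
```

For example, en_masse_filter applied to one valid and one invalid number keeps the valid number and reports a proportion of 0.5.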
You can use the checksum_dni_es.lua script with the number_dni_es.ecr grammar file to validate Spanish Documento Nacional de Identidad (national identity card) numbers.
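A DNI consists of eight digits followed by a check letter, which is selected by taking the number modulo 23 and indexing into a fixed 23-letter sequence. The Python sketch below shows that check. It is an illustration only, not the contents of checksum_dni_es.lua, and the name dni_valid is hypothetical.

```python
# Check letters indexed by (number mod 23).
DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"


def dni_valid(dni: str) -> bool:
    """Return True if the final letter matches the eight-digit number."""
    # Assumption: spaces and hyphens are separators; case is ignored.
    cleaned = dni.replace(" ", "").replace("-", "").upper()
    if len(cleaned) != 9:
        return False
    number, letter = cleaned[:8], cleaned[8]
    if not (number.isascii() and number.isdigit()):
        return False
    return DNI_LETTERS[int(number) % 23] == letter
```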
You can use the checksum_bsn_nl.lua script with the number_bsn_nl.ecr grammar file to validate Dutch Citizen Service Numbers (Burgerservicenummer, or BSNs).
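BSNs are checked with a variant of the Dutch "11-test": the nine digits are multiplied by the weights 9, 8, 7, 6, 5, 4, 3, 2 and -1, and the sum must be divisible by 11. The following Python sketch shows the calculation. It is an assumption about what a validation like checksum_bsn_nl.lua involves, not a copy of it; the name bsn_valid and the padding of eight-digit numbers with a leading zero are choices made for this example.

```python
BSN_WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2, -1)


def bsn_valid(bsn: str) -> bool:
    """Return True if `bsn` passes the weighted 11-test."""
    # Assumption: spaces and periods are separators and can be ignored.
    cleaned = bsn.replace(" ", "").replace(".", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False
    if len(cleaned) not in (8, 9):
        return False
    # Pad eight-digit numbers with a leading zero before weighting.
    digits = [int(ch) for ch in cleaned.zfill(9)]
    total = sum(w * d for w, d in zip(BSN_WEIGHTS, digits))
    # Reject the all-zero number, which passes the sum test trivially.
    return total % 11 == 0 and any(digits)
```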
You can use the lat_long.lua script with the place_lat_long.ecr grammar file to convert and standardize the output of geographical coordinates.
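One common form of coordinate standardization is converting degrees, minutes, and seconds to signed decimal degrees. The Python sketch below shows that conversion as an example of the idea. The output format of lat_long.lua is not described here, so the six-decimal "latitude,longitude" string and both function names are assumptions made for illustration.

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value


def format_coordinate(latitude, longitude):
    """Render a pair in a single (assumed) standard form."""
    return f"{latitude:.6f},{longitude:.6f}"
```

For example, 40° 26' 46" N, 79° 58' 56" W becomes 40.446111,-79.982222.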
You can use the datetime.lua script with the datetime_advanced_eng.ecr grammar file to convert English dates, times, and ranges into a single standardized format when matches occur in several formats. For example, you can convert both 23/11/13 and Nov 23 2013 to 2013-11-23.
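The following Python sketch illustrates the kind of normalization described above, using the two example inputs. It does not reproduce datetime.lua: the list of accepted formats and the name normalize_date are assumptions, and the day-first reading of 23/11/13 is hard-coded.

```python
from datetime import datetime

# Assumed input formats: day/month/two-digit year, and "Mon DD YYYY".
INPUT_FORMATS = ("%d/%m/%y", "%b %d %Y")


def normalize_date(text):
    """Return the date in YYYY-MM-DD form, or None if no format matches."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None
```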
The datetime_advanced_eng.ecr grammar file can understand English natural language dates, and relative dates such as last Saturday morning. You can provide a reference date for <today> in the Lua script to enable normalization of relative dates into standard formats.
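To show why a reference date is needed, the Python sketch below resolves a phrase such as last Saturday against a supplied value for today. It is a simplified illustration, not the logic of datetime.lua: the name last_weekday is hypothetical, and "last" is assumed to mean the most recent such day strictly before the reference date.

```python
from datetime import date, timedelta

WEEKDAYS = {
    "monday": 0, "tuesday": 1, "wednesday": 2, "thursday": 3,
    "friday": 4, "saturday": 5, "sunday": 6,
}


def last_weekday(name, today):
    """Return the most recent `name` weekday strictly before `today`."""
    target = WEEKDAYS[name.lower()]
    # If today is the target weekday, go back a full week.
    days_back = (today.weekday() - target) % 7 or 7
    return today - timedelta(days=days_back)
```

With a reference date of 2013-11-27 (a Wednesday), last Saturday resolves to 2013-11-23.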
For date and time range matches, this script sets the normalized text to <start>/<end>, and additionally adds STARTPOINT and ENDPOINT components that contain the associated dates or times. When there is a multiple date match (for example, 5th and 8th July matches as 5th July and 8th July), the script returns a comma-separated list, with a POINT component for each date.
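The shape of these two outputs can be sketched in Python as follows. The dictionary layout and the function names are hypothetical and only mirror the description above. The component names STARTPOINT, ENDPOINT, and POINT come from the text, and the dates are assumed to be already normalized.

```python
def normalize_range(start, end):
    """Build the output for a range match from two normalized values."""
    return {
        "normalized_text": f"{start}/{end}",
        "components": [("STARTPOINT", start), ("ENDPOINT", end)],
    }


def normalize_multiple(points):
    """Build the output for a match that contains several dates."""
    return {
        "normalized_text": ",".join(points),
        "components": [("POINT", p) for p in points],
    }
```

For example, assuming a year of 2013 for 5th and 8th July, normalize_multiple(["2013-07-05", "2013-07-08"]) gives the normalized text 2013-07-05,2013-07-08 with two POINT components.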
You can use the case_filter.lua example script to filter out matches by case, for example in personal name grammars.
To use this script, you must set MatchCase to False for the grammar. The script filters out any match that is not one of:
NOTE: You might need to update this script to include case mappings for uncommon non-ASCII characters. The script provides sample mappings for common Latin characters with diacritics.