Write a Lua Script for Post-Processing
An Eduction post-processing task runs a Lua script. Eduction passes the matches into an entry function.
There are two available entry functions to use when you process single matches:
processmatch. This function allows you to modify a match, change the score, or discard an invalid match. You can use this option for most post-processing, such as checksum validation and normalization.
finalizematch. This function allows you to add new matches into the Eduction session. For example, you might use this option to combine existing matches, and return the combined match as a result. This function can also perform the same changes as processmatch.
Your script must define at least one of these functions.
NOTE: If you define both processmatch and finalizematch, processmatch takes precedence.
There are also two equivalent functions for en masse processing, processmatches and finalizematches. For more information, see Process Matches En Masse.
Use ProcessMatch
The processmatch function must accept a single argument, an edkMatch object. Eduction passes matches into the script one at a time. The script must return a Boolean value: true to keep the match or false to discard it.
The following example changes the score for every match to 0.5:
function processmatch(edkmatch)
   if edkmatch then
      -- change the score for the match
      edkmatch:setScore(0.5)
   end
   return true
end
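The checksum validation mentioned earlier can be sketched in the same way. The following is a minimal illustration, not a definitive implementation: it assumes that the matched text is a card-style number and applies the Luhn check to the text returned by getOutputText. The helper luhn_valid is a hypothetical name introduced for this sketch.

```lua
-- Hypothetical helper: returns true when the digits in `text` pass the Luhn check.
local function luhn_valid(text)
  local digits = text:gsub("%D", "")        -- drop spaces, dashes, and other non-digits
  if #digits == 0 then return false end
  local sum, double = 0, false
  for i = #digits, 1, -1 do                 -- walk the digits from right to left
    local d = tonumber(digits:sub(i, i))
    if double then
      d = d * 2
      if d > 9 then d = d - 9 end
    end
    sum = sum + d
    double = not double                     -- double every second digit
  end
  return sum % 10 == 0
end

function processmatch(edkmatch)
  if not edkmatch then return false end
  -- keep the match only when its text passes the checksum
  return luhn_valid(edkmatch:getOutputText())
end
```

To try the function outside Eduction, you can pass it a plain Lua table that provides a getOutputText method in place of a real edkMatch object.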
Use FinalizeMatch
The finalizematch entry function must accept two arguments: an edkMatch object (the current match), and a session handle. It can also optionally accept a user parameters map.
The following example appends the value Esq. to the text of an entity, and injects the resulting match back into the session.
function finalizematch(edkmatch, session)
   if edkmatch then
      local text = edkmatch:getOutputText()
      -- create a new match with the modified text, then hand it to the session
      local m = LuaEdkMatch:new(edkmatch:getEntityName(), text .. " Esq.", edkmatch:getOffset())
      session:injectMatch(m)
      return true
   end
   return false
end
After you inject a match, the session takes ownership of it, so you cannot use the created match in any subsequent functions.
NOTE: You cannot perform additional post-processing on injected matches. Eduction skips these matches at post-processing time, to prevent infinite loops.
Process Matches En Masse
Sometimes, you might prefer to process all the matches together. For example, you might want to increase the scores of matches that appear near other matches. It is easier to do this if you process all the matches at the same time.
To process all the matches at the same time, set the ProcessEnMasse parameter to TRUE in your Eduction configuration. When ProcessEnMasse=TRUE, Eduction passes all the matches it finds into the script together.
Your script for processing matches en masse must define a function named either processmatches or finalizematches.
The processmatches function must take a single argument, a Lua table of edkEnMasseMatch objects. Each of these objects represents a single match, but you must call the getMatch method to obtain an edkMatch object. You can then use the edkMatch object to manipulate the match. If you want to discard a match, call setOutput(false) on the relevant edkEnMasseMatch object.
The following example demonstrates how to iterate over the elements in the table and discard any match with a score that is less than 0.5:
function processmatches(matches)
   -- example that discards matches with score < 0.5
   for k,v in ipairs(matches) do
      local edkmatch = v:getMatch()
      if edkmatch:getScore() < 0.5 then
         v:setOutput(false)
      end
   end
end
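The proximity use case described at the start of this section, increasing the scores of matches that appear near other matches, might be sketched as follows. This is an illustration under stated assumptions rather than a recommended implementation: WINDOW and BOOST are hypothetical values chosen for the example, the distance is measured in whatever units getOffset returns, and the cap at 1.0 assumes that scores are not meant to exceed 1.

```lua
local WINDOW = 50    -- hypothetical: maximum offset distance that counts as "near"
local BOOST = 0.1    -- hypothetical: amount to add to the score of a nearby match

function processmatches(matches)
  -- First pass: fetch each edkMatch once and record its offset.
  local edk, offsets = {}, {}
  for i, v in ipairs(matches) do
    edk[i] = v:getMatch()
    offsets[i] = edk[i]:getOffset()
  end
  -- Second pass: boost every match that has at least one neighbor within WINDOW.
  for i = 1, #edk do
    local has_neighbor = false
    for j = 1, #edk do
      if i ~= j and math.abs(offsets[i] - offsets[j]) <= WINDOW then
        has_neighbor = true
        break
      end
    end
    if has_neighbor then
      -- assumption: scores are capped at 1.0
      edk[i]:setScore(math.min(1.0, edk[i]:getScore() + BOOST))
    end
  end
end
```

The offsets are collected before any score changes so that the result does not depend on the order in which the matches are visited. The nested loop is quadratic in the number of matches, which is acceptable for a sketch but might need a sort-based approach for large match sets.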
The finalizematches function must take two arguments, a Lua table of edkEnMasseMatch objects (the same as for processmatches), and a session handle. It can also optionally accept a user parameters map.
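As a rough sketch of how finalizematches might combine existing matches, the following example joins the text of all the matches and injects one combined match into the session. It reuses only the calls shown in the earlier examples (getMatch, getOutputText, getEntityName, getOffset, LuaEdkMatch:new, and session:injectMatch). The guarded LuaEdkMatch definition at the top is a hypothetical stand-in so that the sketch can run outside Eduction. The sketch also assumes that the table lists matches in document order, and, like the processmatches example, it returns no value.

```lua
-- Hypothetical stand-in for the LuaEdkMatch class, defined only when the real
-- class is not available (for example, when testing outside Eduction).
if not LuaEdkMatch then
  LuaEdkMatch = {}
  LuaEdkMatch.__index = LuaEdkMatch
  function LuaEdkMatch:new(entity, text, offset)
    return setmetatable({ entity = entity, text = text, offset = offset }, self)
  end
  function LuaEdkMatch:getEntityName() return self.entity end
  function LuaEdkMatch:getOutputText() return self.text end
  function LuaEdkMatch:getOffset() return self.offset end
end

function finalizematches(matches, session)
  local parts, first = {}, nil
  for _, v in ipairs(matches) do
    local m = v:getMatch()
    first = first or m                      -- remember the first match
    parts[#parts + 1] = m:getOutputText()
  end
  if #parts < 2 then return end             -- nothing to combine
  -- Read everything needed before injecting; the session owns the new match afterwards.
  local combined = LuaEdkMatch:new(first:getEntityName(),
                                   table.concat(parts, " "),
                                   first:getOffset())
  session:injectMatch(combined)
end
```

The combined match takes the entity name and offset of the first match, which is one possible design choice among several.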
For information about the objects and methods that you can use in your Lua post-processing scripts, see Eduction Lua Methods Reference.
Pass Parameters into the Lua Script
You can pass additional parameters into post-processing tasks that you run through the Eduction API. To add an additional parameter (to all post-processing tasks that run during the session), call the appropriate function:
EdkSessionSetUserParamValue in the C API.
ITextExtractionSession::SetUserParameter in the .NET API.
setUserParamValue in the Java API.
Any parameters that you set using these functions are passed into the processmatch or processmatches function of the Lua script as a table of key-value pairs. For example:
function processmatch(edkmatch, params)
   for k,v in pairs(params) do
      --print ("Custom parameter ", k, " has value ", v)
   end
   return true
end
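One way to use such a parameter is to drive a threshold from the calling application. The sketch below assumes a parameter named min_score, which is a hypothetical key chosen for this example and not one that Eduction defines. It also assumes that the value might arrive as a string, so it converts the value with tonumber and falls back to a default when the parameter is missing or not numeric.

```lua
local DEFAULT_MIN_SCORE = 0.5   -- used when no usable parameter is supplied

function processmatch(edkmatch, params)
  local min_score = DEFAULT_MIN_SCORE
  -- "min_score" is a hypothetical parameter name set by the calling application
  if params and params["min_score"] then
    min_score = tonumber(params["min_score"]) or DEFAULT_MIN_SCORE
  end
  -- keep the match only when its score reaches the threshold
  return edkmatch:getScore() >= min_score
end
```

Checking that params is not nil keeps the script usable in sessions where the application sets no user parameters.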