Write a Lua Script for Post-Processing

An Eduction post-processing task runs a Lua script. Eduction passes the matches into an entry function.

There are two available entry functions to use when you process single matches:

  • processmatch. This function allows you to modify a match, change the score, or discard a non-valid match. You can use this option for most post-processing, such as checksum validation and normalization.
  • finalizematch. This function allows you to add new matches into the Eduction session. For example, you might use this option to combine existing matches, and return the combined match as a result. This function can also still perform the same changes as processmatch.

Your script must define at least one of these functions.

NOTE: If you define both processmatch and finalizematch, finalizematch takes precedence.

There are also two equivalent functions for en masse processing, processmatches and finalizematches. For more information, see Process Matches En Masse.

Use ProcessMatch

The processmatch function must accept an edkMatch object (the current match) as its first argument. When you run Eduction through the API, the function can also accept a user parameters map as an optional final argument (see Pass Parameters into the Lua Script).

Eduction passes matches into the script one at a time. The script must return a Boolean value: true to keep the match or false to discard it.

The following example changes the score for every match to 0.5:

function processmatch(edkmatch)
    if edkmatch then
        -- change the score for the match
        edkmatch:setScore(0.5)
    end
    return true
end
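
As a sketch of the checksum validation mentioned above, the following script discards any match whose digits fail the Luhn checksum. The luhn_valid helper is a hypothetical name introduced for illustration, and the example assumes that getOutputText returns the match text as a string. Adapt the check to the checksum that your entity actually uses.

```lua
-- Hypothetical helper: returns true if the digits in text pass the Luhn checksum.
-- Non-digit characters (spaces, hyphens) are ignored.
local function luhn_valid(text)
    local digits = text:gsub("%D", "")
    if #digits == 0 then
        return false
    end
    local sum = 0
    local double = false
    -- walk the digits from right to left, doubling every second digit
    for i = #digits, 1, -1 do
        local d = tonumber(digits:sub(i, i))
        if double then
            d = d * 2
            if d > 9 then
                d = d - 9
            end
        end
        sum = sum + d
        double = not double
    end
    return sum % 10 == 0
end

function processmatch(edkmatch)
    if edkmatch then
        -- keep the match only if its text passes the checksum
        return luhn_valid(edkmatch:getOutputText())
    end
    return true
end
```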

Use FinalizeMatch

The finalizematch entry function must accept an edkMatch object (the current match) as its first argument, and a session handle as its second argument. When you run Eduction through the API, the function can also accept a user parameters map as an optional final argument (see Pass Parameters into the Lua Script).

The following example creates a new match from the current one, with Esq. appended to its output text, and injects the new match into the session.

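function finalizematch(edkmatch, session)
    if edkmatch then
        -- build a new match from the current one, with " Esq." appended
        local text = edkmatch:getOutputText()
        local m = LuaEdkMatch:new(edkmatch:getEntityName(), text .. " Esq.", edkmatch:getOffset())
        -- the session takes ownership of m after this call
        session:injectMatch(m)
        return true
    end
    return false
end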

After you inject a match, the session takes ownership of it, so you cannot use the created match in any subsequent functions.

NOTE: You cannot perform additional post-processing on injected matches. Eduction skips these matches at post-processing time, to prevent infinite loops.

Process Matches En Masse

Sometimes, you might prefer to process all the matches together. For example, you might want to increase the scores of matches that appear near other matches. It is easier to do this if you process all the matches at the same time.

To process all the matches at the same time, set the ProcessEnMasse parameter to TRUE in your Eduction configuration. When ProcessEnMasse=TRUE, Eduction passes all the matches it finds into the script together.

Your script for processing matches en masse must define a function named either processmatches or finalizematches.

The processmatches function must accept a Lua table of edkEnMasseMatch objects as its first argument. Each of these objects represents a single match, but you must call its getMatch method to obtain an edkMatch object that you can use to manipulate the match. To discard a match, call setOutput(false) on the relevant edkEnMasseMatch object.

The following example demonstrates how to iterate over the elements in the table and discard any match with a score that is less than 0.5:

function processmatches(matches)
    -- example that discards matches with score < 0.5
    for k,v in ipairs (matches) do
        local edkmatch = v:getMatch()
        if edkmatch:getScore() < 0.5 then
            v:setOutput(false)
        end
    end
end
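
The proximity use case described above (increasing the scores of matches that appear near other matches) might look like the following sketch. The WINDOW and BOOST values are assumptions chosen for illustration, as is the cap of 1.0 on the score. The script uses only the getMatch, getOffset, getScore, and setScore methods that appear in the other examples.

```lua
local WINDOW = 50   -- assumed proximity window, in the same units as getOffset
local BOOST = 0.1   -- assumed score increment for a match that has a neighbor

function processmatches(matches)
    -- collect the offsets first, so that each comparison uses the original positions
    local offsets = {}
    for i, v in ipairs(matches) do
        offsets[i] = v:getMatch():getOffset()
    end

    for i, v in ipairs(matches) do
        for j, other in ipairs(offsets) do
            if i ~= j and math.abs(other - offsets[i]) <= WINDOW then
                -- at least one other match is nearby: boost once and stop looking
                local m = v:getMatch()
                m:setScore(math.min(1.0, m:getScore() + BOOST))
                break
            end
        end
    end
end
```

Each match is boosted at most once, regardless of how many neighbors it has.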

The finalizematches function must accept a Lua table of edkEnMasseMatch objects as its first argument (the same as for processmatches), and a session handle as its second argument.

When you run Eduction through the API, the processmatches or finalizematches functions can also accept a user parameters map as an optional final argument (see Pass Parameters into the Lua Script).
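
As an illustration of finalizematches, the following sketch injects a combined match for each pair of neighboring matches. It rests on several assumptions that you should verify for your own configuration: that the table lists matches in offset order, that the length of the output text approximates the length of the matched text, and that MAX_GAP (a hypothetical setting) is measured in the same units as getOffset. The call to LuaEdkMatch:new uses the same arguments as the finalizematch example (entity name, text, offset).

```lua
local MAX_GAP = 2   -- assumed maximum distance between two matches to combine

function finalizematches(matches, session)
    local previous = nil
    for _, v in ipairs(matches) do
        local current = v:getMatch()
        if previous then
            local prev_text = previous:getOutputText()
            -- distance between the end of the previous match and the start of this one
            local gap = current:getOffset() - (previous:getOffset() + #prev_text)
            if gap >= 0 and gap <= MAX_GAP then
                local combined = LuaEdkMatch:new(
                    previous:getEntityName(),
                    prev_text .. " " .. current:getOutputText(),
                    previous:getOffset())
                -- the session takes ownership of the combined match
                session:injectMatch(combined)
            end
        end
        previous = current
    end
end
```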

For information about the objects and methods that you can use in your Lua post-processing scripts, see Eduction Lua Methods Reference.

Reset a Session

You can define an additional function, resetprocessor, which Eduction calls whenever the session resets (for example, when it receives a new input stream, or more text after it has already processed a final input).

You can use this function if your script maintains a global state with details of previous matches. When you reuse the session to process another document or input buffer, you can use resetprocessor to reset the state. Eduction passes in the current user parameters (see Pass Parameters into the Lua Script) to the reset hook, if required.

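The following simple example shows a script that replaces the normalized text for each match with a running count. When you reuse your EdkSession, the script resets the count to the value of the startcount user parameter, or to 0 if that parameter is not set.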

local count = 0

function resetprocessor (params)
    -- params can be missing if no user parameters were set
    count = tonumber(params and params['startcount']) or 0
end

function processmatch (edkmatch)
    if edkmatch then
        count = count + 1
        edkmatch:setOutputText(tostring(count))
    end
    return true
end
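
Another common reason to keep global state is to remember what the script has already seen. The following sketch, which assumes that getOutputText returns a string, discards any match whose output text has already appeared in the current document, and clears that record when the session resets. The seen table is a name introduced for this example.

```lua
-- output text values already returned for the current document
local seen = {}

function resetprocessor(params)
    -- start each new document or input buffer with an empty record
    seen = {}
end

function processmatch(edkmatch)
    if edkmatch then
        local text = edkmatch:getOutputText()
        if seen[text] then
            -- duplicate of an earlier match: discard it
            return false
        end
        seen[text] = true
    end
    return true
end
```

Without resetprocessor, a value that appeared in one document would also be discarded in every later document processed by the same session.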

Pass Parameters into the Lua Script

You can pass additional parameters into post-processing tasks that you run through the Eduction API. To add an additional parameter (to all post-processing tasks that run during the session), call the appropriate function:

  • EdkSessionSetUserParamValue in the C API.
  • ITextExtractionSession::SetUserParameter in the .NET API.
  • setUserParamValue in the Java API.

Any parameters that you set using these functions are passed into the processmatch, processmatches, finalizematch, or finalizematches function of the Lua script as a table of key-value pairs. For example:

function processmatch(edkmatch, params)
    -- guard against a missing params table
    for k,v in pairs (params or {}) do
        --print ("Custom parameter ", k, " has value ", v)
    end
    return true
end
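
A script can also use a parameter to control its behavior. The following sketch reads a score threshold from a hypothetical minscore parameter and falls back to a default if the parameter is missing or is not a number. The key name and the default value are assumptions made for this example. The script passes the value through tonumber, in the same way as the resetprocessor example.

```lua
local DEFAULT_MIN_SCORE = 0.5   -- assumed fallback threshold

function processmatch(edkmatch, params)
    -- "minscore" is a hypothetical user parameter name
    local minscore = tonumber(params and params["minscore"]) or DEFAULT_MIN_SCORE
    if edkmatch and edkmatch:getScore() < minscore then
        -- score is below the threshold: discard the match
        return false
    end
    return true
end
```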

TIP: Some of the Lua post-processing scripts available in the Eduction packages provide optional user parameters. You can review these scripts for the available parameters, and pass them into the task as required. See Example Scripts.