Control Eduction Processing Time

Eduction matching is usually fast, but Micro Focus recommends that you set limits on processing so that your application can deal with a variety of input.

For example, for a large input text with a high density of matches, it can be time-consuming to retrieve all the matches. You must consider carefully whether your application requires all matches (which might number in the millions), or if it is enough to capture the first few hundred for any particular piece of input text.

You can control the number of matches to process by using the MaxMatchesPerDoc configuration parameter, which instructs an Eduction session to stop searching for matches after a certain number of matches have been found. To stop searching for specific entities after a certain number of matches have been found, but continue searching for other entities, set EntityMatchLimitN.
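
As a minimal configuration sketch, assuming the parameter is set in the [Eduction] section of your configuration file, a limit on the total number of matches might look like the following. The value 500 is purely illustrative; choose a limit that suits your application.

```ini
[Eduction]
# Illustrative value only: stop searching after 500 matches in one document.
MaxMatchesPerDoc=500
```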

The following configuration parameters also strongly affect the number of matches you might obtain:

To control the amount of time that Eduction can spend processing data, you can set the RequestTimeout configuration parameter.

In the Eduction SDK, you can apply a timeout by using the following steps:

  1. Set the RequestTimeout configuration parameter. Alternatively:

    • For C, use EdkSessionSetRequestTimeoutPrecise on an individual session.

    • For .NET, use the ITextExtractionSession::SetRequestTimeoutPrecise method.

    • For Java, use the TextExtractionSession::setRequestTimeoutPrecise method to set a timeout for the session.

    In each case, the argument must be in milliseconds.

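  2. Get the current time in epoch milliseconds, for example by using time() in C, System.DateTimeOffset.Now.ToUnixTimeMilliseconds() in .NET, or System.currentTimeMillis() in Java. Note that time() in C returns whole seconds, so convert the value to milliseconds where a milliseconds value is expected.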

  3. Send the time in epoch milliseconds to the session by using one of the following options:

    • For C, use EdkSessionSetStartTime, passing in the value you obtained from step 2 as the argument.

    • For .NET, use ITextExtractionSession::SetStartTimePrecise, passing in the epoch milliseconds value you obtained from the previous step as the argument.

    • For Java, use TextExtractionSession::setStartTime, passing in the epoch milliseconds value you obtained from the previous step as the argument.

  4. Obtain matches in the usual fashion, by calling EdkGetNextMatch in C, or by looping over the session object in .NET and Java.

    DEPRECATED: Do not use EdkGetNextMatchTimed in C, which is deprecated in Eduction SDK version 12.8.0 and later.

  5. Check for timeouts in the match loop by calling EdkGetMatchTimedOut in C, ITextExtractionSession::getTimedOut in .NET, or TextExtractionSession::getTimedOut in Java. If a timeout has occurred, you can break out of the loop, as required for your application.
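
The steps above can be sketched in Java as follows. This is an illustration of the loop pattern only, not SDK code: StubSession is a hypothetical stand-in written for this example, and although its method names are borrowed from the steps above, the real TextExtractionSession signatures and behavior might differ. The clock is passed in as a parameter (an assumption made here so that the sketch can be exercised without waiting for real time to pass).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;

class TimeoutLoopSketch {

    /** Hypothetical stand-in for an Eduction session; NOT the real SDK class. */
    static final class StubSession implements Iterable<String> {
        private final List<String> matches;
        private final LongSupplier clock;          // supplies epoch milliseconds
        private long timeoutMillis = Long.MAX_VALUE;
        private long startMillis;

        StubSession(List<String> matches, LongSupplier clock) {
            this.matches = matches;
            this.clock = clock;
            this.startMillis = clock.getAsLong();
        }

        void setRequestTimeoutPrecise(long millis) { timeoutMillis = millis; }

        void setStartTime(long epochMillis) { startMillis = epochMillis; }

        /** True once the time elapsed since the start time reaches the timeout. */
        boolean getTimedOut() {
            return clock.getAsLong() - startMillis >= timeoutMillis;
        }

        @Override
        public Iterator<String> iterator() {
            Iterator<String> inner = matches.iterator();
            return new Iterator<String>() {
                @Override
                public boolean hasNext() { return !getTimedOut() && inner.hasNext(); }

                @Override
                public String next() { return inner.next(); }
            };
        }
    }

    /** Collects matches until the session is exhausted or the timeout is reached. */
    static List<String> collect(StubSession session, long timeoutMillis, LongSupplier clock) {
        session.setRequestTimeoutPrecise(timeoutMillis);   // step 1: timeout in milliseconds
        long start = clock.getAsLong();                    // step 2: current epoch milliseconds
        session.setStartTime(start);                       // step 3: pass the start time to the session
        List<String> found = new ArrayList<>();
        for (String match : session) {                     // step 4: obtain matches as usual
            found.add(match);
            if (session.getTimedOut()) {                   // step 5: check for a timeout
                break;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        LongSupplier clock = System::currentTimeMillis;
        StubSession session = new StubSession(List.of("alpha", "beta", "gamma"), clock);
        System.out.println(collect(session, 1000, clock));
    }
}
```

Whether to keep or discard the matches collected before a timeout is an application decision; this sketch keeps them.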

You can find examples of timeout handling in the sample programs provided in the Eduction SDK release package.

TIP: If your application does significant processing before you call Eduction, and you want to use an overall application timeout, obtain the current time in epoch milliseconds at the very start of your application processing rather than waiting until just before you call the Eduction functions.
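
The effect of this tip can be illustrated with a little arithmetic. The following Java sketch assumes that elapsed time is measured from whatever start time you supply; remainingMillis is a hypothetical helper written for this example and is not part of the Eduction SDK.

```java
class OverallTimeoutSketch {

    /** Time left, in milliseconds, of a budget that began at startMillis (never negative). */
    static long remainingMillis(long startMillis, long timeoutMillis, long nowMillis) {
        long elapsed = nowMillis - startMillis;
        return Math.max(0L, timeoutMillis - elapsed);
    }

    public static void main(String[] args) throws InterruptedException {
        long timeoutMillis = 2_000;

        // Taken at the very start of application processing.
        long appStart = System.currentTimeMillis();

        Thread.sleep(50); // stand-in for processing done before calling Eduction

        long beforeEduction = System.currentTimeMillis();

        // With the early start time, the pre-processing counts against the budget.
        System.out.println("Start at application start: "
                + remainingMillis(appStart, timeoutMillis, beforeEduction) + " ms left");

        // With a start time taken just before Eduction, the full budget is still available.
        System.out.println("Start just before Eduction: "
                + remainingMillis(beforeEduction, timeoutMillis, beforeEduction) + " ms left");
    }
}
```

Passing the earlier value as the session start time means the timeout covers the whole pipeline rather than only the Eduction stage.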