Reduce Initialization Overhead

This section provides general advice on reducing initialization overhead. The advice applies to all Eduction SDKs.

EDK Engines

When Eduction creates and prepares a new EDK engine instance, it loads the required grammar files from disk. If you use an XML grammar rather than a compiled ECR file, it also compiles the grammar in memory. This process can be comparatively slow, especially if the grammar files are large.

To reduce this initialization overhead, OpenText recommends that you initialize one EDK engine instance, and reuse it for the lifetime of the application. You can create as many sessions as required from a single engine, for example to allow parallel processing in multiple threads.
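The following Python sketch illustrates the pattern of one shared engine with several sessions. It does not use the Eduction API: StubEngine, StubSession, get_engine, process, run, and the path grammar.ecr are hypothetical stand-ins, and the "extraction" is a placeholder. The sketch only shows the structure, in which the expensive engine is built once and each worker thread takes its own session from it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StubEngine:
    """Hypothetical stand-in for an EDK engine (not the real SDK class)."""
    instances_created = 0

    def __init__(self, grammar_path):
        # In a real application this is the expensive step
        # (loading and possibly compiling the grammar).
        StubEngine.instances_created += 1
        self.grammar_path = grammar_path

    def create_session(self):
        return StubSession(self)


class StubSession:
    """Hypothetical stand-in for an EDK session."""

    def __init__(self, engine):
        self.engine = engine

    def extract(self, text):
        # Placeholder logic: treat capitalized words as "matches".
        return [word for word in text.split() if word[:1].isupper()]


_engine = None
_engine_lock = threading.Lock()
_local = threading.local()


def get_engine():
    """Create the engine on first use, then return the same instance."""
    global _engine
    with _engine_lock:
        if _engine is None:
            _engine = StubEngine("grammar.ecr")  # hypothetical path, never opened
        return _engine


def process(text):
    # Each worker thread creates one session from the shared engine.
    if not hasattr(_local, "session"):
        _local.session = get_engine().create_session()
    return _local.session.extract(text)


def run(texts):
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(process, texts))
```

However many texts or threads are involved, the engine constructor runs once. Only the comparatively cheap session creation is repeated per thread.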

Per-Session Settings in an Engine

All sessions created from an engine start with the same Eduction settings, in particular the same set of available entities. You can adjust some settings per session, including the maximum number of matches for individual entities.

You can set the maximum number of matches to zero to effectively turn off individual entities for a particular session. Using the available per-session settings can reduce the number of distinct EDK engines that your application must create.

TIP: You can adjust parameters by providing a configuration snippet, or by setting the parameters individually. To find out which parameters you can modify on a per-session basis, see Eduction Parameter Reference.
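As an illustration of the idea above, the following Python sketch models two sessions that share one engine but differ in a per-session match limit. It is not the Eduction API: StubEngine, StubSession, set_max_matches, and the regular-expression "entities" are hypothetical stand-ins chosen only to show how a limit of zero turns off an entity for one session without a second engine.

```python
import re


class StubEngine:
    """Hypothetical stand-in for an EDK engine (not the real SDK class)."""

    def __init__(self, entities):
        # entities maps an entity name to a compiled regex,
        # which stands in for a grammar entity.
        self.entities = dict(entities)

    def create_session(self):
        return StubSession(self)


class StubSession:
    """Hypothetical stand-in for an EDK session with per-session limits."""

    def __init__(self, engine):
        self.engine = engine
        self.max_matches = {}  # entity name -> limit; absent means unlimited

    def set_max_matches(self, entity, limit):
        self.max_matches[entity] = limit

    def extract(self, text):
        results = []
        for name, pattern in self.engine.entities.items():
            limit = self.max_matches.get(name)
            if limit == 0:
                continue  # a limit of zero turns the entity off
            found = [m.group(0) for m in pattern.finditer(text)]
            if limit is not None:
                found = found[:limit]
            results.extend((name, value) for value in found)
        return results


engine = StubEngine({
    "number": re.compile(r"\d+"),
    "word": re.compile(r"[A-Za-z]+"),
})

full_session = engine.create_session()

numbers_only = engine.create_session()
numbers_only.set_max_matches("word", 0)  # only this session is affected
```

Both sessions come from the same engine, so the grammar is loaded once. The limit set on numbers_only does not change what full_session returns.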

Grammar Compression

When engine creation time is particularly critical to your application, consider decompressing your ECR files.

ECR grammars use gzip compression to reduce their footprint on disk. Each time the grammar loads, decompression adds a small but measurable overhead to the loading time. You can reduce this overhead by manually decompressing the ECR file with a tool such as gunzip before loading it into Eduction.
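If a command-line tool is not convenient, the same decompression can be scripted. The following minimal Python sketch uses only the standard gzip and shutil modules and assumes, as described above, that the ECR file is an ordinary gzip stream. The function name decompress_ecr and the file names in the usage note are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def decompress_ecr(src, dst):
    """Write a decompressed copy of the gzip-compressed file src to dst."""
    src, dst = Path(src), Path(dst)
    with gzip.open(src, "rb") as compressed, open(dst, "wb") as plain:
        # Stream the data so that large grammars are not held in memory.
        shutil.copyfileobj(compressed, plain)
    return dst
```

For example, decompress_ecr("my_grammar.ecr", "my_grammar_plain.ecr") writes a decompressed copy and leaves the original file unchanged. Run this once as a build or deployment step, not on every application start, because the aim is to move the decompression cost out of engine creation.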

EDK Sessions

EDK sessions are relatively lightweight to create, compared to an EDK engine. However, session creation still involves some non-trivial steps, such as loading post-processing scripts.

In all APIs, you can reset an individual session after it receives the final input text. In applications where you repeatedly run Eduction on many small, separate input text fragments, it is significantly quicker to reset and reuse an existing session than to destroy the old session and create a new one.
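The reset-and-reuse loop described above can be sketched as follows. This Python example does not call the Eduction API: StubSession, add_text, matches, reset, and process_fragments are hypothetical stand-ins, and the matching logic is a placeholder. The sketch shows only the control flow, in which one session is created and then reset after each fragment.

```python
class StubSession:
    """Hypothetical stand-in for an EDK session (not the real SDK class)."""
    created = 0

    def __init__(self):
        # Stands in for per-session setup, such as loading
        # post-processing scripts.
        StubSession.created += 1
        self._buffer = []

    def add_text(self, text):
        self._buffer.append(text)

    def matches(self):
        # Placeholder logic: treat purely numeric words as "matches".
        words = " ".join(self._buffer).split()
        return [word for word in words if word.isdigit()]

    def reset(self):
        # Discard state from the previous input so the session can be reused.
        self._buffer.clear()


def process_fragments(fragments):
    session = StubSession()  # created once, outside the loop
    results = []
    for fragment in fragments:
        session.add_text(fragment)
        results.append(session.matches())
        session.reset()  # reuse the session for the next fragment
    return results
```

Session setup runs once for the whole batch, and each fragment is still processed in isolation because the reset discards the previous input.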