Python API Concepts

The Eduction SDK provides a Python API that enables your application to create an extraction engine and perform entity extractions.

This section describes the concepts used to write Python applications with the Eduction EDK.

The Python SDK consists of:

  • A Python wheel, edk-version-py3-none-any.whl (where version is the version number).

  • The EDK shared library, edk.dll (Windows) or libedk.so (macOS and Linux), which implements the Eduction functionality.

Installation

Install the Python EDK package from the wheel by using the pip package manager:

pip install bin/edk-version-py3-none-any.whl
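To check that the wheel installed correctly, you can ask Python's import system whether it can locate the top-level edk package. The following minimal sketch uses only the standard library. For a top-level name, importlib.util.find_spec locates the package without running its code, so the check does not try to load the EDK shared library. The helper name is_module_available is hypothetical.

```python
import importlib.util


def is_module_available(name: str = "edk") -> bool:
    """Return True if `name` can be imported, without actually importing it.

    For a top-level name, find_spec only searches sys.path; it does not
    execute the package, so no shared library is loaded by this check.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("edk installed:", is_module_available("edk"))
```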

The package relies on the C EDK shared library (edk.dll or libedk.so). This library must be in one of the following locations:

  • a directory on the system library path

  • a location that you specify by setting the EDKLIBPATH environment variable before you import the package in your Python code.
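If you use EDKLIBPATH, the important point is ordering: the variable must be set before the package is imported. The following sketch sets it from Python itself, after checking that the library file is present. It assumes that EDKLIBPATH names the directory that contains the library; if your version expects a different value (for example, the full file path), adjust accordingly. The helper names and the install directory in the usage comment are hypothetical.

```python
import os
import sys
from pathlib import Path


def edk_library_filename() -> str:
    """Shared-library file name listed in this section for the current OS."""
    return "edk.dll" if sys.platform.startswith("win") else "libedk.so"


def set_edk_library_path(lib_dir: str) -> str:
    """Set EDKLIBPATH to lib_dir after checking that the library is there.

    Assumption: EDKLIBPATH names the directory that contains the library.
    Call this before the first `import edk`, as this section requires.
    """
    directory = Path(lib_dir).resolve()
    library = directory / edk_library_filename()
    if not library.is_file():
        raise FileNotFoundError(f"EDK shared library not found: {library}")
    os.environ["EDKLIBPATH"] = str(directory)
    return str(directory)


# Usage (the install directory below is hypothetical):
#   set_edk_library_path("/opt/eduction/bin")
#   import edk.sdk  # import only after EDKLIBPATH is set
```

You can also set the variable in the shell that launches your application; setting it in code keeps the requirement next to the import that depends on it.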

NOTE: You might also need additional runtime libraries to run the Eduction SDK. See Eduction SDK Package.

Package Structure

The edk module provides a Python interface to Eduction. It is split into submodules:

  • edk.api contains direct Python bindings for each function in the C EDK shared library.

  • edk.sdk contains classes that wrap the API calls to expose Eduction functionality through a more Pythonic interface.

  • edk.tool can be run as a script to perform simple command-line Eduction operations.

In most cases, you use the classes provided by the edk.sdk module.

Concurrency Control

Concurrency in Eduction is handled using sessions, represented by an EdkSession object.

You create an instance of an EdkEngine object with a configuration file that describes the grammars and settings that you want to use for entity extraction. You can create multiple EdkSession objects from this engine, each of which uses the same grammars and settings as the parent engine. Each session maintains its state independently of the others.
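Because each session keeps its own state, one common pattern is to share a single engine and give each worker thread its own session. The sketch below deliberately assumes nothing about EdkEngine or EdkSession method names: create_session is a hypothetical callable that you supply to create a new EdkSession from your engine, and process is a hypothetical callable that runs an extraction with that session. This section does not say whether one session can be used from several threads at once, so the sketch never shares a session between threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

S = TypeVar("S")  # session type (for example, an EdkSession)
R = TypeVar("R")  # result type returned by `process`


def map_with_sessions(
    create_session: Callable[[], S],
    process: Callable[[S, str], R],
    documents: Iterable[str],
    max_workers: int = 4,
) -> List[R]:
    """Process documents in parallel, with one session per worker thread.

    Each thread creates its session lazily on first use and reuses it for
    every later document it handles, so no session is shared across threads.
    Results are returned in the same order as `documents`.
    """
    local = threading.local()

    def worker(doc: str) -> R:
        session = getattr(local, "session", None)
        if session is None:
            session = create_session()
            local.session = session
        return process(session, doc)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, documents))
```

In real code, create_session would wrap whatever call your EDK version provides for creating a session from the engine, and you would release the sessions when you finish with them.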

Character Encoding

The underlying EDK shared library (edk.dll or libedk.so) and the grammars require all input to be UTF-8 encoded. The low-level edk.api functions accept data as Python bytes or bytearray objects, which must contain UTF-8 encoded text.
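If you call the low-level edk.api functions directly, you must produce UTF-8 bytes yourself. The following helper is a minimal sketch of that preparation step: it encodes str input and validates byte input. The helper name to_utf8_bytes is hypothetical, and the edk.api call that would receive the result is not shown.

```python
from typing import Union


def to_utf8_bytes(data: Union[str, bytes, bytearray]) -> bytes:
    """Return UTF-8 encoded bytes suitable for passing to edk.api functions.

    str input is encoded as UTF-8; bytes/bytearray input is checked to be
    valid UTF-8 and returned as bytes.
    """
    if isinstance(data, str):
        return data.encode("utf-8")
    if isinstance(data, (bytes, bytearray)):
        raw = bytes(data)
        raw.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8
        return raw
    raise TypeError(f"expected str, bytes or bytearray, got {type(data).__name__}")
```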

The higher-level edk.sdk class methods accept either raw byte data, or regular Unicode Python strings (str objects).

When you set an input stream, you can provide either a byte stream (derived from io.BufferedIOBase) or a text stream (derived from io.TextIOBase). The Eduction Python SDK classes automatically handle encoding conversion.
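The standard library already provides both kinds of stream. io.BytesIO and a file opened with open(path, "rb") are byte streams (io.BufferedIOBase); io.StringIO and a file opened in text mode are text streams (io.TextIOBase). With a text stream, Python decodes the file using the encoding that you pass to open(), so presumably that is the way to feed a file in a non-UTF-8 encoding; with a byte stream, the raw bytes must already be UTF-8. The hypothetical stream_kind helper below only classifies a stream according to the two base classes named above; it is not part of the SDK.

```python
import io


def stream_kind(stream) -> str:
    """Classify a stream as "text" or "bytes" by its io base class."""
    if isinstance(stream, io.TextIOBase):
        return "text"
    if isinstance(stream, io.BufferedIOBase):
        return "bytes"
    raise TypeError(f"unsupported stream type: {type(stream).__name__}")


if __name__ == "__main__":
    print(stream_kind(io.BytesIO("caf\u00e9".encode("utf-8"))))  # bytes
    print(stream_kind(io.StringIO("caf\u00e9")))                 # text
```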

By default, functions return text values in the same type that you used for input (bytes or str). Some of the metadata that the SDK returns represents offsets in the input. These offsets are byte counts when you use byte input, and Unicode character counts when you use Unicode strings as input.
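The two kinds of offset differ as soon as the text contains non-ASCII characters, because UTF-8 uses more than one byte for such characters. The sketch below converts between them, assuming that the SDK's character counts match Python's str indexing (Unicode code points). Both function names are hypothetical helpers, not SDK functions.

```python
def char_to_byte_offset(text: str, char_offset: int) -> int:
    """Convert a character offset in `text` to a UTF-8 byte offset."""
    return len(text[:char_offset].encode("utf-8"))


def byte_to_char_offset(text: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset in `text` to a character offset.

    Raises UnicodeDecodeError if byte_offset falls inside a multi-byte character.
    """
    return len(text.encode("utf-8")[:byte_offset].decode("utf-8"))


text = "Caf\u00e9 Z\u00fcrich"                          # "Café Zürich"
start_chars = text.index("Z\u00fcrich")                # 5
start_bytes = char_to_byte_offset(text, start_chars)   # 6, because "é" is 2 bytes
```

If you mix input types, for example by extracting from bytes and then slicing a str, convert the offsets first; otherwise the slices drift by one position for every multi-byte character before the match.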