Python API Concepts
The Eduction SDK provides a Python API that enables your application to create an extraction engine and perform entity extractions.
This section describes the concepts used to write Python applications with the Eduction SDK.
The Python SDK consists of:
- A Python wheel, edk-version-py3-none-any.whl (where version is the version number).
- edk.dll (Windows) or libedk.so (MacOS and Linux), which performs the Eduction functionality.
Installation
Install the Python EDK package from the wheel by using the Python pip package manager tool:
pip install bin/edk-version-py3-none-any.whl
The package relies on the C EDK shared library (edk.dll or libedk.so). This library must be in one of the following locations:
- a directory on the system library path
- a location that you pass to the Python API by setting the EDKLIBPATH environment variable before you import the package from your Python code.
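As a minimal sketch of the second option, the following sets EDKLIBPATH before attempting the import. The helper name import_edk and the path /opt/eduction/lib are hypothetical. Whether the variable expects a directory or the full path to the library file is not stated here, so treat the value as an assumption to check against your installation.

```python
import importlib
import os


def import_edk(lib_location):
    """Set EDKLIBPATH, then try to import the edk package (hypothetical helper)."""
    # The variable must be set before the package is imported.
    os.environ["EDKLIBPATH"] = lib_location
    try:
        return importlib.import_module("edk")
    except (ImportError, OSError) as exc:
        # Assumption: a missing wheel or an unloadable shared library
        # surfaces as one of these exception types.
        print(f"Could not load edk: {exc}")
        return None


# Hypothetical location; replace with where edk.dll or libedk.so actually lives.
edk = import_edk("/opt/eduction/lib")
```

Setting the variable from inside the script keeps the configuration next to the import, but exporting it in the shell before starting Python should work equally well.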
NOTE: You might also need additional runtime libraries to run the Eduction SDK. See Eduction SDK Package.
Package Structure
The edk module provides a Python interface to Eduction. It is split into submodules:
- edk.api contains direct Python bindings for each function in the C EDK shared library.
- edk.sdk contains classes that wrap the API calls to expose Eduction functionality through a more Pythonic interface.
- edk.tool can be run as a script to perform simple command-line Eduction operations.
In most cases, you use the classes provided by the edk.sdk module.
Concurrency Control
Concurrency in Eduction is handled using sessions, represented by an EdkSession object.
You create an instance of an EdkEngine object with a configuration file that describes the grammars and settings that you want to use for entity extraction. You can create multiple EdkSession objects from this engine, each of which uses the same grammars and settings as the parent engine. Each session maintains its state independently of the others.
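The engine and session relationship can be sketched as one shared engine with one session per worker thread. The classes below are stand-ins written for this illustration, not the edk.sdk classes. The names StandInEngine, StandInSession, new_session, process, and the file name eduction.cfg are all hypothetical, because this section does not show the real constructor or method signatures.

```python
from concurrent.futures import ThreadPoolExecutor


class StandInEngine:
    """Stand-in for an engine: holds shared, read-only configuration."""

    def __init__(self, config_path):
        self.config_path = config_path

    def new_session(self):
        # Hypothetical method name; each session shares this engine's settings.
        return StandInSession(self)


class StandInSession:
    """Stand-in for a session: owns its private, mutable state."""

    def __init__(self, engine):
        self.engine = engine
        self.processed = []  # per-session state, never shared between threads

    def process(self, text):
        self.processed.append(text)
        return len(self.processed)


def worker(engine, documents):
    # One session per worker, so session state needs no locking.
    session = engine.new_session()
    for doc in documents:
        session.process(doc)
    return session.processed


engine = StandInEngine("eduction.cfg")  # hypothetical configuration file name
batches = [["a", "b"], ["c"], ["d", "e", "f"]]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda batch: worker(engine, batch), batches))
```

Each worker sees only the documents it processed itself, which mirrors the statement that sessions share the engine's grammars and settings while keeping their own state.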
Character Encoding
The underlying edk.dll library and grammars require that all your input is UTF-8 encoded. The low-level edk.api functions accept data as Python bytes or bytearray objects, which must contain UTF-8 encoded text.
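Before passing data to a bytes-only interface, it can help to normalize and validate it. The following is a minimal sketch using only the Python standard library. The helper name to_utf8_bytes is hypothetical and is not part of the edk package.

```python
def to_utf8_bytes(data):
    """Return data as UTF-8 bytes, raising if it is not valid UTF-8."""
    if isinstance(data, str):
        return data.encode("utf-8")
    if isinstance(data, (bytes, bytearray)):
        raw = bytes(data)
        raw.decode("utf-8")  # raises UnicodeDecodeError on invalid input
        return raw
    raise TypeError(f"expected str, bytes, or bytearray, got {type(data).__name__}")


payload = to_utf8_bytes("Zoë lives in Kraków")

# Text encoded with a legacy code page is not valid UTF-8 and is rejected.
legacy = "Zoë".encode("latin-1")
try:
    to_utf8_bytes(legacy)
    legacy_rejected = False
except UnicodeDecodeError:
    legacy_rejected = True
```

Validating up front turns an encoding mistake into an immediate Python exception rather than an unexpected extraction result later.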
The higher-level edk.sdk class methods accept either raw byte data, or regular Unicode Python strings (str objects).
When you set an input stream, you can provide either a byte stream (derived from io.BufferedIOBase) or a text stream (derived from io.TextIOBase). The Eduction Python SDK classes automatically handle encoding conversion.
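The two stream families can be told apart with the standard io base classes. The following sketch uses only the standard library. The helper name stream_kind is hypothetical, and the sketch does not call any edk function.

```python
import io


def stream_kind(stream):
    """Classify a stream as 'bytes' or 'text' by its io base class."""
    if isinstance(stream, io.TextIOBase):
        return "text"
    if isinstance(stream, io.BufferedIOBase):
        return "bytes"
    raise TypeError("expected a buffered byte stream or a text stream")


byte_stream = io.BytesIO("Zoë".encode("utf-8"))  # in-memory byte stream
text_stream = io.StringIO("Zoë")                 # in-memory text stream

kinds = (stream_kind(byte_stream), stream_kind(text_stream))
```

Files opened with open(path, "rb") and default buffering are buffered byte streams, and files opened in text mode are text streams. An unbuffered raw stream (buffering=0) derives from io.RawIOBase instead, so it falls into neither category.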
By default, functions return text values as the same type used for input (bytes or str). Some of the metadata that the SDK returns represents offsets in the input. The functions return these values as byte counts when you use byte input, and as Unicode character counts when you use Unicode strings as input.
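The difference between the two offset units shows up as soon as the input contains non-ASCII characters. The following standard-library sketch locates the same match in a str and in its UTF-8 encoding. The sample text and the helper name byte_to_char_offset are illustrative and not part of the edk package.

```python
text = "Café owner: Zoë"
data = text.encode("utf-8")

# Offset of the same match, counted in characters and in bytes.
char_offset = text.find("Zoë")
byte_offset = data.find("Zoë".encode("utf-8"))


def byte_to_char_offset(raw, offset):
    """Convert a UTF-8 byte offset into a Unicode character offset."""
    return len(raw[:offset].decode("utf-8"))


converted = byte_to_char_offset(data, byte_offset)
```

Here the byte offset is one greater than the character offset, because the "é" before the match occupies two bytes in UTF-8. Keeping input and offsets in the same unit avoids having to convert at all.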