.NET API Concepts
The Eduction SDK provides a .NET API that enables your application to create an extraction engine and perform entity extractions.
This section describes the concepts used to write .NET applications with the Eduction SDK.
The .NET SDK consists of:
EductionDotNet.dll, which contains the Eduction .NET class library.
edk.dll (Windows) or libedk.so (macOS and Linux), which performs the Eduction functionality.
NOTE: You might also need additional runtime libraries to run the Eduction SDK. See Eduction SDK Package.
Concurrency Control
Concurrency in Eduction is handled using sessions, each represented by an ITextExtractionSession object.
You initialize an instance of an ITextExtractionEngine object with a configuration file that describes the grammars and settings that you want to use for entity extraction. You can create multiple ITextExtractionSession objects from this engine, each of which uses the same grammars and settings as the parent engine. Each session maintains its state independently of the others.
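The engine and session relationship described above can be illustrated with a minimal sketch. The DemoEngine and DemoSession classes below are hypothetical stand-ins written for this example; they are not Eduction SDK types and do not reflect the real signatures of ITextExtractionEngine or ITextExtractionSession. The sketch only shows the pattern: one shared, read-only engine, and one session per unit of parallel work, each holding its own state.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for an engine: holds shared, read-only settings.
// NOT an Eduction SDK type.
sealed class DemoEngine
{
    public IReadOnlyList<string> Terms { get; }

    public DemoEngine(IEnumerable<string> terms)
    {
        Terms = new List<string>(terms).AsReadOnly();
    }

    // Every session refers back to the same engine settings.
    public DemoSession CreateSession() => new DemoSession(this);
}

// Hypothetical stand-in for a session: owns its own mutable state.
// NOT an Eduction SDK type.
sealed class DemoSession
{
    private readonly DemoEngine _engine;

    // Per-session state; never shared between sessions.
    public int MatchCount { get; private set; }

    public DemoSession(DemoEngine engine)
    {
        _engine = engine;
    }

    public void Process(string text)
    {
        foreach (var term in _engine.Terms)
        {
            if (text.Contains(term))
            {
                MatchCount++;
            }
        }
    }
}

static class Program
{
    static void Main()
    {
        // One engine, created once and shared by all workers.
        var engine = new DemoEngine(new[] { "alpha", "beta" });
        string[] documents = { "alpha and beta", "only beta", "nothing here" };
        var counts = new int[documents.Length];

        // One session per parallel iteration, so no session is shared across threads.
        Parallel.For(0, documents.Length, i =>
        {
            var session = engine.CreateSession();
            session.Process(documents[i]);
            counts[i] = session.MatchCount;
        });

        Console.WriteLine(string.Join(", ", counts)); // prints "2, 1, 0"
    }
}
```

Because each iteration creates its own session, the per-session counters never interfere with each other, while the engine's settings are read but never modified after construction.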
Character Encoding
The underlying edk.dll
and grammars assume that all your input is UTF-8 encoded. The Eduction .NET SDK functions that accept System.string
automatically handle conversion from UTF-16 to UTF-8. However, functions that accept a System.IO.Stream
(for example Eduction.ITextExtractionSession.SetInputStream
) require the byte data in the stream to be UTF-8.
Some of the metadata that the SDK returns represents byte counts or byte offsets. These values are correct for the UTF-8 representation of the matched text. Character counts and offsets are independent of the encoding.
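The difference between byte offsets and character offsets can be seen with standard .NET classes alone. The following sketch does not call the Eduction SDK; the sample text and the "match" are made up for illustration. It computes both kinds of offset for the same substring and shows how to build a UTF-8 stream of the kind that a stream-based function would expect. All characters in the sample are in the Basic Multilingual Plane, so a System.String index is the same as a character index here.

```csharp
using System;
using System.IO;
using System.Text;

static class Utf8OffsetDemo
{
    static void Main()
    {
        // Made-up sample text: "Café: Zoë" (escapes keep the source file encoding-neutral).
        string input = "Caf\u00e9: Zo\u00eb";
        string match = "Zo\u00eb";

        // Character offset and length, as seen through System.String indexing.
        int charOffset = input.IndexOf(match, StringComparison.Ordinal);
        int charLength = match.Length;

        // Byte offset and length in the UTF-8 representation of the same text.
        // Each accented letter here occupies two bytes in UTF-8.
        int byteOffset = Encoding.UTF8.GetByteCount(input.Substring(0, charOffset));
        int byteLength = Encoding.UTF8.GetByteCount(match);

        Console.WriteLine($"char offset={charOffset}, char length={charLength}"); // 6, 3
        Console.WriteLine($"byte offset={byteOffset}, byte length={byteLength}"); // 7, 4

        // Stream-based input must already be UTF-8, so encode the string explicitly.
        byte[] utf8 = Encoding.UTF8.GetBytes(input);
        using (var stream = new MemoryStream(utf8))
        {
            Console.WriteLine($"stream length={stream.Length} bytes"); // 11
        }
    }
}
```

When mapping returned byte offsets back onto a System.String, convert through the UTF-8 bytes (for example with Encoding.UTF8.GetString on the relevant byte range) rather than using the byte offset as a string index.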