Overview
OpenText KeyView Filter SDK enables you to incorporate text extraction functionality into your own applications. It extracts text and metadata from a wide variety of file formats on numerous platforms, and can automatically recognize over 1900 document types. It supports both file-based and stream-based I/O operations, and you can perform operations in a separate process to provide an additional layer of protection.
Filter SDK is part of the KeyView suite of products. KeyView provides high-speed text extraction, conversion to web-ready HTML and well-formed XML, and high-fidelity document viewing.
Features
The KeyView Filter SDK offers the following capabilities:
- automatic format detection
- metadata extraction (including XMP metadata)
- text filtering (extraction of visible and hidden text without application-specific markup)
- character set detection and conversion
- sub-file extraction
- Microsoft Azure Rights Management (RMS) decryption
- Optical Character Recognition (OCR)
The KeyView Filter SDK supports popular word processing, spreadsheet, and presentation formats, plus a wide range of less common formats. For many formats, KeyView extracts not only the body text but also additional items such as endnotes, footnotes, headers and footers, image captions, tables, hyperlinks, comments, notes, revision history, scripts, text box content, and text from charts and diagrams. You can filter documents to specific character encodings, such as UTF-8 or UTF-16.
Filter automatically recognizes the type of the file being filtered and uses the appropriate reader for that format. Your application does not need to rely on file name extensions to determine file types. Extensions can be unreliable, for example because multiple underlying formats might use the same extension.
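To illustrate why content-based detection is more reliable than file name extensions, the following minimal Python sketch identifies a few formats from their leading bytes. It is not KeyView code and does not reflect how KeyView detects formats: the `SIGNATURES` table and the `sniff_format` function are hypothetical names introduced for this example, and the table covers only a handful of well-known signatures.

```python
# Illustrative only: a tiny signature table, not KeyView's detection logic.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"PK\x03\x04", "ZIP container (for example DOCX, XLSX, PPTX)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE compound file (for example legacy DOC, XLS)"),
    (b"{\\rtf", "RTF"),
]


def sniff_format(data: bytes) -> str:
    """Return a coarse format label based on the leading bytes of the content."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


# A file named report.doc might contain RTF rather than an OLE compound file.
# The leading bytes distinguish the two even though the extension cannot.
print(sniff_format(b"{\\rtf1\\ansi Hello}"))
print(sniff_format(b"%PDF-1.7\n"))
```

The same idea scales to many more formats, although real detection generally needs to look beyond the first few bytes, for example to tell apart the different document types that share the ZIP container.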
You can also extract files embedded within files (sub-files), such as email attachments or embedded OLE objects.
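As a rough illustration of what a sub-file is, the following Python sketch lists the attachments of an email message using only the standard library `email` package. It does not use the KeyView API. The `list_attachments` function and the sample message are assumptions made for this example.

```python
from email import message_from_bytes
from email.message import EmailMessage


def list_attachments(raw: bytes) -> list:
    """Return (file name, content) pairs for each attachment in a raw email."""
    message = message_from_bytes(raw)
    found = []
    for part in message.walk():
        # Each attachment is a sub-file contained in the parent message.
        if part.get_content_disposition() == "attachment":
            found.append((part.get_filename(), part.get_payload(decode=True)))
    return found


# Build a small sample message with one attachment.
sample = EmailMessage()
sample["Subject"] = "Quarterly report"
sample.set_content("See attached.")
sample.add_attachment(
    b"col1,col2\n1,2\n", maintype="text", subtype="csv", filename="data.csv"
)

for name, content in list_attachments(sample.as_bytes()):
    print(name, len(content))
```

Each extracted sub-file can itself be a container, so a complete extraction typically repeats the same step on every sub-file it finds.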
In addition:
- You can use the provided sample programs to explore the functionality of the APIs.
- You can use Filter from multiple threads, which allows you to extract text from multiple documents simultaneously.
- Filter can read file content directly from the file system, or you can provide a custom input stream.
- Filter can extract all the text in one go, or provide chunks of text as soon as they are ready. Processing text in chunks lets you stop as soon as you have what you need and start downstream processing earlier, which lowers latency.
- You can write custom document readers for formats not directly supported by KeyView.
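The benefit of chunked output combined with stream-based input can be sketched in a few lines of Python. This is a conceptual example, not KeyView code: `iter_text_chunks` is a hypothetical stand-in for a filter that yields text as it becomes ready, and `contains_term` is a hypothetical consumer that stops reading as soon as it finds what it needs. The input is assumed to be UTF-8 text supplied through any binary stream.

```python
import codecs
import io
from typing import BinaryIO, Iterator


def iter_text_chunks(stream: BinaryIO, chunk_size: int = 16) -> Iterator[str]:
    """Yield decoded text piece by piece (hypothetical stand-in for a filter)."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    while True:
        block = stream.read(chunk_size)
        if not block:
            # Flush any bytes the decoder was still holding.
            tail = decoder.decode(b"", final=True)
            if tail:
                yield tail
            return
        text = decoder.decode(block)
        if text:
            yield text


def contains_term(stream: BinaryIO, term: str) -> bool:
    """Return True as soon as the term is seen, without reading further."""
    window = ""
    for chunk in iter_text_chunks(stream):
        window += chunk
        if term in window:
            return True
        # Keep only enough trailing text to catch a term split across chunks.
        window = window[-(len(term) - 1):] if len(term) > 1 else ""
    return False


data = b"header CONFIDENTIAL " + b"x" * 1000
source = io.BytesIO(data)  # any binary stream works, not only a file
print(contains_term(source, "CONFIDENTIAL"))
print(source.tell() < len(data))  # reading stopped before the end of the input
```

Because the consumer pulls one chunk at a time, most of the input is never read once the term is found. The same pattern lets downstream processing start on the first chunk rather than waiting for the whole document.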