KeyView Filter SDK Introduction
You can use the KeyView Filter SDK library by calling it from your own applications through one of its APIs. To help you get started, it also includes two sample applications, filter and tstxtract, which allow you to explore some of the functionality.
This section is an introductory tutorial that aims to help you to:
- use the out-of-the-box command-line tools filter and tstxtract to develop your understanding of the basic capabilities and key features of the KeyView Filter SDK.
- familiarize yourself with the Filter SDK output of the out-of-the-box command-line tools.
- familiarize yourself with Filter SDK configuration files.
Before you continue, you might want to read about some of the key features of KeyView in the Overview.
Setup
Sample documents for this tutorial are available on the opentext-idol GitHub: https://github.com/opentext-idol/idol-oem-tutorials/blob/main/resources/keyview_filter. Download these sample documents and save them to the following directory:
C:\OpenText\idol-oem-tutorials\resources
The following sections describe how to download and install KeyView. If you already have a KeyView installation, you can skip to Introduction to KeyView Filter SDK.
Activate a KeyView License Key
The KeyView SDKs require a license key, which is unique to your project.
You activate a license key on the Software Licensing and Downloads portal, on the Entitlements tab.
TIP: The filter and tstxtract command-line tools include a temporary license key, which means you can follow the sections of this tutorial that use these tools before you activate a license key.
Download KeyView Filter SDK components
You download KeyView from the Software Licensing and Downloads portal.
To download KeyView Filter SDK components
- Under the Downloads tab, select your product, product name, and version from the drop-down menus.
- From the list of available files, select and download the following files:
  - KeyviewFilterSDK_23.2.0_PLATFORM.zip, where PLATFORM is your software platform. For example, KeyviewFilterSDK_23.2.0_WINDOWS_X86_64.zip.
  - KeyviewFilterSDK_23.2.0_Documentation.zip
- From the list of available files, select and download any available patches for your platform. For example, KeyviewFilterSDK 23.2 Patch 23.2.x.
  TIP: Click the Reference Material link in the Description column to access the link to the patch documentation, including the release notes.
  NOTE: The patch ZIP package contains the patch files for all platforms.
Install KeyView Filter SDK components
The following procedure describes how to install KeyView Filter SDK from the ZIP file, and apply any patches. For more information about KeyView supported platforms and installation, see Introducing Filter SDK.
To install KeyView Filter SDK components
- Extract the ZIP file (KeyviewFilterSDK_VERSION_PLATFORM.zip) to a folder of your choice. The examples in this tutorial assume that you use the following folder: C:\OpenText\KeyviewFilterSDK_23.2.0_WINDOWS_X86_64
  IMPORTANT: This tutorial refers to C:\OpenText\KeyviewFilterSDK_23.2.0_WINDOWS_X86_64 as %KEYVIEW_HOME%.
- On Windows, you might need to install the included Visual C++ Redistributable packages. In the vcredist folder of the Filter SDK installation, right-click vcredist_2019.exe, then click Run as administrator.
  TIP: If the installer gives an error, it might be because you already have the required packages, in which case you can ignore the error.
- If a patch is available, extract the patch ZIP (for example, KeyviewFilterSDK_23.2.x.yyyy.zip) to C:\OpenText\KeyviewFilterSDK_23.2.x.yyyy.
  Copy the new patch files from the appropriate KeyViewFilterSDK_VERSION\PLATFORM folder into your installation. Most commonly, the patch updates the files in the %KEYVIEW_HOME%\PLATFORM\bin folder and sub-folders: C:\OpenText\KeyviewFilterSDK_23.2.0_WINDOWS_X86_64\WINDOWS_X86_64\bin
  IMPORTANT: Do not mix patch updates across versions. Apply a patch update only to its intended version.
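The patch step above amounts to copying files over an existing installation. The following Python sketch shows one way to script it. It assumes that the folder layout inside the patch mirrors the layout under %KEYVIEW_HOME%\PLATFORM, which this tutorial implies but does not state outright. The function name apply_patch is hypothetical and is not part of KeyView.

```python
import shutil
from pathlib import Path


def apply_patch(patch_platform_dir: Path, install_platform_dir: Path) -> int:
    """Copy every file under the patch folder over the matching path in the
    installation. Returns the number of files copied.

    Assumption: the patch folder (KeyViewFilterSDK_VERSION\\PLATFORM) uses the
    same relative layout as the installation (%KEYVIEW_HOME%\\PLATFORM).
    """
    copied = 0
    for src in patch_platform_dir.rglob("*"):
        if not src.is_file():
            continue
        dest = install_platform_dir / src.relative_to(patch_platform_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # overwrite the old file, keep timestamps
        copied += 1
    return copied
```

Files that exist only in the installation are left untouched. Consider backing up the installation folder before you run a script like this, and check the patch release notes for any steps that differ from a plain copy.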
Introduction to KeyView Filter SDK
The following sections explore how to perform file format detection and metadata, text, and subfile extraction by using the sample programs filter and tstxtract.
Run filter
The sample program filter is a command-line tool that demonstrates the capabilities of the Filter API.
The source code for filter is installed in the %KEYVIEW_HOME%\samples\filter folder. The KeyView installation also includes a pre-built binary, located in %KEYVIEW_HOME%\PLATFORM\bin.
> cd C:\OpenText\KeyviewFilterSDK_23.2.0_WINDOWS_X86_64\WINDOWS_X86_64\bin
> filter
WARNING: filter is a sample program only and is not for production use
Usage: filter [options] inputfile outputfile
options are:
[-m] get document metadata
[-c] do not create a separate process for filtering
[-e] run filtering in stream-based mode
[-h] add headers/footers
[-d] get the format information for a file
[-k] create a separate process for detection
[-l] do not create a separate process for detection
[-L] Enable Log in Kvoop
[-LN] Disable Log in Kvoop
[-AF] Add input file name to Kvoop Log
[-rm] Include revision marks
[-sh] Include hidden text from Word
[-nc] No comments from Word or PowerPoint
[-x xmlconfigfile] Specify the configuration file for XML reader
[-z tmpdir] Specify a directory where temp files are created
[-t timeout] Specify the number of seconds after which filter should time out (only for oop use)
[-ps srcPassword] Specify the source document password
[-pdfauto] Specify Logical Order output for the input PDF file
[-pdfltr] Specify that the input PDF file is left-to-right dominate for logical order
[-pdfrtl] Specify that the input PDF file is right-to-left dominate for logical order
[-pdfraw] Use config api to force raw order mode for pdfsr reader
By default, this sample program uses an embedded trial license. If the environment variable
KV_SAMPLE_PROGRAM_LICENSE_FROM_FILEPATH is set, this file will be read and its content will
be passed as a license to KeyView. This mechanism exists only to allow this program to be used
in testing after the expiry of its trial license, and should not be done in production code.
For more information, see Sample Programs.
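The usage text above mentions the KV_SAMPLE_PROGRAM_LICENSE_FROM_FILEPATH environment variable. If you drive the sample programs from a script, one option is to set the variable only for the child process rather than system-wide. The following Python sketch builds such an environment. The helper name and the license file path in the comment are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

# Name taken from the usage text printed by the sample programs.
LICENSE_ENV_VAR = "KV_SAMPLE_PROGRAM_LICENSE_FROM_FILEPATH"


def sample_program_env(license_file: str,
                       base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the license file variable set."""
    env = dict(os.environ if base_env is None else base_env)
    env[LICENSE_ENV_VAR] = license_file
    return env


# Example use (hypothetical license path), for testing only:
#   subprocess.run(["filter", "-d", "input.pdf", "detect"],
#                  env=sample_program_env(r"C:\OpenText\licensekey.dat"))
```

As the usage text says, this mechanism exists for testing with the sample programs and should not be used in production code.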
Format Detection
The KeyView Filter SDK automatically recognizes the type of each file that it filters. Your application does not need to rely on the file extension, which can be unreliable, to determine the file type.
Filter also provides other general file attributes like format version and encryption status as part of automatic format detection.
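To see why content-based detection is more reliable than trusting a file extension, consider the following Python sketch. It is an illustration of the general idea only and is not how KeyView implements detection. It checks a handful of well-known leading byte signatures, and the function name and labels are made up for this example.

```python
from typing import Optional

# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"PK\x03\x04", "ZIP container (also used by JAR and DOCX)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file"),
]


def sniff_format(data: bytes) -> Optional[str]:
    """Return a label based on the leading bytes, or None if unrecognized."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


# A PDF saved with a misleading extension is still recognized by its content.
print(sniff_format(b"%PDF-1.4\n"))  # prints: PDF
```

A real detector covers far more formats and also reports details such as the format version and encryption status, which is what filter -d demonstrates in the procedure that follows.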
To detect formats
- Use the filter -d command.
> cd C:\OpenText\KeyviewFilterSDK_23.2.0_WINDOWS_X86_64\WINDOWS_X86_64\bin
> filter -d "..\..\..\idol-oem-tutorials\resources\keyview_filter\KeyViewFilterSDK_12.13.0_ReleaseNotes_en.pdf" detect
WARNING: filter is a sample program only and is not for production use
The file ..\..\..\idol-oem-tutorials\resources\keyview_filter\KeyViewFilterSDK_12.13.0_ReleaseNotes_en.pdf
File Class: 1
Format Name (Number): PDF_Fmt (230)
Version: 1400
Attributes: 0
Description: Adobe PDF (Portable Document Format)
MIME Type: application/pdf
filter: error code returned is KVERR_Success
NOTE: The KVERR_Success error code is a positive result.
KeyView correctly identified this file as PDF_Fmt, which is the KeyView format name for PDF.
File Class: 1 refers to the adWORDPROCESSOR category.
For more information about format class/category and format id/number, see Supported Formats.
For the PDF file format, Version: 1400 refers to PDF 1.4. Attributes: 0 means that no attributes apply to KeyViewFilterSDK_12.13.0_ReleaseNotes_en.pdf.
NOTE: The class and format ID assignment scheme was created for KeyView. Where applicable, the Supported Formats documentation notes the MIME type, but not all file formats have MIME types.
You can now try filter -d with your own test files.
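If you want to use the detection results in a script, the console output shown above is simple to parse. The following Python sketch collects the labelled lines into a dictionary. It assumes that the output keeps the "Label: value" layout shown in this tutorial, which is sample program output rather than a documented interface, so treat it as a convenience for experiments only. The function name is hypothetical.

```python
# Labels as they appear in the filter -d output shown in this tutorial.
WANTED = {
    "File Class",
    "Format Name (Number)",
    "Version",
    "Attributes",
    "Description",
    "MIME Type",
}

SAMPLE_OUTPUT = """\
WARNING: filter is a sample program only and is not for production use
File Class: 1
Format Name (Number): PDF_Fmt (230)
Version: 1400
Attributes: 0
Description: Adobe PDF (Portable Document Format)
MIME Type: application/pdf
"""


def parse_detect_output(text: str) -> dict:
    """Collect the known 'Label: value' lines into a dictionary of strings."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() in WANTED:
            fields[label.strip()] = value.strip()
    return fields


info = parse_detect_output(SAMPLE_OUTPUT)
print(info["Format Name (Number)"], info["MIME Type"])
```

For production use, call the Filter API directly instead of parsing the output of a sample program.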
Metadata Extraction
Documents can contain different types of metadata. For example, a document might have a Title and an Author, an image might have a width and a height, and an email has From, To, Subject, and so on.
There are different ways to store this metadata, for example by using a standard mechanism like XMP, or by using something format-specific.
KeyView reports all these types of metadata through a common interface, so that you can use the same method to obtain it, regardless of the underlying storage mechanism.
To perform general metadata extraction
- Use the filter -m command. For example:
> cd C:\OpenText\KeyviewFilterSDK_23.2.0_WINDOWS_X86_64\WINDOWS_X86_64\bin
> filter -m "..\..\..\idol-oem-tutorials\resources\keyview_filter\KeyViewFilterSDK_12.13.0_ReleaseNotes_en.pdf" metadata
WARNING: filter is a sample program only and is not for production use
filter: ..\..\..\idol-oem-tutorials\resources\keyview_filter\KeyViewFilterSDK_12.13.0_ReleaseNotes_en.pdf to metadata
filter: error code returned is KVERR_Success
NOTE: The KVERR_Success error code is a positive result where the destination metadata file contains the output.
Use a UTF-8 capable text editor to view the output, in case the test document includes complex character sets.
4000 "Title": (String) "IDOL KeyView Filter SDK 12.13.0 Release Notes"
2000 "Author": (String) "Micro Focus"
0 "Create_DTM": (WinFileTime) "2022-10-21T14:21:17Z"
1000 "Created": (WinFileTime) "2022-10-21T14:21:17Z"
0 "LastSave_DTM": (WinFileTime) "2022-10-21T14:21:17Z"
1001 "Modified": (WinFileTime) "2022-10-21T14:21:17Z"
5000 "PageCount": (Int64) 10
0 "AppName": (String) "madbuild"
2001 "Application": (String) "madbuild"
Open ..\..\..\idol-oem-tutorials\resources\keyview_filter\KeyViewFilterSDK_12.13.0_ReleaseNotes_en.pdf in Adobe Acrobat Reader. Go to File > Properties and compare what you see to the output from filter -m.
You can now try filter -m with your own test files.
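Each line of the metadata file shown above follows the same pattern: a leading number, a quoted field name, a type in parentheses, and a value. The following Python sketch parses lines of that shape. It is based only on the sample output in this tutorial, so the pattern is an assumption rather than a documented file format, and the names Field and parse_metadata are hypothetical.

```python
import re
from typing import List, NamedTuple, Union

# Pattern inferred from the sample output: number "Name": (Type) value
LINE = re.compile(r'^(\d+) "([^"]+)": \((\w+)\) (.+)$')

SAMPLE = '''
4000 "Title": (String) "IDOL KeyView Filter SDK 12.13.0 Release Notes"
2000 "Author": (String) "Micro Focus"
0 "Create_DTM": (WinFileTime) "2022-10-21T14:21:17Z"
5000 "PageCount": (Int64) 10
'''


class Field(NamedTuple):
    number: int              # leading number on the line
    name: str                # field name, for example "Title"
    type_name: str           # type label, for example "String"
    value: Union[str, int]   # Int64 values become int, others stay text


def parse_metadata(text: str) -> List[Field]:
    """Parse metadata lines; lines that do not match the pattern are skipped."""
    fields = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        number, name, type_name, raw = match.groups()
        value = int(raw) if type_name == "Int64" else raw.strip('"')
        fields.append(Field(int(number), name, type_name, value))
    return fields


for field in parse_metadata(SAMPLE):
    print(field.name, "=", field.value)
```

When you read a real metadata file, open it as UTF-8, in line with the advice above about viewing the output.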
TIP: As well as containing their own metadata, some formats contain metadata about each of the subfiles they contain. For example, each email message in an email archive has From and To metadata. You can obtain this metadata by using the -lm option in tstxtract.
Text Extraction
Filter SDK supports the extraction of different types of text (also known as text filtering), including visible text and hidden text.
The following example shows how to extract the visible text, which is the text you easily see when you edit, view, or print a document.
> cd C:\OpenText\KeyviewFilterSDK_23.2.0_WINDOWS_X86_64\WINDOWS_X86_64\bin
> filter "..\..\..\idol-oem-tutorials\resources\keyview_filter\KeyViewFilterSDK_12.13.0_ReleaseNotes_en.pdf" text
filter: ..\..\..\idol-oem-tutorials\resources\keyview_filter\KeyViewFilterSDK_12.13.0_ReleaseNotes_en.pdf to text
filter: error code returned is KVERR_Success
NOTE: The KVERR_Success error code is a positive result where the destination text file contains the output.
Use a UTF-8 capable text editor to view the output, in case the test document includes complex character sets.
Open ..\..\..\idol-oem-tutorials\resources\keyview_filter\KeyViewFilterSDK_12.13.0_ReleaseNotes_en.pdf in Adobe Acrobat Reader. You can see that the filter output contains all of the visible text, with formatting that resembles the original document.
You can now try filter with your own test files.
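The three filter invocations used so far (detection with -d, metadata with -m, and plain text extraction) differ only in one option. The following Python sketch builds the matching argument lists and, for text extraction, runs the tool and reads the result as UTF-8. It uses only the options shown in the usage text earlier. The function names are hypothetical, and treating the output file as UTF-8 is an assumption based on the advice to use a UTF-8 capable editor.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_filter_command(filter_exe: str, source: str, output: str,
                         mode: Optional[str] = None) -> List[str]:
    """Build the argument list for the filter sample program.

    mode is None for text extraction, "-d" for detection, "-m" for metadata.
    """
    if mode not in (None, "-d", "-m"):
        raise ValueError("mode must be None, '-d' or '-m'")
    command = [filter_exe]
    if mode is not None:
        command.append(mode)
    command += [source, output]
    return command


def run_text_extraction(bin_dir: str, source: str, output: str) -> str:
    """Run filter from bin_dir and return the extracted text.

    Assumption: the output file can be decoded as UTF-8.
    """
    exe = str(Path(bin_dir) / "filter")
    subprocess.run(build_filter_command(exe, source, output), cwd=bin_dir)
    return (Path(bin_dir) / output).read_text(encoding="utf-8", errors="replace")


print(build_filter_command("filter", "input.pdf", "metadata", "-m"))
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces.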
Subfile Extraction
Many file formats can contain other files. Archive files such as ZIPs, and emails with attachments, are the most obvious of these. However, many other file formats can contain subfiles, and KeyView can automatically handle and extract these subfiles for processing.
The following examples perform subfile extraction on the Filter SDK Java API file, KeyView.jar.
Run tstxtract
The sample program tstxtract is a command-line tool that demonstrates the KeyView Extract API capabilities.
The source code for tstxtract is installed in the directory %KEYVIEW_HOME%\samples\tstxtract. A pre-built binary is installed in %KEYVIEW_HOME%\PLATFORM\bin.
> cd C:\OpenText\KeyviewFilterSDK_23.2.0_WINDOWS_X86_64\WINDOWS_X86_64\bin
> tstxtract
Usage: [options] <source file> <output directory> <keyview directory - optional>
Example: tstxtract inputfile outputdir
Example: tstxtract -l logfile -lm inputfile outputdir
When input file is a PST, please use absolute path
options are :
[-c charset ] specify target character set (eg. "KVCS_SJIS")
[-cf credfile1,credfile2,... ] specify input credential file(s) (eg. private key)
[-l logfile ] give path and file name for logfile
[-lm ] get default metadata and output to logfile
[-i ] run as in-process
[-r ] recursively extract subfiles that "needs extraction" to outputdir
[-msg ] extract mail subfiles as native email (eg. extract MSG from PST)
[-f ] extract mail subfiles in formatted (HTML or RTF or MHT) text
[-e ] extract as stream with custom input and custom output stream
[-p password1,password2,... ] specify password(s) to input file or the credential file(s)
[-t ] preserve timestamp of embedded files when possible
[-h ] extract hidden text
[-to timeInSecs ] set kvoop extraction timeout (Filter SDK only)
By default, this sample program uses an embedded trial license. If the environment variable
KV_SAMPLE_PROGRAM_LICENSE_FROM_FILEPATH is set, this file will be read and its content will
be passed as a license to KeyView. This mechanism exists only to allow this program to be used
in testing after the expiry of its trial license, and should not be done in production code.
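The usage text above includes the example tstxtract -l logfile -lm inputfile outputdir. The following Python sketch builds that kind of argument list for use with subprocess. It covers only three options from the usage text (-l, -lm, and -r), and the function name is hypothetical.

```python
from typing import List, Optional


def build_tstxtract_command(tstxtract_exe: str, source: str, output_dir: str,
                            logfile: Optional[str] = None,
                            log_metadata: bool = False,
                            recursive: bool = False) -> List[str]:
    """Build an argument list for the tstxtract sample program.

    Options come first, followed by the source file and the output directory,
    matching the order shown in the usage text.
    """
    command = [tstxtract_exe]
    if logfile is not None:
        command += ["-l", logfile]   # path and file name for the logfile
    if log_metadata:
        command.append("-lm")        # write default metadata to the logfile
    if recursive:
        command.append("-r")         # recursively extract subfiles
    command += [source, output_dir]
    return command


print(build_tstxtract_command("tstxtract", "inputfile", "outputdir",
                              logfile="logfile", log_metadata=True))
```

The printed list corresponds to the second example in the usage text. Remember that the usage text asks for an absolute path when the input file is a PST.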
Perform Subfile Extraction
When you run tstxtract, you must choose an extraction destination. The following example uses a newly created output folder called _extract, but you can choose any location and name for the extraction destination directory.
> cd C:\OpenText\KeyviewFilterSDK_23.2.0_WINDOWS_X86_64\WINDOWS_X86_64\bin
> mkdir _extract
> tstxtract ..\..\javaapi\KeyView.jar _extract
File ..\..\javaapi\KeyView.jar has 108 sub-files, charset: 0, format: 999
tstxtract return code: 0
NOTE: Return code: 0 is a positive result, equivalent to KVERR_Success.
> dir _extract
Volume in drive D is DDrive
Volume Serial Number is 66F6-7BE6
Directory of C:\OpenText\KeyviewFilterSDK_23.2.0_WINDOWS_X86_64\WINDOWS_X86_64\bin\_extract
07/01/2022 09:16 AM <DIR> .
07/01/2022 09:16 AM <DIR> ..
07/01/2022 09:16 AM <DIR> com
07/01/2022 09:16 AM 4,960 htmlinfo.properties
07/01/2022 09:16 AM <DIR> META-INF
07/01/2022 09:16 AM 4,000 xmlinfo.properties
2 File(s) 8,960 bytes
4 Dir(s) 1,461,157,371,904 bytes free
In this example, KeyView extracts 108 subfiles, preserving the directory structure.
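The dir listing above shows only the top level of _extract: two files and the com and META-INF folders. Most of the extracted subfiles sit inside those folders. The following Python sketch counts the files at every level, which gives you a figure to compare with the number that tstxtract reports. Whether the two numbers match exactly depends on how tstxtract counts entries such as folders, which this tutorial does not specify. The function name is hypothetical.

```python
from pathlib import Path


def count_extracted_files(root: str) -> int:
    """Count regular files under root, including all sub-folders."""
    return sum(1 for path in Path(root).rglob("*") if path.is_file())


# Example use, after running tstxtract into _extract:
#   print(count_extracted_files("_extract"))
```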
NOTE: In some cases Filter SDK automatically generates a file name for extracted files.
NOTE: Image extraction is not enabled by default. For details about how to enable image extraction, see Extract Images. This topic also introduces the filter\formats.ini configuration file.
You can now try tstxtract with your own test files. Remember to delete the extracted contents of the _extract folder between each iteration.
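If you run many test files through tstxtract, clearing the output folder by hand becomes tedious. The following Python sketch empties and recreates the folder before each run. The function name is hypothetical, and the sketch deletes everything under the given path, so point it only at a scratch folder such as _extract.

```python
import shutil
from pathlib import Path


def reset_extract_dir(path: str) -> Path:
    """Delete the folder and its contents if present, then recreate it empty."""
    folder = Path(path)
    if folder.exists():
        shutil.rmtree(folder)  # removes all previously extracted subfiles
    folder.mkdir(parents=True)
    return folder


# Example use before each tstxtract run:
#   reset_extract_dir("_extract")
```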
Conclusion
In this tutorial, you used Filter SDK to automatically detect the file format, perform basic metadata and text extraction, and extract subfiles.
Next, you can try additional tutorials, such as the C API Programming Tutorial, to see how to use KeyView from your own application through its C API.