C API Programming Tutorial
The KeyView Filter SDK allows you to embed KeyView functionality into other services.
This tutorial helps you to:
- familiarize yourself with the Filter SDK C API
- create a sample program that replicates a common use case of the Filter SDK
Setup
Resources
You must download the following resources before you continue:
- source code for this tutorial: tutorial_file.c
- optional sample files for detection and various extraction options: sample files
Environment and Compilers
To create a program that uses KeyView, you need to install a supported compiler, and use it to build and link your program. See Supported Compilers.
NOTE: When building with the Visual Studio compiler, you must open the correct command prompt for the installed version. For example, if you install the WINDOWS_X86_64 version of KeyView, use the x64 Native Tools Command Prompt.
License Key
You need a KeyView license key to proceed with this tutorial.
API Setup
Building the sample program
This tutorial explains how to gradually build a working program that replicates a common use case of the KeyView Filter SDK. Sample code for this program is provided (see tutorial_file.c), but you must modify the fpInit call for the sample to compile.
Update the tutorial.h file with the following information:
- Replace the value of YOUR_LICENSE_KEY with your license key.
- Change the YOUR_BIN_DIR variable to the location of the KeyView bin directory.
NOTE: For compilation tips, refer to the tutorial_file.c source code.
Linking Against KeyView Filter SDK
The easiest way to get access to KeyView functionality is to link against the kvfilter shared library and place your executable in the same directory as the KeyView binaries (that is, the directory containing kvfilter.so or kvfilter.dll).
TIP: Loading shared libraries can expose your application to attacks. For advice on avoiding DLL preloading attacks, see Security Best Practices.
- Linking using GCC
On Linux, link against kvfilter.so, and also pass the -rpath $ORIGIN option to the linker. For example: %KEYVIEW_HOME%/LINUX_X86_64/bin/kvfilter.so -Wl,-rpath,'$ORIGIN'
NOTE: When you call it from inside a makefile, you might need to escape $ORIGIN to $$ORIGIN.
- Linking using Clang
On macOS, link against kvfilter.so, and also pass the -rpath @loader_path option to the linker. For example: %KEYVIEW_HOME%/MACOS_X86_64/bin/kvfilter.so -Wl,-rpath,@loader_path
- Linking using Visual Studio
On Windows, link against the import library for kvfilter.dll. This library is provided as part of the KeyView Filter SDK, under {platform}/lib/kvfilter.lib.
Loading the Filter Interface
Now that you can access KeyView functionality, you must include the required headers and load the interface functions from the kvfilter library, using KV_GetFilterInterfaceEx().
#include "kvtypes.h"
#include "kvfilt.h"
KVFltInterfaceEx filter;
KVErrorCode error = KV_GetFilterInterfaceEx(&filter, KVFLTINTERFACE_REVISION);
if(error != KVError_Success)
{
//return error;
}
This call fills out the interface structure, which contains the function pointers for the rest of Filter's functionality.
Like most KeyView functions, KV_GetFilterInterfaceEx() returns a KVErrorCode. If this is KVError_Success, then the function succeeded. Otherwise, the error code indicates the problem that occurred. The rest of this tutorial assumes that you check the error code after each function call.
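Because the same check repeats after every call, one option is to wrap it in a small helper. The following is a minimal sketch, not part of the KeyView SDK: DemoErrorCode is a stand-in for KVErrorCode so that the snippet compiles without the SDK headers, and demo_check() is a hypothetical helper name.

```c
#include <stdio.h>

/* Stand-in for KVErrorCode (assumption: only "zero means success" matters here). */
typedef enum
{
    DemoError_Success = 0,
    DemoError_General = 1
} DemoErrorCode;

/* Hypothetical helper: reports a failed call on stderr.
   Returns 1 if the call succeeded, 0 otherwise. */
static int demo_check(DemoErrorCode error, const char* what)
{
    if (error != DemoError_Success)
    {
        fprintf(stderr, "%s failed with error code %d\n", what, (int)error);
        return 0;
    }
    return 1;
}
```

In a real program you might call a helper like this as `if(!demo_check(error, "fpInit")) { /* clean up and return */ }` after each KeyView call.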
Now that you have successfully loaded the interface, you are ready to use the KeyView Filter SDK.
Creating a Filter Session
All KeyView functionality requires a session, which you must initialize at the start of processing, and shut down at the end of processing.
KVFilterSession session = NULL;
KVFilterInitOptions options;
KVStructInit(&options);
options.outputCharSet = KVCS_UTF8;
error = filter.fpInit(
"/path/to/keyview/bin",
YOUR_LICENSE_KEY,
&options,
&session);
//Use KeyView
filter.fpShutdown(session);
You initialize by using the function fpInit(), which you must provide with the path to the KeyView bin folder, and your license key.
This function also takes a pointer to a KVFilterInitOptions structure, which you must initialize by using KVStructInit(). This macro ensures that a struct is correctly set up for use with the KeyView interface, including version information for backwards compatibility. Any KeyView struct that contains a KVStructHead member must be initialized with the KVStructInit() macro.
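To illustrate why an initialization macro of this kind helps with backwards compatibility, the following sketch shows the general versioned-struct-header pattern. It is not KeyView's implementation: DemoStructHead, DemoInitOptions, DEMO_API_VERSION, and DEMO_STRUCT_INIT are all hypothetical names, and the real KVStructHead layout might differ.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical header that records the size and version the caller compiled against. */
typedef struct
{
    uint32_t size;
    uint32_t version;
} DemoStructHead;

#define DEMO_API_VERSION 3u

/* Hypothetical options struct that starts with the header. */
typedef struct
{
    DemoStructHead head;
    int outputCharSet;
} DemoInitOptions;

/* Zero every field, then stamp the header so that a library can tell
   which fields the caller's version of the struct contains. */
#define DEMO_STRUCT_INIT(p) \
    do { \
        memset((p), 0, sizeof(*(p))); \
        (p)->head.size = (uint32_t)sizeof(*(p)); \
        (p)->head.version = DEMO_API_VERSION; \
    } while (0)
```

With this pattern, a library can add fields to the end of a struct in a later release and still accept structs from programs compiled against an older header, because the recorded size tells it where the caller's data ends.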
TIP: Performance considerations:
- Session Lifetime. You can process multiple files in a single session, which might improve performance by reducing costs associated with start-up and shutdown.
- Multi-threading. To maximize throughput when processing multiple files, you can call KeyView from multiple threads. All KeyView functions are thread-safe when called in this manner. Each thread using KeyView must create its own session by calling fpInit(). You must not share filter sessions between threads.
TIP: Security considerations:
- Privilege Reduction. By default, KeyView performs most of its operations out-of-process, creating a separate process to parse file data. This protects your main application from the effects of rare problems like memory leaks or crashes. You can include additional protection by running KeyView with reduced privileges. See Run KeyView with Reduced Privileges.
- Temp Directory. While processing, KeyView might place sensitive data in the temporary directory. You might want to consider protecting the temporary directory. See Protect the Temporary Directory.
Now that you have set up the API, you can perform KeyView Filter functionality on documents.
Opening a Document
KeyView functions operate on a KVDocument object, which represents a document independently of where the document is actually stored. You can create a KVDocument from a file on disk by using fpOpenDocumentFromFile(). You must close the KVDocument after use.
KVDocument document = NULL;
error = filter.fpOpenDocumentFromFile(session, pathToInputFile, &document);
//Pass document to KeyView functions
filter.fpCloseDocument(document);
Filter a File
One of the most important features of KeyView is filtering text from a document. This section shows you how to get text from a KVDocument object, and output it to a file on disk.
Filtering Text
You can filter text to an output file by using the fpFilterToFile() function.
error = filter.fpFilterToFile(document, pathToOutputFile);
TIP: Partial Filtering. The fpFilterToFile() function filters the entire file in one go, but you might want to filter only part of the file, or filter the file in chunks. The advanced tutorial covers how to do partial filtering.
TIP: Mail Files. Mail files, such as EML or MSG, are considered a form of container, and you cannot filter them directly. This tutorial covers how to filter mail files later, in Extracting Subfiles.
Filtering Hidden Information
KeyView provides a number of options that control what text to output, and how to convert or display that text. A common requirement of KeyView is to display as much text as possible, including text that is not normally visible in the document, such as hidden cells or slides, or ancillary text like comments or notes. You can display this text by enabling the hidden text option in fpSetConfig().
error = filter.fpSetConfig(session, KVFLT_SHOWHIDDENTEXT, TRUE, NULL);
Detecting and Using File Format Information
KeyView enables you to reliably determine the file format of a huge range of documents. It does this by analyzing the internal structure and content of the file, rather than relying on file names or extensions. Detection prioritizes both accuracy and speed, only processing as much of the file as necessary to rule out false positives.
Detecting the File Format
File format detection functionality is exposed through the API function fpGetDocInfo().
ADDOCINFO adInfo;
error = filter.fpGetDocInfo(document, &adInfo);
TIP: Source Code Identification. KeyView can optionally detect source code, attempting to identify the programming language that it is written in. You can learn more in Source Code Identification.
Checking if a File is Supported
KeyView provides a convenience function, fpCanFilter(), that performs detection and determines whether you can pass the file to fpFilter().
error = filter.fpCanFilter(document);
If this function does not return KVError_Success, the error returned explains why KeyView cannot filter the file. For example, KeyView could not determine the format, the file does not exist, or the format is not supported for filtering.
While this step is not strictly necessary, it can simplify many workflows.
Retrieving Metadata
File formats can contain a variety of different metadata, and KeyView makes it easy to access all of this information. KeyView retrieves metadata from various sources in a file, such as:
- Format-specific standard metadata
- User-provided custom metadata
- Exif tags
- XMP elements
- MIP Labels
Getting the Metadata List
You can retrieve metadata elements by using fpGetMetadataList(). This function fills out the KVMetadataList structure, which you must free by using its fpFree() function.
const KVMetadataList* metadataList = NULL;
error = filter.fpGetMetadataList(document, &metadataList);
//Iterate through metadata
metadataList->fpFree(metadataList);
Iterating Through the List
You can retrieve individual metadata elements by iterating through the metadata list using the fpGetNext() function in KVMetadataList, which fills out the KVMetadataElement structure. The information that this structure returns is valid only while the session is still alive, and becomes invalid after you call fpFree(). The end of the list is indicated by the retrieved element being NULL.
while(1)
{
const KVMetadataElement* element = NULL;
error = metadataList->fpGetNext(metadataList, &element);
if(error != KVError_Success)
{
//Handle error
}
if(element == NULL)
{
break;
}
//Process metadata element
}
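The iteration protocol above (call a get-next function until it reports an error or hands back NULL) can be tried out without the SDK. The following is a minimal sketch that uses stand-in types: DemoElement, DemoList, demo_get_next(), and demo_count_elements() are hypothetical and only mimic the shape of KVMetadataList and fpGetNext(). Unlike the loop above, this sketch stops iterating as soon as a call fails, which is one reasonable way to fill in the "Handle error" step.

```c
#include <stddef.h>

/* Stand-in for a metadata element (assumption: only a key is needed here). */
typedef struct DemoElement
{
    const char* key;
} DemoElement;

/* Stand-in for a list that exposes its iterator as a function pointer. */
typedef struct DemoList DemoList;
struct DemoList
{
    const DemoElement* items;
    size_t count;
    size_t next;
    int (*fpGetNext)(DemoList* self, const DemoElement** element);
};

/* Hands back the next element, or NULL at the end of the list. Returns 0 on success. */
static int demo_get_next(DemoList* self, const DemoElement** element)
{
    *element = (self->next < self->count) ? &self->items[self->next++] : NULL;
    return 0;
}

/* Counts the remaining elements, stopping on the first error or at the end of the list. */
static size_t demo_count_elements(DemoList* list)
{
    size_t n = 0;
    for (;;)
    {
        const DemoElement* element = NULL;
        if (list->fpGetNext(list, &element) != 0)
        {
            break; /* error: stop iterating */
        }
        if (element == NULL)
        {
            break; /* end of list */
        }
        ++n;
    }
    return n;
}
```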
Interpreting a Metadata Element
Each metadata element is conceptually represented as a key-value pair, where pKey is the name of the metadata key, and pValue is the value of that piece of metadata. To know the type of the metadata object that pValue points to, you must first consult the eType member. Strings are output in the character set that you requested in the call to fpInit().
switch (element->eType)
{
case KVMetadataValue_Bool:
{
BOOL value = *(BOOL*)element->pValue;
//Process Bool value
break;
}
case KVMetadataValue_Int64:
{
int64_t value = *(int64_t*)element->pValue;
//Process Int64 value
break;
}
case KVMetadataValue_Double:
{
double value = *(double*)element->pValue;
//Process Double value
break;
}
case KVMetadataValue_WinFileTime:
{
int64_t value = *(int64_t*)element->pValue;
//Process WinFileTime value
break;
}
case KVMetadataValue_String:
{
KVString value = *(KVString*)element->pValue;
//Process String value
break;
}
case KVMetadataValue_Binary:
{
KVBinaryData value = *(KVBinaryData*)element->pValue;
//Process Binary value
break;
}
case KVMetadataValue_MIPLabel:
{
KVMIPLabel value = *(KVMIPLabel*)element->pValue;
//Process MIPLabel value
break;
}
default:
//Handle unrecognised type
break;
}
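The KVMetadataValue_WinFileTime case reads the value as an int64_t. Assuming that this value follows the Windows FILETIME convention (the number of 100-nanosecond intervals since 1601-01-01 UTC), which the name suggests but which you should confirm in the KeyView reference, the following sketch converts it to Unix time and formats it. Both helper names are hypothetical and not part of the SDK.

```c
#include <stdint.h>
#include <time.h>

/* FILETIME counts 100-nanosecond ticks, so there are 10,000,000 per second. */
#define DEMO_TICKS_PER_SECOND 10000000LL
/* Seconds between 1601-01-01 and 1970-01-01 (the Unix epoch). */
#define DEMO_EPOCH_DIFF_SECONDS 11644473600LL

/* Converts a FILETIME-style tick count to whole seconds since the Unix epoch. */
static int64_t demo_winfiletime_to_unix(int64_t fileTime)
{
    return fileTime / DEMO_TICKS_PER_SECOND - DEMO_EPOCH_DIFF_SECONDS;
}

/* Formats Unix seconds as an ISO 8601 UTC timestamp. Returns 1 on success, 0 on failure.
   Uses gmtime(), which is not thread-safe; swap in a reentrant variant for threaded code. */
static int demo_format_utc(int64_t unixSeconds, char* buffer, size_t bufferSize)
{
    time_t t = (time_t)unixSeconds;
    struct tm* utc = gmtime(&t);
    if (utc == NULL)
    {
        return 0;
    }
    return strftime(buffer, bufferSize, "%Y-%m-%dT%H:%M:%SZ", utc) != 0;
}
```

The conversion discards the sub-second part of the value, which is usually acceptable for display purposes.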
Standardized Metadata Elements
Different file formats can store the same piece of information in different ways. For example, one file format might call the width of the image width, another image_width, and another x_size. This behavior is often unhelpful, because you then need to maintain a list of fields that correspond to a particular piece of information. KeyView solves this problem by standardizing certain metadata fields. See Understanding Metadata Fields in KeyView.
Extracting Subfiles
KeyView Filter SDK allows you to access the subfiles of a document, from both pure containers (such as ZIP or TAR files) and from documents embedded inside other files (such as an Excel spreadsheet embedded in a Word document).
Loading the Extract Interface
You can load the Extract interface by including the kvxtract.h header, and calling the fpGetExtractInterface() function in the kvfilter shared library.
#include "kvxtract.h"
KVExtractInterfaceRec extract;
KVStructInit(&extract);
error = filter.fpGetExtractInterface(session, &extract);
The interface function takes the filter session you created earlier, as well as the extraction interface to fill out.
Opening a Container
You must open a container file before you can access the subfiles. You open the container by using the fpOpenFileFromFilterSession() function. This function creates a file-specific handle that you can use with the other functions in the extract interface. You must close this handle after use.
void* fileHandle = NULL;
KVOpenFileArgRec openArg;
KVStructInit(&openArg);
openArg.extractDir = "path/to/extract/dir";
openArg.document = document;
error = extract.fpOpenFileFromFilterSession(session, &openArg, &fileHandle);
//Use File Handle
extract.fpCloseFile(fileHandle);
You can then get information about the container itself by using the function fpGetMainFileInfo(). Most importantly, this tells you the number of subfiles. You must free this structure after use.
KVMainFileInfo fileInfo = NULL;
error = extract.fpGetMainFileInfo(fileHandle, &fileInfo);
//Use main file info
extract.fpFreeStruct(fileHandle, fileInfo);
Extracting Subfiles
Before you extract the subfile itself, you can first get some information about the subfile. You get this information by calling the fpGetSubFileInfo() function, using the index to identify the subfile. You must free this structure after use.
for(int ii = 0; ii < fileInfo->numSubFiles; ++ii)
{
KVSubFileInfo subFileInfo = NULL;
error = extract.fpGetSubFileInfo(fileHandle, ii, &subFileInfo);
//Use sub file info
extract.fpFreeStruct(fileHandle, subFileInfo);
}
After you have this subfile info, you can use it to construct the necessary arguments for extraction.
KVSubFileExtractInfo extractInfo = NULL;
KVExtractSubFileArgRec extractArg;
KVStructInit(&extractArg);
if(subFileInfo->subFileType == KVSubFileType_Folder ||
subFileInfo->subFileType == KVSubFileType_External)
{
goto skipfile;
}
extractArg.index = index; //index of the subfile, as passed to fpGetSubFileInfo()
extractArg.filePath = subFileInfo->subFileName;
extractArg.extractionFlag =
KVExtractionFlag_CreateDir |
KVExtractionFlag_Overwrite |
KVExtractionFlag_GetFormattedBody |
KVExtractionFlag_SanitizeAbsolutePaths;
error = extract.fpExtractSubFile(fileHandle, &extractArg, &extractInfo);
//Do more processing, such as filtering the sub file
extract.fpFreeStruct(fileHandle, extractInfo);
The fpExtractSubFile() function fills out the KVSubFileExtractInfo pointer, which tells you more about what the function actually did. For example, it tells you the location that it extracted the file to.
TIP: Mail Files. KeyView treats mail files as containers, where the first subfile is the contents of the mail file, and subsequent subfiles are the attachments.
NOTE: Security. KVExtractionFlag_SanitizeAbsolutePaths mitigates certain path traversal attacks. See Sanitize Absolute Paths.
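If you build output paths from subfile names yourself, you might want an additional application-side check as defense in depth. The following is a minimal sketch of such a check, not a description of what KeyView does internally: demo_is_safe_relative_path() is a hypothetical helper that rejects absolute paths and any path that contains a ".." component.

```c
#include <string.h>

/* Returns 1 if path is relative and contains no ".." component, otherwise 0. */
static int demo_is_safe_relative_path(const char* path)
{
    const char* p = path;

    if (path == NULL || path[0] == '\0')
    {
        return 0;
    }
    /* Reject POSIX absolute paths and Windows rooted or UNC paths. */
    if (path[0] == '/' || path[0] == '\\')
    {
        return 0;
    }
    /* Reject Windows drive-letter paths such as C:\... */
    if (((path[0] >= 'A' && path[0] <= 'Z') || (path[0] >= 'a' && path[0] <= 'z'))
        && path[1] == ':')
    {
        return 0;
    }
    /* Walk each component, treating both separator styles alike. */
    while (*p != '\0')
    {
        size_t len = strcspn(p, "/\\");
        if (len == 2 && p[0] == '.' && p[1] == '.')
        {
            return 0;
        }
        p += len;
        if (*p != '\0')
        {
            ++p; /* skip the separator */
        }
    }
    return 1;
}
```

A check like this is deliberately conservative: it also rejects harmless paths such as "a/../b", which keeps the logic simple at the cost of refusing some valid names.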
By default, KeyView does not extract images when extracting subfiles. You can enable image extraction by using the fpSetConfig() function. This option is set globally for the session, so you can set it outside of the loop that you use to process files.
error = filter.fpSetConfig(session, KVFLT_EXTRACTIMAGES, TRUE, NULL);
Retrieving Mail Metadata
You can retrieve mail metadata for a particular subfile by using the function fpGetSubfileMetadataList(). This function fills out the same KVMetadataList structure that you used in Retrieving Metadata, and can be handled in the same way. You must initialize KVGetSubfileMetadataListArgRec by using KVStructInit().
const KVMetadataList* metadataList = NULL;
KVGetSubfileMetadataListArgRec metaArgs;
KVStructInit(&metaArgs);
metaArgs.index = index;
metaArgs.trgCharset = KVCS_UTF8;
error = extract.fpGetSubfileMetadataList(fileHandle, &metaArgs, &metadataList);
//Process metadata using metadataList->fpGetNext()
metadataList->fpFree(metadataList);
Conclusion
You have now built a basic sample program, processing documents using file mode. To learn about more advanced features, such as processing files in stream mode, take a look at the advanced tutorial.