C API Programming Tutorial

The KeyView Filter SDK allows you to embed KeyView functionality into other services.

This tutorial helps you to:

  • familiarize yourself with the Filter SDK C API

  • create a sample program that replicates a common use case of the Filter SDK

Setup

Resources

You must download the following resources before you continue:

Environment and Compilers

To create a program that uses KeyView, you need to install a supported compiler, and use it to build and link your program. See Supported Compilers.

NOTE: When building with the Visual Studio compiler, make sure that you open the command prompt that matches the installed version of KeyView. For example, if you install the WINDOWS_X86_64 version of KeyView, use the x64 Native Tools Command Prompt.

License Key

You need a KeyView license key to proceed with this tutorial.

API Setup

Building the sample program

This tutorial explains how to gradually build a working program that replicates a common use case of the KeyView Filter SDK. Sample code for this program is provided (see tutorial_file.c), but you must modify the fpInit call for the sample to compile.

Update the tutorial.h file with the following information:

  • Replace the value of YOUR_LICENSE_KEY with your license key.

  • Change the YOUR_BIN_DIR variable to the location of the KeyView bin directory.

NOTE: For compilation tips, refer to the tutorial_file.c source code.

Linking Against KeyView Filter SDK

The easiest way to get access to KeyView functionality is to link against the kvfilter shared library and place your executable in the same directory as the KeyView binaries (that is, the directory containing kvfilter.so or kvfilter.dll).

TIP: Loading shared libraries can expose your application to attacks. For advice on avoiding DLL preloading attacks, see Security Best Practices.

  • Linking using GCC

    On Linux, link against kvfilter.so, and also pass the -rpath $ORIGIN option to the linker. For example:

     %KEYVIEW_HOME%/LINUX_X86_64/bin/kvfilter.so -Wl,-rpath,'$ORIGIN'

    NOTE: When you pass this option from inside a makefile, you might need to escape $ORIGIN as $$ORIGIN.

  • Linking using Clang

    On macOS, link against kvfilter.so, and also pass the -rpath @loader_path option to the linker. For example:

     %KEYVIEW_HOME%/MACOS_X86_64/bin/kvfilter.so -Wl,-rpath,@loader_path

  • Linking using Visual Studio

    On Windows, link against the import library for kvfilter.dll. This library is provided as part of the KeyView Filter SDK, under {platform}/lib/kvfilter.lib.

Loading the Filter Interface

Now that you can access KeyView functionality, you must include the required headers and load the interface functions from the kvfilter library, using KV_GetFilterInterfaceEx().

#include "kvtypes.h"
#include "kvfilt.h"

KVFltInterfaceEx filter;

KVErrorCode error = KV_GetFilterInterfaceEx(&filter, KVFLTINTERFACE_REVISION);

if(error != KVError_Success)
{
   //Handle the error, for example: return error;
}

This call fills out the interface structure, which contains the function pointers for the rest of the Filter SDK functionality.

Like most KeyView functions, KV_GetFilterInterfaceEx() returns a KVErrorCode. If this is KVError_Success, then the function succeeded. Otherwise, the error code indicates the problem that occurred. The rest of this tutorial assumes that you check the error code after each function call.

Now that you have successfully loaded the interface, you are ready to use the KeyView Filter SDK.

Creating a Filter Session

All KeyView functionality requires a session, which you must initialize at the start of processing, and shut down at the end of processing.

KVFilterSession session = NULL;

KVFilterInitOptions options;
KVStructInit(&options);
options.outputCharSet = KVCS_UTF8;

error = filter.fpInit(
   "/path/to/keyview/bin",
   YOUR_LICENSE_KEY,
   &options,
   &session);

//Use KeyView
 
filter.fpShutdown(session);

You initialize a session by calling fpInit(), which requires the path to the KeyView bin directory and your license key.

This function also takes a pointer to a KVFilterInitOptions structure, which you must initialize by using KVStructInit(). This macro ensures that a struct is correctly set up for use with the KeyView interface, including version information for backwards compatibility. Any KeyView struct that contains a KVStructHead member must be initialized with the KVStructInit() macro.
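The tutorial does not show how KVStructInit() is defined, so the following is only a generic sketch of the size-and-version header pattern that macros like it commonly implement. DemoStructHead, DEMO_STRUCT_VERSION, DEMO_STRUCT_INIT, and DemoInitOptions are hypothetical names for illustration; they are not part of the KeyView API, and KeyView's real KVStructHead layout may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical header, for illustration only. */
typedef struct
{
   size_t size;      /* sizeof the enclosing struct, recorded by the caller */
   unsigned version; /* layout version the caller was compiled against */
} DemoStructHead;

#define DEMO_STRUCT_VERSION 1u

/* Zero every field, then record the struct's size and version so that a
   library can tell which fields an older or newer caller knows about. */
#define DEMO_STRUCT_INIT(p)                     \
   do {                                         \
      memset((p), 0, sizeof(*(p)));             \
      (p)->head.size = sizeof(*(p));            \
      (p)->head.version = DEMO_STRUCT_VERSION;  \
   } while (0)

/* Hypothetical options struct whose first member is the header. */
typedef struct
{
   DemoStructHead head;
   int outputCharSet;
} DemoInitOptions;
```

Because the macro zeroes the struct before filling in the header, you set your own fields (such as outputCharSet) after initializing, exactly as the tutorial code does with KVStructInit().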

TIP: Performance considerations:

  • Session Lifetime. You can process multiple files in a single session, which might improve performance by reducing costs associated with start-up and shutdown.

  • Multi-threading. To maximize throughput when processing multiple files, you can call KeyView from multiple threads. All KeyView functions are thread-safe when called in this manner. Each thread using KeyView must create its own session by calling fpInit(). You must not share filter sessions between threads.

TIP: Security considerations:

  • Privilege Reduction. By default, KeyView performs most of its operations out-of-process, creating a separate process to parse file data. This protects your main application from the effects of rare problems like memory leaks or crashes. You can add further protection by running KeyView with reduced privileges. See Run KeyView with Reduced Privileges.

  • Temp Directory. While processing, KeyView might place sensitive data in the temporary directory, so consider protecting that directory. See Protect the Temporary Directory.
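One way to obtain a directory that only the current user can access is the POSIX mkdtemp() function, which creates the directory with mode 0700. The following sketch applies to Linux and macOS only; the function name make_private_temp_dir and the kvtmp- prefix are hypothetical, and how you then point KeyView at the directory is covered in Protect the Temporary Directory, not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Creates a new directory under `parent` that only the current user can
   read, write, or enter. mkdtemp() replaces the trailing XXXXXX with a
   unique suffix and creates the directory with mode 0700.
   Returns a heap-allocated path (release it with free()), or NULL on error. */
char *make_private_temp_dir(const char *parent)
{
   const char *suffix = "/kvtmp-XXXXXX"; /* hypothetical naming scheme */
   size_t len = strlen(parent) + strlen(suffix) + 1;
   char *path = malloc(len);
   if (path == NULL)
      return NULL;

   snprintf(path, len, "%s%s", parent, suffix);
   if (mkdtemp(path) == NULL)
   {
      free(path);
      return NULL;
   }
   return path;
}
```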

Now that you have set up the API, you can perform KeyView Filter functionality on documents.

Opening a Document

KeyView functions operate on a KVDocument object, which represents a document independently of where the document is stored. You can create a KVDocument from a file on disk by using fpOpenDocumentFromFile(). You must close the KVDocument after use by calling fpCloseDocument().

KVDocument document = NULL;

error = filter.fpOpenDocumentFromFile(session, pathToInputFile, &document);

//Pass document to KeyView functions

filter.fpCloseDocument(document);

Filter a File

One of the most important features of KeyView is filtering text from a document. This section shows you how to get text from a KVDocument object, and output it to a file on disk.

Filtering Text

You can filter text to an output file by using the fpFilterToFile() function.

error = filter.fpFilterToFile(document, pathToOutputFile);

TIP: Partial Filtering. The fpFilterToFile() function filters the entire file in one go, but you might want to filter only part of the file, or filter the file in chunks. The advanced tutorial covers how to do partial filtering.

TIP: Mail Files. Mail files, such as EML or MSG, are considered a form of container, and you cannot filter them directly. This tutorial covers how to filter mail files later, in Extracting Subfiles.

Filtering Hidden Information

KeyView provides a number of options that control which text to output, and how to convert or display that text. A common requirement is to output as much text as possible, including text that is not normally visible in the document, such as hidden cells or slides, or ancillary text such as comments or notes. You can include this text by enabling the KVFLT_SHOWHIDDENTEXT option with fpSetConfig().

error = filter.fpSetConfig(session, KVFLT_SHOWHIDDENTEXT, TRUE, NULL);

Detecting and Using File Format Information

KeyView enables you to reliably determine the file format of a wide range of documents. It does this by analyzing the internal structure and content of the file, rather than relying on file names or extensions. Detection prioritizes both accuracy and speed, processing only as much of the file as necessary to rule out false positives.
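To illustrate the difference between content-based and extension-based detection, the following deliberately tiny sketch inspects the leading bytes ("magic numbers") of a buffer. It is not how KeyView works internally, and the names DemoFormat and sniff_format are hypothetical. The ZIP case also shows why a leading signature alone is not enough: DOCX, XLSX, and many other formats are ZIP packages, which is why a structural analysis such as fpGetDocInfo() is preferable.

```c
#include <stddef.h>
#include <string.h>

typedef enum { FMT_UNKNOWN, FMT_PDF, FMT_ZIP, FMT_PNG } DemoFormat;

/* Classifies a buffer by its leading bytes, ignoring any file name. */
DemoFormat sniff_format(const unsigned char *data, size_t len)
{
   static const unsigned char pdf[] = { '%', 'P', 'D', 'F', '-' };
   static const unsigned char zip[] = { 'P', 'K', 0x03, 0x04 };
   static const unsigned char png[] = { 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n' };

   if (len >= sizeof pdf && memcmp(data, pdf, sizeof pdf) == 0)
      return FMT_PDF;
   if (len >= sizeof zip && memcmp(data, zip, sizeof zip) == 0)
      return FMT_ZIP; /* ambiguous: could also be DOCX, XLSX, JAR, ... */
   if (len >= sizeof png && memcmp(data, png, sizeof png) == 0)
      return FMT_PNG;
   return FMT_UNKNOWN;
}
```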

Detecting the File Format

File format detection functionality is exposed through the API function fpGetDocInfo().

ADDOCINFO adInfo;
error = filter.fpGetDocInfo(document, &adInfo);

TIP: Source Code Identification. KeyView can optionally detect source code, attempting to identify the programming language that it is written in. You can learn more in Source Code Identification.

Checking if a File is Supported

KeyView provides a convenience function, fpCanFilter(), that performs detection and determines whether you can pass the file to fpFilter().

error = filter.fpCanFilter(document);

If this function does not return KVError_Success, the returned error code explains why KeyView cannot filter the file: for example, KeyView could not determine the format, the file does not exist, or the format is not supported for filtering.

While this step is not strictly necessary, it can simplify many workflows.

Retrieving Metadata

Documents can contain many kinds of metadata, and KeyView makes it easy to access all of this information. KeyView retrieves metadata from various sources in a file, such as:

  • Format-specific standard metadata

  • User-provided custom metadata

  • Exif tags

  • XMP elements

  • MIP Labels

Getting the Metadata List

You can retrieve metadata elements by using fpGetMetadataList(). This function provides a KVMetadataList structure, which you must free by calling its fpFree() function.

const KVMetadataList* metadataList = NULL;
error = filter.fpGetMetadataList(document, &metadataList);

//Iterate through metadata

metadataList->fpFree(metadataList);

Iterating Through the List

You can retrieve individual metadata elements by iterating through the metadata list with the fpGetNext() function in KVMetadataList, which fills out a KVMetadataElement structure. The information in this structure is valid only while the session is still alive, and becomes invalid after you call fpFree(). When fpGetNext() reaches the end of the list, it sets the retrieved element to NULL.

while(1)
{
   const KVMetadataElement* element = NULL;
   error = metadataList->fpGetNext(metadataList, &element);
 
   if(error != KVError_Success)
   {
      //Handle error, then stop iterating
      break;
   }
 
   if(element == NULL)
   {
      break;
   }
 
   //Process metadata element
}
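To show the shape of this interface without the KeyView headers, the following sketch defines a hypothetical DemoList type that, like KVMetadataList, carries its own fpGetNext and fpFree function pointers. DemoList, DemoElement, demo_list_create, and count_elements are illustrative names only; the loop in count_elements follows the same two exit conditions as the tutorial loop.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins shaped like KVMetadataElement and KVMetadataList. */
typedef struct { const char *pKey; } DemoElement;

typedef struct DemoList DemoList;
struct DemoList
{
   /* Returns 0 on success; sets *element to NULL at the end of the list. */
   int (*fpGetNext)(DemoList *list, const DemoElement **element);
   void (*fpFree)(DemoList *list);
   const DemoElement *items;
   size_t count;
   size_t pos;
};

static int demo_get_next(DemoList *list, const DemoElement **element)
{
   *element = list->pos < list->count ? &list->items[list->pos++] : NULL;
   return 0;
}

static void demo_free(DemoList *list)
{
   free(list);
}

DemoList *demo_list_create(const DemoElement *items, size_t count)
{
   DemoList *list = malloc(sizeof *list);
   if (list == NULL)
      return NULL;
   list->fpGetNext = demo_get_next;
   list->fpFree = demo_free;
   list->items = items;
   list->count = count;
   list->pos = 0;
   return list;
}

/* Stops on an error or at the NULL end-of-list sentinel. */
size_t count_elements(DemoList *list)
{
   size_t n = 0;
   while (1)
   {
      const DemoElement *element = NULL;
      if (list->fpGetNext(list, &element) != 0)
         break; /* error: stop rather than loop forever */
      if (element == NULL)
         break; /* end of the list */
      n++;
   }
   return n;
}
```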

Interpreting a Metadata Element

Each metadata element is conceptually a key-value pair, where pKey is the name of the metadata key, and pValue points to the value of that piece of metadata. To determine the type of the object that pValue points to, check the eType member first. Strings are output in the character set that you requested in the call to fpInit() (the outputCharSet member of KVFilterInitOptions).

switch (element->eType)
{
case KVMetadataValue_Bool:
{
   BOOL value = *(BOOL*)element->pValue;
   //Process Bool value
   break;
}
case KVMetadataValue_Int64:
{
   int64_t value = *(int64_t*)element->pValue;
   //Process Int64 value
   break;
}
case KVMetadataValue_Double:
{
   double value = *(double*)element->pValue;
   //Process Double value
   break;
}
case KVMetadataValue_WinFileTime:
{
   int64_t value = *(int64_t*)element->pValue;
   //Process WinFileTime value
   break;
}
case KVMetadataValue_String:
{
   KVString value = *(KVString*)element->pValue;
   //Process String value
   break;
}
case KVMetadataValue_Binary:
{
   KVBinaryData value = *(KVBinaryData*)element->pValue;
   //Process Binary value
   break;
}
case KVMetadataValue_MIPLabel:
{
   KVMIPLabel value = *(KVMIPLabel*)element->pValue;
   //Process MIPLabel value
   break;
}
default:
   //Handle unrecognized type
   break;
}
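A KVMetadataValue_WinFileTime value is read as an int64_t above. Assuming it follows the standard Windows FILETIME convention, as its name suggests (100-nanosecond intervals since 1601-01-01 UTC), you can convert it to Unix time as in the following sketch. The function name is hypothetical; check the KeyView reference to confirm the representation.

```c
#include <stdint.h>

/* Windows FILETIME counts 100-nanosecond ticks since 1601-01-01 UTC.
   The Unix epoch (1970-01-01 UTC) is 11644473600 seconds later. */
#define FILETIME_TICKS_PER_SECOND 10000000LL
#define FILETIME_UNIX_EPOCH_OFFSET 116444736000000000LL

/* Converts a FILETIME tick count to whole seconds since the Unix epoch,
   rounding toward negative infinity. */
int64_t filetime_to_unix_seconds(int64_t filetime)
{
   int64_t ticks = filetime - FILETIME_UNIX_EPOCH_OFFSET;
   int64_t secs = ticks / FILETIME_TICKS_PER_SECOND;
   if (ticks % FILETIME_TICKS_PER_SECOND < 0)
      secs -= 1; /* C division truncates toward zero; floor instead */
   return secs;
}
```

You can then cast the result to time_t and pass it to gmtime() or strftime() for display.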

Standardized Metadata Elements

Different file formats can store the same piece of information in different ways. For example, one file format might call the image width "width", another "image_width", and another "x_size". This is often unhelpful, because you then need to maintain a list of fields that correspond to a particular piece of information. KeyView solves this problem by standardizing certain metadata fields. See Understanding Metadata Fields in KeyView.
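To make the problem concrete, the following sketch shows the kind of alias table an application would otherwise have to maintain, and extend for every new format, using the width example. DemoAlias, demo_aliases, and canonical_key are hypothetical names; this is the bookkeeping that standardized fields remove, not how KeyView implements them.

```c
#include <stddef.h>
#include <string.h>

typedef struct { const char *alias; const char *canonical; } DemoAlias;

/* Format-specific names that all mean "image width". */
static const DemoAlias demo_aliases[] = {
   { "width",       "width" },
   { "image_width", "width" },
   { "x_size",      "width" },
};

/* Returns the canonical name for a key, or the key itself if unknown. */
const char *canonical_key(const char *key)
{
   for (size_t i = 0; i < sizeof demo_aliases / sizeof demo_aliases[0]; ++i)
   {
      if (strcmp(key, demo_aliases[i].alias) == 0)
         return demo_aliases[i].canonical;
   }
   return key;
}
```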

Extracting Subfiles

KeyView Filter SDK allows you to access the subfiles of a document, from both pure containers (such as ZIP or TAR files) and from documents embedded inside other files (such as an Excel spreadsheet embedded in a Word document).

Loading the Extract Interface

You can load the Extract interface by including the kvxtract.h header, and calling the fpGetExtractInterface() function of the filter interface that you loaded earlier.

#include "kvxtract.h"

KVExtractInterfaceRec extract;

KVStructInit(&extract);

error = filter.fpGetExtractInterface(session, &extract);

The interface function takes the filter session you created earlier, as well as the extraction interface to fill out.

Opening a Container

You must open a container file before you can access the subfiles. You open the container by using the fpOpenFileFromFilterSession() function. This function creates a file-specific handle that you can use with the other functions in the extract interface. You must close this handle after use.

void* fileHandle = NULL;
KVOpenFileArgRec openArg;

KVStructInit(&openArg);
openArg.extractDir = "path/to/extract/dir";
openArg.document = document;

error = extract.fpOpenFileFromFilterSession(session, &openArg, &fileHandle);

//Use file handle

extract.fpCloseFile(fileHandle);

You can then get information about the container itself by using the function fpGetMainFileInfo(). Most importantly, this tells you the number of subfiles. You must free this structure after use.

KVMainFileInfo fileInfo = NULL;

//Pass the address of fileInfo so that the function can fill it out
error = extract.fpGetMainFileInfo(fileHandle, &fileInfo);

//Use main file info

extract.fpFreeStruct(fileHandle, fileInfo);

Extracting Subfiles

Before you extract the subfile itself, you can first get some information about the subfile. You get this information by calling the fpGetSubFileInfo() function, using the index to identify the subfile. You must free this structure after use.

for(int index = 0; index < fileInfo->numSubFiles; ++index)
{
   KVSubFileInfo subFileInfo = NULL;
   error = extract.fpGetSubFileInfo(fileHandle, index, &subFileInfo);

   //Use sub file info
 
   extract.fpFreeStruct(fileHandle, subFileInfo);
}

After you have this subfile info, you can use it to construct the necessary arguments for extraction.

//Inside the subfile loop, after calling fpGetSubFileInfo()
KVSubFileExtractInfo extractInfo = NULL;
KVExtractSubFileArgRec extractArg;
KVStructInit(&extractArg);

//Skip folders and external subfiles
if(subFileInfo->subFileType != KVSubFileType_Folder &&
   subFileInfo->subFileType != KVSubFileType_External)
{
   extractArg.index = index;
   extractArg.filePath = subFileInfo->subFileName;
   extractArg.extractionFlag =
      KVExtractionFlag_CreateDir |
      KVExtractionFlag_Overwrite |
      KVExtractionFlag_GetFormattedBody |
      KVExtractionFlag_SanitizeAbsolutePaths;

   error = extract.fpExtractSubFile(fileHandle, &extractArg, &extractInfo);

   //Do more processing, such as filtering the sub file

   extract.fpFreeStruct(fileHandle, extractInfo);
}

The fpExtractSubFile() function fills out the KVSubFileExtractInfo pointer, which describes what the function actually did, for example, the location that it extracted the file to.

TIP: Mail Files. KeyView treats mail files as containers, where the first subfile is the contents of the mail file, and subsequent subfiles are the attachments.

NOTE: Security. KVExtractionFlag_SanitizeAbsolutePaths mitigates certain path traversal attacks. See Sanitize Absolute Paths.
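If your own code writes files to paths derived from subfile names, a check in the same spirit is worth applying. The following sketch is illustrative only and is not KeyView's algorithm: the function name sanitize_entry_path is hypothetical, it strips a leading drive letter and leading separators to make the path relative, and it additionally rejects any ".." component, which goes beyond what the flag name describes.

```c
#include <ctype.h>
#include <string.h>

/* Returns a pointer into `path` with any leading drive prefix (such as
   "C:") and leading slashes or backslashes removed, or NULL if the path
   contains a ".." component. */
const char *sanitize_entry_path(const char *path)
{
   /* Skip a Windows drive prefix such as "C:" */
   if (isalpha((unsigned char)path[0]) && path[1] == ':')
      path += 2;

   /* Skip leading separators so that the path becomes relative */
   while (*path == '/' || *path == '\\')
      path++;

   /* Reject any ".." component, checking each segment in turn */
   const char *p = path;
   while (*p != '\0')
   {
      size_t n = strcspn(p, "/\\");
      if (n == 2 && p[0] == '.' && p[1] == '.')
         return NULL;
      p += n;
      if (*p != '\0')
         p++; /* step over the separator */
   }
   return path;
}
```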

By default, KeyView does not extract images when extracting subfiles. You can enable image extraction by using the fpSetConfig() function. This option is set globally for the session, so you can set it outside of the loop that you use to process files.

error = filter.fpSetConfig(session, KVFLT_EXTRACTIMAGES, TRUE, NULL);

Retrieving Mail Metadata

You can retrieve mail metadata for a particular subfile by using the function fpGetSubfileMetadataList(). This function fills out the same KVMetadataList structure that you used in Retrieving Metadata, and you can handle it in the same way. You must initialize the KVGetSubfileMetadataListArgRec structure by using KVStructInit().

const KVMetadataList* metadataList = NULL;

KVGetSubfileMetadataListArgRec metaArgs;
KVStructInit(&metaArgs);
metaArgs.index = index;
metaArgs.trgCharset = KVCS_UTF8;

error = extract.fpGetSubfileMetadataList(fileHandle, &metaArgs, &metadataList);

//Process metadata using metadataList->fpGetNext()

metadataList->fpFree(metadataList);

Conclusion

You have now built a basic sample program, processing documents using file mode. To learn about more advanced features, such as processing files in stream mode, take a look at the advanced tutorial.