KeyView Filter SDK

23.2.0

New in this Release

  • KeyView format detection has been extended, with support for 56 additional file formats. Identifying a wider range of the formats present in the enterprise helps you decide how to route, filter, or alert on such documents. For the full list, refer to the KeyView Filter SDK Programming Guides.

  • KeyView 23.2 introduces a new metadata API. The new API:

    • reduces the number of function calls you need to make to retrieve all metadata.
    • performs field standardization. Field standardization returns metadata using a standard set of field names, so that the same metadata is returned in the same field regardless of the source file format. The new metadata API allows for the introduction of further standardization, in future releases, without breaking backwards compatibility.

    The new metadata API is available in KeyView Filter (C and C++). The new metadata API can also be used through KeyView Export (C), but only when extracting subfile metadata.

  • KeyView 23.2 introduces the concept of a KVDocument to the C API. The Filter C API has new functions that accept a KVDocument instead of a file path or input stream. This simplifies the API because a KVDocument stores information about the source, and a KVDocument created from a file or stream can be passed into the same function. For example, fpGetDocInfoFile() and fpGetDocInfoStream() can be replaced with the new function fpGetDocInfo(). The introduction of KVDocument is also intended to support future performance enhancements, because KeyView will be able to cache information about a document instead of repeating some work in cases where you call several functions that operate on the same input.

  • In the Filter C API, the fpFilterConfig() function has been replaced with the new fpSetConfig(). This function allows you to set the same configuration options, and it returns a KVErrorCode, rather than using extended errors. The fpFilterConfig() function is now deprecated.

  • In the Filter C API, KeyView now defines a KVFilterSession type, which you can use instead of a void* to hold the session object that fpInit() provides and that you pass to the other Filter API functions.

  • Error reporting has been simplified in the C API. In earlier versions of KeyView, some functions could return the error code KVERR_General. You could then call fpGetKvErrorCodeEx() to obtain an "extended" error code. In KeyView 23.2, the error codes have been unified so that all error codes are included in the KVErrorCode enumeration. If a function returns an error code, there is no need to call a second function to obtain more information. This makes it easier to handle errors when an operation fails.

  • The fpGetXmpInfo* functions now return appropriate error codes on failure.

  • KeyView has been simplified so that it is much easier to map file formats to readers. File formats no longer have an associated "category". The KeyView configuration files such as formats.ini, formats_e.ini, and kvsdk.ini now identify file formats using the same file format numbers that are returned by format detection.

    For example, when KeyView detects an Adobe PDF file it returns format number 230. Imagine that you want to process PDF files with the reader pdf2sr.

    In previous versions of KeyView you had to find the associated format category (200) and use this to configure KeyView:

    200=pdf2

    In KeyView 23.2, this is no longer necessary and you instead use the same format number that is returned from format detection:

    230=pdf2

  • You can now remove certain third-party libraries that handle file formats you do not need to process. For a full list of these optional third-party components, and the changes you must make to exclude them, refer to the KeyView Filter SDK Programming Guide.

  • KeyView can now extract all platform-specific embedded files from PDF_Fmt documents.

  • KeyView can now filter WordPerfect Graphics (WordPerfect_Graphics_Fmt) files of adVECTORGRAPHIC class.

  • When you enable 'show hidden text', KeyView can now output author names for comments in Rich Text Format (MS_RTF_Fmt) documents.

  • When you enable 'show hidden text', KeyView can now output the value of the href attribute in HTML (HTML_Fmt) files.

  • KeyView can now process certain OpenOffice Text files that it would previously reject, and it can process image alt text in OpenOffice Text as hidden text. KeyView can also now filter text from the master slide in OpenOffice Presentations as hidden text.

  • When getting mail metadata from EML subfiles, KeyView now reports the sent date as a date instead of a string.

  • When getting subfile information from EML subfiles, KeyView now converts the file time to UTC, rather than an unspecified time zone.

  • For Microsoft Visio 2013 (.VSDX) files, KeyView now reports solution properties in the metadata.

  • KeyView now supports TIFF (TIFF_Fmt) files that use WebP compression.

  • Handling of Arabic diacritics (tashkil) has been significantly improved when using the pdfsr reader to process PDFs.

  • Text ordering has been improved when using the pdfsr reader to process PDFs.

  • In the C++ API, Session::subfiles now throws a keyview::password_protected_error if the container is protected and the session has not been configured with the correct password.

  • For the .NET API, the FilterTestDotNet sample program now includes a C# project file, FilterTestDotNet.csproj, to make it easier to use.

  • In the C++ API, you can now enable tab delimiters for tables by using the tab_delimited and output_table_delimiters functions in the Configuration class.

  • The FreeType third-party library has been upgraded to version 2.12.1.

  • The ODA third-party library has been upgraded to version 2023.12.

  • The zlib third-party library has been upgraded to version 1.2.13.

  • The libxml2 third-party library has been upgraded to version 2.10.3.

  • The expat third-party library has been upgraded to version 2.5.0.

  • The ICU third-party library has been upgraded to version 72.1.

  • The openssl third-party library has been upgraded to version 3.0.8.

  • The libde265 third-party library has been upgraded to version 1.0.11.

  • The XMP-Toolkit third-party library has been upgraded to version 2022.06.

  • The wavpack third-party library has been upgraded to version 5.6.0.

  • The sqlite third-party library has been upgraded to version 3.41.0.
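
The unified error reporting described above for the C API can be illustrated with a small, self-contained sketch. It does not include the KeyView headers or call any KeyView function: every name prefixed with demo_ or DEMO_ is a hypothetical stand-in invented for this illustration, and the real KVErrorCode enumeration has many more members. The sketch only contrasts the two calling patterns: a coarse result followed by a second "extended" lookup, and a single return value that already carries the specific error.

```c
/* Hypothetical stand-in for a unified error enumeration. The real
 * KVErrorCode enumeration is larger and its values are not shown here. */
typedef enum {
    DEMO_OK = 0,
    DEMO_ERR_GENERAL,
    DEMO_ERR_PASSWORD_PROTECTED
} DemoErrorCode;

/* Older pattern: the function reports only a coarse result and stores
 * the specific reason aside for a later lookup. */
static DemoErrorCode demo_last_extended = DEMO_OK;

DemoErrorCode demo_old_open(int is_protected)
{
    demo_last_extended = is_protected ? DEMO_ERR_PASSWORD_PROTECTED : DEMO_OK;
    return is_protected ? DEMO_ERR_GENERAL : DEMO_OK;
}

/* Second call that the caller had to make to learn the specific reason. */
DemoErrorCode demo_old_get_extended(void)
{
    return demo_last_extended;
}

/* Unified pattern: the return value is already the specific error code. */
DemoErrorCode demo_new_open(int is_protected)
{
    return is_protected ? DEMO_ERR_PASSWORD_PROTECTED : DEMO_OK;
}

/* Caller-side handling for the older pattern: two steps. */
DemoErrorCode demo_handle_old(int is_protected)
{
    DemoErrorCode err = demo_old_open(is_protected);
    if (err == DEMO_ERR_GENERAL) {
        err = demo_old_get_extended();  /* extra lookup */
    }
    return err;
}

/* Caller-side handling for the unified pattern: one step. */
DemoErrorCode demo_handle_new(int is_protected)
{
    return demo_new_open(is_protected);
}
```

Both demo_handle_old(1) and demo_handle_new(1) end up with the same specific code; the difference is that the older pattern needs the extra lookup and has to keep state between the two calls.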

Resolved Issues

  • (Security update) The third-party libtiff library has been upgraded to version 4.5.0 to resolve known vulnerabilities, including CVE-2022-2056, CVE-2022-2057, CVE-2022-2058, CVE-2022-3452, CVE-2022-3570, CVE-2022-3597, CVE-2022-3598, CVE-2022-3599, CVE-2022-3626, and CVE-2022-3627.

  • (Security update) The third-party protobuf library has been upgraded to version 3.21.12 to resolve known vulnerabilities, including CVE-2022-1941.

  • (Security update) The libjpeg third-party library has been upgraded to version 9e to resolve potential vulnerabilities.

  • (Security update) The libwebp third-party library has been upgraded to version 1.3.0.

  • When running out-of-process in stream mode, KeyView performed a large number of pipe operations when detecting the format of some files, which could degrade system performance.

  • KeyView could pause for up to a minute while trying to shut down an out-of-process process.

  • For password-protected OpenOffice files (ODS, ODT, and ODP), fpOpenFile did not return KVERR_PasswordProtected. Continuing with extraction could then result in invalid extracted files.

  • When attempting to get summary information from some password-protected Office formats (DOCX, PPTX, XLSX, ODS, ODT, and ODP), KeyView could return KVERR_General instead of KVERR_PasswordProtected.

  • KeyView could truncate long sections of text in PDF_Fmt documents.

  • KeyView did not retrieve the Image Width, Image Height, and Bits Per Pixel in summary information from Tagged Image File Format (TIFF_Fmt) files.

  • For some Microsoft Excel (XLSX) files with a lot of cells using Rich Data Types, KeyView output the names of those types incorrectly, using a number instead of a type name.

  • KeyView could skip some user-defined properties in summary information for some OLE-based files, such as MS_Project_2007_Fmt.

  • KeyView did not perform OCR on animated PNG (APNG_Fmt) images.

  • KeyView could fail to extract some images from Rich Text Format (MS_RTF_Fmt) documents.

  • KeyView did not extract all the images from some Rich Text Format (MS_RTF_Fmt) documents.

  • The extraction API fpGetSubFileInfo function did not correctly report the sizes of subfiles when they were larger than 2GB.

  • Some PDF files took longer to process in version 12.13.0 of the SDK than in version 12.12.0.

  • HEIC and HEIF files could not be processed on macOS.

  • When Extract Images was enabled, filtering certain Word documents could cause KeyView to exit unexpectedly (in-process), or return an error (out-of-process).

  • The fpGetMainFileInfo function did not respect the source code detection option when KeyView was running out-of-process.

  • KeyView could return an error (out-of-process), or exit unexpectedly (in-process) when processing some Microsoft Visio (.vsd) files.

  • KeyView missed text from some Microsoft Visio (.vsd) files.

  • Some Base64-encoded attachments to ICS files were extracted incorrectly.

  • KeyView could fail to extract bzip2 files if Unexpected Zip Detection was enabled.

  • KeyView could report duplicate metadata from Tagged Image File Format (TIFF) files with multiple pages.

  • KeyView could output incorrect metadata names for some PDF files.

  • KeyView could omit metadata entries for some PDF files.

  • When using the pdfsr reader to process PDFs that contained right-to-left (RTL) text, some text at the top of the file was not included in the output.

  • KeyView could process some CSV files incorrectly, so that fields were output in the wrong columns.

Notes

KeyView 23.2 is a new major version of IDOL, released in the second quarter of 2023. It is the first new major version since KeyView 12.0 was released in June 2018. KeyView 23.2 includes some changes that require you to update your license and application code. For more information about how to upgrade, see the KeyView upgrade guide.

Deprecated Features

The following features are deprecated and might be removed in a future release.

Category: C API
Deprecated since: 23.2.0

The following functions have been deprecated. OpenText recommends that you create a KVDocument to represent each document, by calling fpOpenDocumentFromFile() or fpOpenDocumentFromStream(). You can then use the new Filter API functions that accept a KVDocument.

  • fpCanFilterFile()
  • fpCanFilterStream()
  • fpCloseStream()
  • fpFileToInputStreamCreate()
  • fpFileToInputStreamFree()
  • fpFilterFile()
  • fpFilterStream()
  • fpGetDocInfoFile()
  • fpGetDocInfoStream()
  • fpGetRestrictionsFile()
  • fpGetRestrictionsStream()
  • fpOpenStream()
  • fpOpenStreamEx2()

Category: C API
Deprecated since: 23.2.0

As part of the improvements to simplify error handling, the following functions have been deprecated:

  • fpFilterConfig(). OpenText recommends that you use the function fpSetConfig() instead. This sets the same configuration options, and returns a KVErrorCode rather than a Boolean value.
  • fpGetKvErrorCodeEx(). You need to call this function only if you use the deprecated functions fpGetDocInfoFile(), fpGetDocInfoStream(), and fpFilterConfig(), which return FALSE to indicate an error rather than returning an error code.

Category: C API
Deprecated since: 23.2.0

The following functions have been deprecated. OpenText recommends that you access metadata through the new metadata API, by calling fpGetMetadataList().

  • fpFreeOLESummaryInfo()
  • fpFreeXmpInfo()
  • fpGetOLESummaryInfo()
  • fpGetOLESummaryInfoFile()
  • fpGetXmpInfo()
  • fpGetXmpInfoFile()

Category: Readers
Deprecated since: 23.2.0

The following readers have been deprecated:

  • cebsr

  • lwpsr
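
The move from paired File/Stream functions to functions that accept a KVDocument can be pictured with the following sketch. It is an illustration of the idea only, not the KeyView API: DemoDocument and every demo_ function are hypothetical names invented here, the detection logic is a toy (a file extension check and a "%PDF" signature check), and the only value taken from these notes is the format number 230 for Adobe PDF. The sketch shows one handle type that records where the data comes from, so that a single function serves both sources and can reuse a result it has already computed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a document handle that remembers its source. */
typedef enum { DEMO_SRC_FILE, DEMO_SRC_MEMORY } DemoSourceKind;

typedef struct {
    DemoSourceKind kind;
    const char *path;           /* used when kind == DEMO_SRC_FILE   */
    const unsigned char *data;  /* used when kind == DEMO_SRC_MEMORY */
    size_t size;
    int format_cached;          /* nonzero once detection has run    */
    int format;                 /* cached result of detection        */
    int detect_calls;           /* how often detection actually ran  */
} DemoDocument;

DemoDocument demo_open_from_file(const char *path)
{
    DemoDocument doc = {0};
    doc.kind = DEMO_SRC_FILE;
    doc.path = path;
    return doc;
}

DemoDocument demo_open_from_memory(const unsigned char *data, size_t size)
{
    DemoDocument doc = {0};
    doc.kind = DEMO_SRC_MEMORY;
    doc.data = data;
    doc.size = size;
    return doc;
}

/* Toy detection: returns 230 for something that looks like a PDF,
 * 0 otherwise. Real format detection is far more involved. */
static int demo_detect(const DemoDocument *doc)
{
    if (doc->kind == DEMO_SRC_FILE) {
        const char *dot = doc->path ? strrchr(doc->path, '.') : NULL;
        return (dot && strcmp(dot, ".pdf") == 0) ? 230 : 0;
    }
    return (doc->data && doc->size >= 4 &&
            memcmp(doc->data, "%PDF", 4) == 0) ? 230 : 0;
}

/* One function for both sources; the result is cached on the handle,
 * so repeated calls on the same document do not repeat the work. */
int demo_get_format(DemoDocument *doc)
{
    if (!doc->format_cached) {
        doc->format = demo_detect(doc);
        doc->format_cached = 1;
        doc->detect_calls++;
    }
    return doc->format;
}
```

Because the handle carries the source kind, callers no longer choose between a File variant and a Stream variant of each function, and the cached fields hint at why a document object leaves room for the kind of performance work mentioned in these notes.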

Requirements

For information about supported platforms, supported compilers, and software dependencies for the KeyView Filter SDK, refer to the KeyView Filter SDK Programming Guides.

Documentation

The following documentation is available for KeyView Filter SDK version 23.2.0.

  • KeyView Filter SDK C Programming Guide

  • KeyView Filter SDK C++ Programming Guide

  • KeyView Filter SDK Java Programming Guide

  • KeyView Filter SDK .NET Programming Guide