KeyView Export SDK

24.1.0

New in this Release

  • KeyView format detection has been extended, with support for 40 additional file formats. By identifying a larger range of formats present in the enterprise, decisions can be made on how to route, filter, or alert on such documents. For the full list, refer to the KeyView Export SDK Programming Guides.

  • KeyView can extract embedded media files from PDF_Fmt documents.

  • Metadata parsing has been improved for PDF documents to handle some additional non-standard methods of storing metadata.

  • When processing a PDF file using the pdfsr reader, the annotations (also known as comments) output between each page of the document now include the title, author and modification date of the comment as a heading line.

  • KeyView now outputs annotations from PDF files when using the pdf2 reader. Annotations are listed on a separate page immediately following the page that the annotations are attached to. You can disable annotations by using the KVCFG_WP_NOCOMMENTS configuration flag.

  • KeyView now produces tabular output for TSV files.

  • KeyView output now includes URLs in OpenDocument Spreadsheet (.ODS) documents and outputs them as hidden text.

  • Text-only support has been added for the RSS syndication XML format (RSS_Fmt).

  • When configured to display formula strings, KeyView now supports many more formulas from Microsoft Excel (.xls) files.

  • When KeyView detects that it will output white text on a white background, it now outputs black text, so the information is still readable.

  • The third-party libpng library was upgraded from version 1.6.37 to version 1.6.40.

  • The third-party ICU library was upgraded to version 73.2.

  • The third-party zlib library was upgraded from version 1.2.13 to version 1.3.

Resolved Issues

  • (Security update) The third-party OpenSSL library was updated to version 3.1.4.

  • (Security update) A potential security vulnerability was resolved for processing WebP_Fmt files or TIFF_Fmt files that use WebP compression in HTML Export and XML Export. This change addresses CVE-2023-4863.

  • (Security update) The assr, awsr, kpcgmrdr, and orcsr readers had potential buffer overruns.

  • For some PDF files with particular font encodings, the PDF reader (pdfsr) would output some special characters incorrectly.

  • In HTML Export, when outputting vector graphics to SVG, text and raster images from EMF files were vertically flipped.

  • KeyView failed to detect spaces between words in some PDF documents.

  • Processing some valid ODS files failed with the error KVError_ArchiveFatalError.

  • For some Microsoft Word (DOCX) files, some left-to-right text was output as right-to-left.

  • In Microsoft Word (DOCX) files, hyperlinks in the Table of Contents had hyperlink styling applied to them. In the Microsoft Word native application, Word does not apply this styling in Print View.

  • KeyView did not use consistent naming conventions for mail subfiles across different file formats.

  • On non-Windows platforms, KeyView reported invalid bytes at the end of subfile names for some ZIP-based formats, such as JAR files.

  • MBox (.mbx) files that were signed but not encrypted returned KVError_PasswordProtected.

  • Attempting to retrieve metadata from MSG files returned KVError_General, rather than KVError_FormatNotSupported.

  • KeyView could incorrectly identify some GDSII files as MacWrite format.

23.4.0

New in this Release

  • KeyView format detection has been extended, with support for 48 additional file formats. By identifying a larger range of formats present in the enterprise, decisions can be made on how to route, filter, or alert on such documents. For the full list, refer to the KeyView Export SDK Programming Guides.

  • Handling of XLSX files has been improved when the same image data is referenced multiple times in the same input file. The extract images functionality now detects multiple references and outputs each image only once.
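    One way to picture this behaviour is to emit each image only the first time its data is seen. The Python sketch below is an assumption about the general approach, not KeyView's implementation: it keys on a SHA-256 digest of the image bytes, and the `unique_images` helper and the sample data are hypothetical.

```python
import hashlib


def unique_images(references):
    """Yield each distinct image payload once, in first-seen order."""
    seen = set()
    for data in references:
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # same image data referenced again; skip it
        seen.add(digest)
        yield data


# Three references in one input file, two of which point at identical bytes.
refs = [b"logo-bytes", b"chart-bytes", b"logo-bytes"]
print(len(list(unique_images(refs))))
```

    Keying on a content digest is only one possible design; tracking the referenced part name would work equally well when the references are explicit.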

  • KeyView now supports extraction of subfiles from iWork13 Pages, Numbers and Keynote files.

  • KeyView now supports metadata extraction for current (2013 to the present day) Apple iWork Pages, Numbers and Keynote files.

  • Text-only Export support has been added for the ZIP variant of Uniform Office Format word processor files (Uniform_Office_Text_Zip_Fmt).

  • When converting PDF files to HTML with the pdfsr reader, Unicode private use area characters are now output as-is, rather than being translated to the configured replacement character (usually a question mark). This change means that symbols from symbolic fonts like Wingdings can now often be displayed correctly by the browser.
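    To illustrate the difference, the following Python sketch contrasts the two behaviours on a string that contains a private use area character. It is a model only, not the pdfsr implementation: `is_private_use` and `replace_private_use` are hypothetical helpers, and U+F0FC is an arbitrary private use codepoint chosen for the example.

```python
import unicodedata


def is_private_use(ch):
    """True for characters in a Unicode private use area (general category Co)."""
    return unicodedata.category(ch) == "Co"


def replace_private_use(text, replacement="?"):
    """Model of the earlier behaviour: swap private use characters for a replacement."""
    return "".join(replacement if is_private_use(ch) else ch for ch in text)


sample = "Done \uf0fc"
old_style = replace_private_use(sample)  # private use character is lost
new_style = sample                       # character is passed through as-is
print(old_style)
print(is_private_use(new_style[-1]))
```

    Whether the passed-through character renders as the intended symbol still depends on the font that the browser uses.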

  • KeyView can now perform text-only export for Yozosoft Yozo Office (Yozo_Office_Fmt) files.

  • The kvhtmlexport sample program is now available on macOS (x86_64 and M1).

  • The PDF reader (pdfsr) has been improved for Arabic text in some circumstances.

    When you use Microsoft Print to PDF to convert Word documents that contain Arabic text in certain fonts to PDF, the resulting file is often incomplete, missing character mapping information that is required to interpret the text content. The pdfsr reader was previously able to reconstruct this missing information for most cases where the text was in Calibri font. The pdfsr reader can now also reconstruct the information for Sakkal Majalla font, and for additional Calibri cases.

    Furthermore, pdfsr now attempts to reconstruct the information when a character is mapped to the Unicode replacement character codepoint, rather than only when a character's mapping information is missing entirely.

  • KeyView can now extract text from some PDF documents that contain Type 3 fonts without Unicode character mapping information, rather than always converting all characters in those fonts to the configured replacement character. This process is not guaranteed to result in readable output for all such files.

  • The sample programs now default to SVG output for vector graphics.

  • The third-party OpenSSL library has been upgraded to version 3.1.2.

  • (Security update) The third-party libwebp library has been upgraded to version 1.3.1.

  • (Security update) The third-party libxml2 library has been upgraded to version 2.11.4.

  • (Security update) The third-party freetype library has been upgraded to version 2.13.1.

  • The third-party libheif library has been upgraded to version 1.16.2.

Resolved Issues

  • On macOS, KeyView included a version of libxml2.dylib, which could make it difficult for calling applications to use their own or the system version of libxml2.dylib.

  • KeyView did not set the KVMainFileInfoFlag_HasContent flag for pFiles (RMS_Protected_Fmt).

  • When extracting subfiles from a PDF with extract images set to true, KeyView could output duplicate images.

  • KeyView could incorrectly process some images in PDF files resulting in distorted images.

  • KeyView sometimes failed to output the text from some table cells in current (2013 to the present day) Apple iWork Numbers and Keynote files.

  • When PDF Logical Reading Order was enabled using the formats_e.ini file, processing PDFs would result in a KVError_ReaderInitError.

  • KeyView could exit unexpectedly (in-process) or return an error (out-of-process) when extracting from certain MIME email files.

  • KeyView could exit unexpectedly (in-process) or return an error (out-of-process) when processing certain PDF and ODS files.

  • (Security update) The third-party ODA library was upgraded to 2024.7 on Windows and Linux platforms, and 2024.1 on Mac platforms to resolve known vulnerability CVE-2023-26495.

  • KeyView leaked memory when using the text member of Document objects in the C++ and Python APIs.

  • KeyView returned an error when retrieving headers and footers from certain Microsoft Word (DOCX) documents.

  • KeyView returned an error when processing PDF documents containing encrypted forms.

  • KeyView returned an error when processing certain PDF documents.

  • KeyView sometimes included duplicate text in the output when PDF documents contained the same text multiple times in the same location.

  • KeyView sometimes exited unexpectedly while shutting down the kpodardr library when using the Export SDK on some Autodesk AutoCAD Drawing (AutoDesk_DWG_Fmt) files.

23.3.0

New in this Release

  • KeyView format detection has been extended, with support for 61 additional file formats. By identifying a larger range of formats present in the enterprise, decisions can be made on how to route, filter, or alert on such documents. For the full list, refer to the KeyView Export SDK Programming Guides.

  • KeyView 23.3 expands support for the new metadata API that was introduced in KeyView 23.2. The new API has now been exposed through the Java API for Filter. You can also use the new API in the Java API for KeyView Export, but only when extracting subfile metadata.

  • Source code identification is available on all KeyView supported platforms.

  • The third-party sqlite library was upgraded to version 3.42.0.

Resolved Issues

  • When converting PDFs with a hidden text layer, such as those created by Media Server, invisible text was output in such a way that the transparency could not be overridden. As a result, the text could not be highlighted by IDOL View Server.

  • (Security update) KeyView depended on outdated LZMA code for detection of 7z encryption. This dependence has been removed.

  • When using the pdfsr reader to process PDFs that contained fonts with certain predefined encodings, spaces sometimes appeared in the wrong places in output text.

  • When using the pdfsr reader to process PDFs, output text was sometimes broken onto additional new lines or missed characters.

  • KeyView leaked memory when filtering some Apple iWork Keynote (.key) (IWPG_Fmt) files.

  • KeyView incorrectly identified some attachments to Outlook .msg files as being inline pictures rather than non-inline attachments.

  • In HTML export, when exporting PDFs with both HiFi and OCR enabled, invisible text in the output document was not correctly aligned with the image text.

  • Changing the global locale could cause KeyView to fail to process some file formats.

  • When attempting to filter NIST_ITL_Fmt files, KeyView returned KVERR_FormatNotSupported.

  • Extracting metadata or summary information from invalid OLE-based files could cause memory inconsistencies.

  • KeyView could produce an unreadable subfile from PST files if no target character set was specified.

  • When processing HTML files that specify an ISO-2022 character encoding, KeyView output some entities as unprocessed strings, rather than the correct Unicode character.

23.2.0

New in this Release

  • KeyView format detection has been extended, with support for 56 additional file formats. By identifying a larger range of formats present in the enterprise, decisions can be made on how to route, filter, or alert on such documents. For the full list, refer to the KeyView Export SDK Programming Guides.

  • KeyView 23.2 introduces a new metadata API. The new API:

    • reduces the number of function calls you need to make to retrieve all metadata.
    • performs field standardization. Field standardization returns metadata using a standard set of field names, so that the same metadata is returned in the same field regardless of the source file format. The new metadata API allows for the introduction of further standardization, in future releases, without breaking backwards compatibility.

    The new metadata API is available in KeyView Filter (C and C++). The new metadata API can also be used through KeyView Export (C), but only when extracting subfile metadata.

  • Error reporting has been simplified in the C API. In earlier versions of KeyView, some functions could return the error code KVERR_General. You could then call fpGetKvErrorCodeEx() to obtain an "extended" error code. In KeyView 23.2 the error codes have been unified so that all error codes are included in the KVErrorCode enumeration. If a function returns an error code, there is no need to call a second function to obtain more information. This makes it easier to handle errors when an operation fails.
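    As a rough illustration of the simplification, the Python model below decides how to handle a failure from a single returned code. It is not the C API: the `KVErrorCode` class here is a stand-in that lists only error names mentioned in these notes, the numeric values are arbitrary placeholders, and `describe` is a hypothetical helper.

```python
from enum import Enum, auto


class KVErrorCode(Enum):
    """Stand-in for the unified enumeration; values are arbitrary placeholders."""
    KVError_General = auto()
    KVError_PasswordProtected = auto()
    KVError_FormatNotSupported = auto()
    KVError_ReaderInitError = auto()


def describe(code):
    """Map a single returned code to an action, with no second lookup call."""
    actions = {
        KVErrorCode.KVError_PasswordProtected: "ask for credentials",
        KVErrorCode.KVError_FormatNotSupported: "skip the file",
        KVErrorCode.KVError_ReaderInitError: "check the reader configuration",
    }
    return actions.get(code, "log and report")


print(describe(KVErrorCode.KVError_PasswordProtected))
```

    The point of the sketch is the shape of the calling code: one value carries all the information, so the handler needs a single branch per error rather than a general code followed by an extended lookup.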

  • KeyView has been simplified so that it is much easier to map file formats to readers. File formats no longer have an associated "category". The KeyView configuration files such as formats.ini, formats_e.ini, and kvsdk.ini now identify file formats using the same file format numbers that are returned by format detection.

    For example, when KeyView detects an Adobe PDF file it returns format number 230. Imagine that you want to process PDF files with the reader pdf2sr.

    In previous versions of KeyView you had to find the associated format category (200) and use this to configure KeyView:

    200=pdf2

    In KeyView 23.2, this is no longer necessary and you instead use the same format number that is returned from format detection:

    230=pdf2
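    The effect of this change can be sketched as a lookup keyed directly by the detected format number. The Python snippet below is only an illustration, not KeyView code: the `parse_reader_map` and `reader_for` helpers are hypothetical, the comment syntax it skips is an assumption, and the only values taken from these notes are format number 230 and the `pdf2` entry.

```python
def parse_reader_map(lines):
    """Parse 'number=reader' lines into a dict keyed by integer format number."""
    mapping = {}
    for line in lines:
        line = line.strip()
        # Skip blanks, comments and section headers (assumed INI-style syntax).
        if not line or line.startswith((";", "#", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip().isdigit():
            mapping[int(key.strip())] = value.strip()
    return mapping


def reader_for(mapping, detected_format):
    """Return the configured reader for a detected format number, or None."""
    return mapping.get(detected_format)


# From 23.2 onwards the key is the detected format number itself (230 for PDF).
config_lines = ["; hypothetical excerpt of a reader mapping", "230=pdf2"]
readers = parse_reader_map(config_lines)
print(reader_for(readers, 230))
```

    With the earlier category-based key (200), the same lookup by detected format number would find nothing, which is why an intermediate category table was needed before 23.2.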

  • KeyView can now extract all platform-specific embedded files from PDF_Fmt documents.

  • When you enable 'show hidden text', KeyView can now output author names for comments in Rich Text Format (MS_RTF_Fmt) documents.

  • KeyView can now process certain OpenOffice Text files that it would previously reject, and it can process image alt text in OpenOffice Text as hidden text.

  • When getting mail metadata from EML subfiles, KeyView now reports the sent date as a date instead of a string.

  • When getting subfile information from EML subfiles, KeyView now converts the file time to UTC, rather than an unspecified time zone.

  • KeyView now supports TIFF (TIFF_Fmt) files that use WebP compression.

  • Handling of Arabic diacritics (tashkil) has been significantly improved when using the pdfsr reader to process PDFs.

  • Text ordering has been improved when using the pdfsr reader to process PDFs.

  • The FreeType third-party library has been upgraded to version 2.12.1.

  • The ODA third-party library has been upgraded to version 2023.12.

  • The zlib third-party library has been upgraded to version 1.2.13.

  • The libxml2 third-party library has been upgraded to version 2.10.3.

  • The expat third-party library has been upgraded to version 2.5.0.

  • The ICU third-party library has been upgraded to version 72.1.

  • The openssl third-party library has been upgraded to version 3.0.8.

  • The libde265 third-party library has been upgraded to version 1.0.11.

  • The XMP-Toolkit third-party library has been upgraded to version 2022.06.

  • The wavpack third-party library has been upgraded to version 5.6.0.

  • The sqlite third-party library has been upgraded to version 3.41.0.

  • The third-party libical library was upgraded to version 3.0.16.

  • The third-party Apache Arrow library was upgraded to version 11.0.0. This change includes upgrades to the following dependencies: 

    • boost was upgraded to version 1.75.0

    • brotli was upgraded to version 1.0.9

    • jemalloc was upgraded to version 5.3.0

    • re2 was upgraded to version 2022-06-01

    • thrift was upgraded to version 0.16.0

    • utf8proc was upgraded to version 2.7.0

    • Zstandard was upgraded to version 1.5.2

Resolved Issues

  • (Security update) The third-party libtiff library has been upgraded to version 4.5.0 to resolve known vulnerabilities, including CVE-2022-2056, CVE-2022-2057, CVE-2022-2058, CVE-2022-3452, CVE-2022-3570, CVE-2022-3597, CVE-2022-3598, CVE-2022-3599, CVE-2022-3626, and CVE-2022-3627.

  • (Security update) The third-party protobuf library has been upgraded to version 3.21.12 to resolve known vulnerabilities, including CVE-2022-1941.

  • (Security update) The libjpeg third-party library has been upgraded to version 9e to resolve potential vulnerabilities.

  • (Security update) The libwebp third-party library has been upgraded to version 1.3.0.

  • For password protected OpenOffice files (ODS, ODT and ODP), fpOpenFile did not return KVERR_PasswordProtected. Continuing with extraction could then result in invalid extracted files.

  • KeyView could truncate long sections of text in PDF_Fmt documents.

  • KeyView did not retrieve the Image Width, Image Height and Bits Per Pixel in summary information from Tagged Image File Format (TIFF_Fmt) files.

  • For some Microsoft Excel (XLSX) files with many cells using Rich Data Types, KeyView output the names of those types incorrectly, using a number instead of a type name.

  • Some C sample programs could loop endlessly when a bad argument was passed in.

  • KeyView could skip some user defined properties in summary information for some OLE-based files like MS_Project_2007_Fmt.

  • When using the pdfsr reader for text in right-to-left languages, diacritic characters were sometimes not extracted correctly.

  • KeyView did not extract all the images from some Rich Text Format (MS_RTF_Fmt) documents.

  • The extraction API fpGetSubFileInfo function did not correctly report the sizes of subfiles when they were larger than 2GB.

  • Some PDF files took longer to process in version 12.13.0 of the SDK than in version 12.12.0.

  • HEIC and HEIF format documents could not be processed on macOS.

  • KeyView could return an error (out-of-process), or exit unexpectedly (in-process) when processing some Microsoft Visio (.vsd) files.

  • KeyView missed text from some Microsoft Visio (.vsd) files.

  • Some base-64 encoded attachments to ICS files were extracted incorrectly.

  • When converting spreadsheet files to HTML, KeyView removed all empty rows even when bRemoveEmptyRows was set to FALSE.

  • KeyView could report duplicate metadata from Tagged Image File Format (TIFF) files with multiple pages.

  • When using kvhtmlexport to export container files, internal container subfile pages were misnamed as subfilen.temp.

  • KeyView could output incorrect metadata names for some PDF files.

  • KeyView could omit metadata entries for some PDF files.

  • KeyView could be slow to start the out-of-process session if called on more threads than there were ports configured.

  • When using the pdfsr reader to process PDFs that contained right-to-left (RTL) text, some text at the top of the file was not included in the output.

  • KeyView could process some CSV files incorrectly, meaning fields were output in the wrong columns.

  • KeyView would terminate unexpectedly (in-process), or return an error (out-of-process), when processing a PDF document with the reader kpPDF2rdr, if the input was an input stream that was not created by KeyView.

  • The reader kpPDF2rdr was not thread-safe.

  • The reader kpPDF2rdr failed to release memory each time a file was processed.

  • The reader kpPDF2rdr could output incorrect values for page width and height.

Notes

KeyView 23.2 is a new major version of IDOL, released in the second quarter of 2023. It is the first new major version since KeyView 12.0 was released in June 2018. KeyView 23.2 includes some changes that require you to update your license and application code. For more information about how to upgrade, see the KeyView upgrade guide.

Deprecated Features

The following features are deprecated and might be removed in a future release.

Category: Readers

Deprecated Feature: The following readers have been deprecated:

  • cebsr

  • lwpsr

Deprecated Since: 23.2.0