KeyView Filter SDK

24.2.0

New in this Release

  • KeyView format detection has been extended, with support for 51 additional file formats. KeyView can now identify more than 2000 unique file formats. By identifying a larger range of formats present in the enterprise, decisions can be made on how to route, filter, or alert on such documents. For the full list, refer to the KeyView Filter SDK Programming Guides.

  • When running out-of-process, KeyView now communicates with the out-of-process server using shared memory, which results in faster processing times for nearly all operations. This feature is enabled by default.

    This feature provides the benefits of pipe-streaming, without the downside of taking longer to process some files. For example:

    • your application can be more responsive because each block of output is available before the next has been produced.

    • it reduces the disk space used for temporary files.

    • it reduces the file I/O for partial filtering because KeyView might not need to read the whole file if you choose to stop filtering before all the text has returned.

    • for many formats it reduces the amount of the input file that is read during extraction, especially if you extract only a subset of the files.
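
    The first and third points can be pictured with a small Python sketch. It is only an analogy that uses an in-memory stream: filter_blocks and the block size are hypothetical names chosen for illustration and are not part of the KeyView API.

```python
import io

def filter_blocks(stream, block_size=8):
    """Yield output one block at a time instead of building it all first."""
    while True:
        block = stream.read(block_size)
        if not block:
            return
        yield block

# Stand-in input; a real caller would be consuming filtered text.
source = io.BytesIO(b"alpha beta gamma delta epsilon zeta")

collected = []
for block in filter_blocks(source):
    collected.append(block)          # each block is usable immediately
    if b"gamma" in b"".join(collected):
        break                        # stop early: the rest is never read

bytes_read = source.tell()
```

    Because the consumer stops as soon as it has what it needs, only the first two blocks of the input are read.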

    NOTE: This change also resolves an issue where KeyView set the detected format of a subfile when extracting to a stream out-of-process, contrary to the documented behavior.

    When you extract a subfile, the intended behavior is that the result of extraction (KVSubFileExtractInfo (C API), ExtSubFileExtractInfo (Java API), or ExtractSubFileExtractInfo (.NET API)) provides the detected format of the subfile if you extract to a file, but not if you extract to a stream. In version 24.1 and earlier, with the default configuration, KeyView was doing additional work to set the detected format of the subfile when running out-of-process.

    With version 24.2.0, the benefits of extracting to a stream are available even when running out-of-process, and as a result, the detected format is no longer provided.

    If you were relying on this undocumented behavior, you must update your code to either extract to a file, or detect the format of the file after extraction by using fpGetDocInfo().
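
    The second option (detect after extraction) might look like the following Python sketch. Both functions are hypothetical stand-ins: extract_to_stream imitates an extraction result that carries no format, and detect_format imitates a detection call by checking a few well-known file signatures. A real application would call the SDK's own detection, for example fpGetDocInfo() in the C API.

```python
import io

# Well-known file signatures, used here only to imitate format detection.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
    b"\x89PNG\r\n\x1a\n": "png",
}

def detect_format(data: bytes) -> str:
    """Hypothetical stand-in for a detection call on extracted bytes."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"

def extract_to_stream(payload: bytes, out: io.BytesIO) -> dict:
    """Hypothetical extraction: writes the subfile, reports no format."""
    out.write(payload)
    return {"size": len(payload), "format": None}

out = io.BytesIO()
info = extract_to_stream(b"%PDF-1.7\n...", out)
if info["format"] is None:
    # Stream extraction did not provide a format, so detect it afterwards.
    info["format"] = detect_format(out.getvalue())
```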

  • Performance has been improved for large Rich-Text Format (RTF) files.

  • Performance has been improved for Source Code Detection.

  • KeyView can now filter tabular data from OneNote files in a way that allows Eduction to make use of the table structure.

  • KeyView can now filter tabular data from Pipe_Separated_Fmt files in a way that allows Eduction to make use of the table structure.

  • The Python API now exposes the header_and_footer configuration option.

  • Optical Character Recognition (OCR) is now available on macOS x86-64 platforms.

  • The third-party libxml2 library has been updated to version 2.12.0.

  • The third-party zStandard library has been upgraded to version 1.5.5.

Resolved Issues

  • (Security update) The third-party openssl library has been upgraded to version 3.2.1 to resolve potential security vulnerabilities, including CVE-2023-4807, CVE-2023-5363, CVE-2023-5678, CVE-2023-6237, and CVE-2024-0727.

  • (Security update) The third-party sqlite library has been upgraded to version 3.45.1 to resolve known vulnerabilities, including CVE-2023-7104.

  • (Security update) The third-party libde265 library has been upgraded to version 1.0.15 to resolve known vulnerabilities, including CVE-2023-27102, CVE-2023-27103, CVE-2023-49465, CVE-2023-49467, and CVE-2023-49468.

  • (Security update) The third-party libheif library has been upgraded to version 1.17.6 to resolve known vulnerabilities, including CVE-2023-49462 and CVE-2023-49463.

  • (Security update) The third-party XMPToolkit library has been upgraded to version 2023.12 to resolve known vulnerabilities.

  • (Security update) The third-party expat library has been upgraded to version 2.6.0 to resolve known vulnerabilities, including CVE-2023-52425 and CVE-2023-52426.

  • When attempting to extract images using the pdf2sr reader, KeyView sometimes returned an error (out-of-process), or exited unexpectedly (in-process).

  • When processing Microsoft Word (docx) files that contained charts, KeyView sometimes left files in the temporary directory when running in-process.

  • KeyView sometimes output some nonsense text when processing the headers and footers of Microsoft Word documents (.doc).

  • KeyView sometimes returned KVError_Success when attempting to process unsupported PSD documents. It now returns KVError_FormatNotSupported.

  • When using the tstxtract.exe program with the pstx reader, KeyView sometimes failed to extract any content from a PST file when one of the message-type attachments was not valid.

  • When a PDF had a non-valid subform, it could lead to a bad input stream error for the whole file. KeyView now skips the non-valid subform and attempts to retrieve information from the rest of the file.

  • For some fields output in KeyView versions earlier than 23.2, KeyView did not output a non-standard metadata element. To aid backwards compatibility, KeyView now outputs a non-standardized field for each field that was present before standardization was introduced in 23.2.

  • For some mail formats, some metadata elements were output twice.

  • KeyView refreshed the out-of-process process for Extraction even if kvoopRefresh was set to zero.

  • When filtering formulae from some non-valid spreadsheet files, KeyView could return an error (out-of-process), or exit unexpectedly (in-process).

  • When processing some RSS feed XML files to filter hidden text, KeyView could time out (out-of-process), or stop responding (in-process).

  • When using the C++ API, the header_and_footer configuration option did not work correctly when it was set after initializing a session.

Notes

  • The KeyView macOS support has changed so that KeyView 24.2 supports macOS (x86-64) version 10.15 or later.

  • KeyView 24.2 no longer includes the deprecated cebsr and lwpsr readers.

24.1.0

New in this Release

  • KeyView format detection has been extended, with support for 40 additional file formats. By identifying a larger range of formats present in the enterprise, decisions can be made on how to route, filter, or alert on such documents. For the full list, refer to the KeyView Filter SDK Programming Guides.

  • KeyView can extract embedded media files from PDF_Fmt documents.

  • When filtering a PDF file using the pdfsr reader with hidden text shown, KeyView now outputs the labels for the outline bookmarks.

  • Metadata parsing has been improved for PDF documents to handle some additional non-standard methods of storing metadata.

  • When processing a PDF file using the pdfsr reader, the annotations (also known as comments) output between each page of the document now include the title, author and modification date of the comment as a heading line.

  • KeyView now outputs annotations from PDF files when using the pdf2 reader. You can disable this option by using the KVFLT_NOCOMMENTS configuration flag.

  • KeyView can now extract text from the pivot table cache in Microsoft Excel XLS and XLSX files. The pivot table cache might contain data that was deleted from the worksheet. To output this information, enable hidden text for Microsoft Excel documents.

  • KeyView now gets text from SmartArt when filtering PPT (PowerPoint 97-2004) files.

  • When you enable extract images, KeyView now extracts images that are embedded in HTML files using base64 encoding.
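
    For readers unfamiliar with this kind of embedding, the Python sketch below shows what a base64 image inside an HTML file looks like and how its bytes can be recovered. It uses only the standard library and does not represent how KeyView implements the feature; embedded_images is a hypothetical helper and the payload is not a real image.

```python
import base64
import re

# Matches data URIs such as src="data:image/png;base64,iVBORw0..."
DATA_URI = re.compile(r"data:image/([a-zA-Z0-9.+-]+);base64,([A-Za-z0-9+/=]+)")

def embedded_images(html: str):
    """Return (subtype, raw_bytes) for each base64 image data URI found."""
    return [(m.group(1), base64.b64decode(m.group(2)))
            for m in DATA_URI.finditer(html)]

# A tiny payload that is not a real image, used only to show the decoding.
payload = base64.b64encode(b"\x89PNG fake").decode("ascii")
html = f'<p>Logo: <img src="data:image/png;base64,{payload}" alt="logo"></p>'

images = embedded_images(html)
```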

  • KeyView now treats TSV files as spreadsheet files, resulting in tab delimited output that can be parsed by Eduction (when you have enabled tab delimited output and table delimiters).

  • KeyView output now includes URLs in OpenDocument Spreadsheet (.ODS) documents and outputs them as hidden text.

  • KeyView can filter the RSS syndication XML format (RSS_Fmt).

  • When configured to display formula strings, KeyView now supports many more formulas from Microsoft Excel (.xls) files.

  • KeyView supports OCR on the Linux ARM and macOS M1 platforms.

  • The C++ and Python APIs now expose options for configuring OCR.

  • Error reporting has been improved when KeyView is reading from the text member of a Document object and the child process times out. It now raises a child_timeout_error (in the C++ API) or ChildTimeoutError (in the Python API).

  • In the C API, you can now configure the out-of-process error log after session initialization by using fpSetConfig.

  • In the Python API, you can now enable the out-of-process error log.

  • When KeyView cannot start the out-of-process executable, kvoop.exe, it returns the error code KVError_CreateProcessFailed instead of KVError_General.

  • The third-party libpng library was upgraded from version 1.6.37 to version 1.6.40.

  • The third-party Lib ICU library was upgraded to version 73.2.

  • The third-party zlib library was upgraded from version 1.2.13 to version 1.3.

Resolved Issues

  • (Security update) The third-party OpenSSL library was updated to version 3.1.4.

  • (Security update) A potential security vulnerability was resolved for processing WebP_Fmt files or TIFF_Fmt files that use WebP compression in the Filter SDK. This change addresses CVE-2023-4863.

  • (Security update) The assr, awsr, kpcgmrdr, and orcsr readers had potential buffer overruns.

  • The Java API could leak memory when filtering from an input stream.

  • The sample program for the C++ Extract API could return a KVError_BadInputStream error when extracting a password-protected subfile from a 7Z without the correct password, rather than a KVError_PasswordProtected error.

  • KeyView did not filter URLs that were associated with embedded images in Microsoft Excel (XLSX) documents. These URLs are now output as hidden text.

  • In the C++ API, the configuration option out_of_process_log did not work after session initialization.

  • For some PDF files with particular font encodings, the PDF reader (pdfsr) would output some special characters incorrectly.

  • In the .NET API, subfile extraction failed with KVError_InvalidArgsExtraction when extracting to a file rather than a stream.

  • KeyView failed to detect spaces between words in some PDF documents.

  • Processing some valid ODS files failed with the error KVError_ArchiveFatalError.

  • The KeyView sample program tstxtract did not generate sufficient subdirectories when doing recursive extraction.

  • KeyView did not use consistent naming conventions for mail subfiles across different file formats.

  • On non-Windows platforms, KeyView reported invalid bytes at the end of subfile names for some ZIP-based formats, such as JAR files.

  • MBox (.mbx) files that were signed but not encrypted returned KVError_PasswordProtected.

  • Attempting to retrieve metadata from MSG files returned a KVError_General, rather than KVError_FormatNotSupported.

  • KeyView could identify some files of type GDSII as MacWrite format.

Notes

  • The Filter .NET API requires the Microsoft .NET Framework version 4.6.2 or later.

  • On the Linux x86-64 platform, KeyView 24.1 requires libstdc++.so.6 and libgcc_s.so.1 from at least GCC 12.2 if you are using OCR or RMS decryption.

23.4.0

New in this Release

  • KeyView format detection has been extended, with support for 48 additional file formats. By identifying a larger range of formats present in the enterprise, decisions can be made on how to route, filter, or alert on such documents. For the full list, refer to the KeyView Filter SDK Programming Guides.

  • Handling of XLSX files has been improved when the same image data is referenced multiple times in the same input file. The extract images functionality now detects multiple references and outputs each image only once.
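
    The effect of this change can be pictured with a short Python sketch that keeps one copy of each distinct image. These notes do not describe how KeyView detects repeated references, so hashing the image bytes is an assumption made purely for illustration, and unique_images is a hypothetical helper.

```python
import hashlib

def unique_images(references):
    """Keep the first occurrence of each distinct image payload."""
    seen = set()
    unique = []
    for name, data in references:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((name, data))
    return unique

# Two sheets reference the same image bytes; a chart uses different bytes.
refs = [("sheet1/logo", b"IMG-A"), ("sheet2/logo", b"IMG-A"), ("chart", b"IMG-B")]
result = unique_images(refs)
```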

  • KeyView now supports extraction of subfiles from iWork13 Pages, Numbers and Keynote files.

  • KeyView now supports metadata extraction for current (2013 to the present day) Apple iWork Pages, Numbers and Keynote files.

  • Filter support has been added for the ZIP variant of Uniform Office Format word processor files (Uniform_Office_Text_Zip_Fmt).

  • When filtering PDF files with the pdfsr reader, Unicode private use area characters are now output as-is, rather than being translated to the configured replacement character (usually a question mark).

  • KeyView can now filter Yozosoft Yozo Office (Yozo_Office_Fmt) files.

  • The Python API is now available for macOS (x86-64 and M1).

  • The readability of metadata output from the filtertest sample program has been improved.

  • The PDF reader (pdfsr) has been improved for Arabic text in some circumstances.

    When you use Microsoft Print to PDF to convert Word documents that contain Arabic text in certain fonts to PDF, the resulting file is often incomplete, missing character mapping information that is required to interpret the text content. The pdfsr reader was previously able to reconstruct this missing information for most cases where the text was in Calibri font. The pdfsr reader can now also reconstruct the information for Sakkal Majalla font, and for additional Calibri cases.

    Furthermore, pdfsr now attempts to reconstruct the information when a character is mapped to the Unicode replacement character codepoint, rather than only when a character's mapping information is missing entirely.

  • KeyView can now extract text from some PDF documents that contain Type 3 fonts without Unicode character mapping information, rather than always converting all characters in those fonts to the configured replacement character. This process is not guaranteed to result in readable output for all such files.

  • The Python API now raises an ExternalSubfileError when you attempt to extract an external subfile.

  • In the Python API, the Document.filter and Subfile.extract methods have been added. The existing Document.filter_to_file and Subfile.extract_to_file methods are now deprecated aliases of these methods.

  • In the Python API, enum value names now match those in the C++ API.

  • The third-party openssl library has been upgraded to version 3.1.2.

  • (Security update) The third-party libwebp library has been upgraded to version 1.3.1.

  • (Security update) The third-party libxml2 library has been upgraded to version 2.11.4.

  • (Security update) The third-party freetype library has been upgraded to version 2.13.1.

  • The third-party libheif library has been upgraded to version 1.16.2.

Resolved Issues

  • The KeyView sample program tstxtract could not extract subfiles with names that include certain characters.

  • On macOS, KeyView included a version of libxml2.dylib, which could make it difficult for calling applications to use their own or the system version of libxml2.dylib.

  • KeyView did not set the KVMainFileInfoFlag_HasContent flag for pFiles (RMS_Protected_Fmt).

  • When extracting subfiles from a PDF with the extract images option enabled, KeyView could output duplicate images.

  • KeyView could incorrectly process some images in PDF files resulting in distorted images.

  • KeyView sometimes failed to output the text from some cells in tables in current (2013 to the present day) Apple iWork Numbers and Keynote files.

  • KeyView did not correctly enclose footer text from Microsoft PowerPoint Windows XML 2007 and 2010 (.pptx) files in custom footer tags when LogicalOrder was set to 1.

  • When PDF Logical Reading Order was enabled using the format.ini file, processing PDFs would result in a KVError_ReaderInitError.

  • KeyView could exit unexpectedly (in-process) or return an error (out-of-process) when extracting from certain MIME email files.

  • KeyView could exit unexpectedly (in-process) or return an error (out-of-process) when processing certain PDF and ODS files.

  • When KeyView ran out-of-process and processed streams in pipe mode, kvoop memory usage increased until the process was shut down.

  • When filtering PDFs with pdfsr, with logical ordering turned on, hyphens following a number at the end of a line were deleted. Deleting the hyphens is correct behavior when words are split over line endings, but not desirable for numbers in which hyphens tend to be deliberate (for example in social security numbers).
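
    The distinction described here can be expressed as a small rule. The Python sketch below is a simplified illustration of that rule, not the pdfsr implementation: join_lines is a hypothetical helper that drops an end-of-line hyphen between letters but keeps one that follows a digit.

```python
import re

def join_lines(text: str) -> str:
    """Rejoin lines, dropping end-of-line hyphens only between letters."""
    # Word split across a line break: "docu-\nment" becomes "document".
    text = re.sub(r"(?<=[A-Za-z])-\n(?=[A-Za-z])", "", text)
    # Hyphen after a number at the end of a line: keep the hyphen.
    text = re.sub(r"(?<=\d)-\n", "-", text)
    # Remaining line breaks become spaces.
    return text.replace("\n", " ")

joined = join_lines("The docu-\nment lists 123-\n45-6789 as\nan example.")
```

    With this rule, the split word is rejoined while the number keeps all of its hyphens.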

  • When outputting URLs from certain ODT files, the URL was sometimes concatenated to the preceding text.

  • (Security update) The third-party ODA library was upgraded to 2024.7 on Windows and Linux platforms, and 2024.1 on Mac platforms to resolve known vulnerability CVE-2023-26495.

  • KeyView leaked memory when using the text member of Document objects in the C++ and Python APIs.

  • KeyView returned an error when retrieving headers and footers from certain MS Word (docx) documents.

  • KeyView returned an error when processing PDF documents containing encrypted forms.

  • KeyView returned an error when processing certain PDF documents.

  • KeyView sometimes included duplicate text in the output when PDF documents contained the same text multiple times in the same location.

  • In the Python API, the Subfile.time member gave a valid but incorrect value if the document did not store a timestamp for that subfile. It now gives a value of None in this case.

23.3.0

New in this Release

  • KeyView format detection has been extended, with support for 61 additional file formats. By identifying a larger range of formats present in the enterprise, decisions can be made on how to route, filter, or alert on such documents. For the full list, refer to the KeyView Filter SDK Programming Guides.

  • KeyView 23.3 introduces a Python API for the Filter SDK. The Python API is currently in beta. It is not recommended for production use, and might not be released in this way in the future.

  • KeyView 23.3 introduces a Document class to the C++ API. OpenText recommends that you create a Document object for each file that you want to process. You can obtain a document object by calling the new method open() on your KeyView Session. The use of document objects allows caching of data between calls to KeyView for the same file, and might allow performance improvements in future versions of KeyView.

  • The KeyView Filter C API is supported on Windows 11 ARM64.

  • You can now enable the output of Eduction table delimiters by using the Java API. You can configure this option by using the new Filter.setOutputTableDelimiters method.

  • KeyView 23.3 expands support for the new metadata API that was introduced in KeyView 23.2. The new API has now been exposed through the Java API for Filter. You can also use the new API in the Java API for KeyView Export, but only when extracting subfile metadata.

  • Source code identification is available on all KeyView supported platforms.

  • KeyView can extract XMP metadata from additional formats on the LINUX_ARM platform. This change means that support for XMP metadata is the same across all supported platforms.

  • The third-party sqlite library was upgraded to version 3.42.0.

Resolved Issues

  • (Security update) KeyView depended on outdated LZMA code for detection of 7z encryption. This dependence has been removed.

  • When running out of process on Windows platforms, kvoop.exe did not close the file handles it had inherited from the parent process.

  • When using the pdfsr reader to process PDFs that contained fonts with certain predefined encodings, spaces sometimes appeared in the wrong places in output text.

  • When using the pdfsr reader to process PDFs, output text was sometimes broken onto additional new lines or missed characters.

  • Attempting to get XMP metadata from supported file types with no associated reader failed with a KVError_FormatNotSupported error.

  • KeyView leaked memory when filtering some Apple iWork Keynote (.key) (IWPG_Fmt) files.

  • KeyView used large amounts of memory when filtering large CSV files.

  • KeyView incorrectly identified some attachments to Outlook .msg files as being inline pictures rather than non-inline attachments.

  • Changing the global locale could cause KeyView to fail to process some file formats.

  • When attempting to filter NIST_ITL_Fmt files, KeyView returned KVERR_FormatNotSupported.

  • Extracting metadata or summary information from invalid OLE-based files could cause memory inconsistencies.

  • When filtering files in stream mode using the pipe-streaming method, KeyView leaked memory in the worker process kvoop.exe.

  • KeyView could produce an unreadable subfile from PST files if no target character set was specified.

  • When processing HTML files that specify an ISO-2022 character encoding, KeyView output some entities as unprocessed strings, rather than the correct Unicode character.

Notes

  • In the C++ metadata API, the Metadatum class has been renamed MetadataElement. The following methods of that class have also been renamed:

    • The method is_standard_field() has been renamed is_standard().
    • The method name() has been renamed key().
    • The method standard_field_type() has been renamed standard_key().

    OpenText recommends that you use the new class and method names because the old names are deprecated and might be removed in future.

  • In the C++ metadata API, functions that provide access to metadata return a new Metadata class. The same functions in KeyView 23.2 returned a std::multimap<std::string, Metadatum>. If your code stores metadata in a variable declared using auto, it will compile and run without any changes, provided that you do not call any std::multimap functions that are not implemented in Metadata. If your code stores metadata in a variable explicitly declared std::multimap, it will compile and run without any changes, but OpenText recommends that you use the new class instead, because conversion to std::multimap is deprecated and might be removed in future.

23.2.0

New in this Release

  • KeyView format detection has been extended, with support for 56 additional file formats. By identifying a larger range of formats present in the enterprise, decisions can be made on how to route, filter, or alert on such documents. For the full list, refer to the KeyView Filter SDK Programming Guides.

  • KeyView 23.2 introduces a new metadata API. The new API:

    • reduces the number of function calls you need to make to retrieve all metadata.
    • performs field standardization. Field standardization returns metadata using a standard set of field names, so that the same metadata is returned in the same field regardless of the source file format. The new metadata API allows for the introduction of further standardization, in future releases, without breaking backwards compatibility.

    The new metadata API is available in KeyView Filter (C and C++). The new metadata API can also be used through KeyView Export (C), but only when extracting subfile metadata.
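
    Field standardization can be pictured as a mapping from format-specific names to a shared set of keys. The Python sketch below is illustrative only: the field names, the standard keys, and the standardize helper are all hypothetical and do not reflect the actual KeyView field list or API.

```python
# Hypothetical format-specific names mapped to hypothetical standard keys.
STANDARD_KEYS = {
    "dc:creator": "author",
    "Author": "author",
    "dc:title": "title",
    "Title": "title",
}

def standardize(raw_fields: dict) -> list:
    """Return (key, value, is_standard) tuples, keeping unknown fields."""
    result = []
    for name, value in raw_fields.items():
        if name in STANDARD_KEYS:
            result.append((STANDARD_KEYS[name], value, True))
        else:
            # Fields without a standard equivalent pass through unchanged.
            result.append((name, value, False))
    return result

pdf_fields = standardize({"dc:creator": "A. Writer", "Producer": "SomeTool"})
doc_fields = standardize({"Author": "A. Writer"})
```

    In this sketch the author is returned under the same key for both inputs, which is the behavior that standardization aims for.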

  • KeyView 23.2 introduces the concept of a KVDocument to the C API. The Filter C API has new functions that accept a KVDocument instead of a file path or input stream. This simplifies the API because a KVDocument stores information about the source, and a KVDocument created from a file or stream can be passed into the same function. For example, fpGetDocInfoFile() and fpGetDocInfoStream() can be replaced with the new function fpGetDocInfo(). The introduction of KVDocument is also intended to support future performance enhancements, because KeyView will be able to cache information about a document instead of repeating some work in cases where you call several functions that operate on the same input.
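
    The design idea (one handle type, one function, optional caching) can be sketched in Python as follows. This is not the C API: the Document class, its from_file and from_stream constructors, and get_doc_info are hypothetical stand-ins, and detection is imitated by checking for a PDF signature.

```python
import io
import os
import tempfile

class Document:
    """Hypothetical stand-in for a document handle (not the KeyView type)."""

    def __init__(self, stream):
        self._stream = stream
        self._info = None          # cached result of the first detection
        self.detect_calls = 0

    @classmethod
    def from_file(cls, path):
        return cls(open(path, "rb"))

    @classmethod
    def from_stream(cls, stream):
        return cls(stream)

    def close(self):
        self._stream.close()

def get_doc_info(doc: Document) -> str:
    """One entry point for both sources; repeated calls reuse the cache."""
    if doc._info is None:
        doc.detect_calls += 1
        doc._stream.seek(0)
        head = doc._stream.read(5)
        doc._info = "pdf" if head == b"%PDF-" else "unknown"
    return doc._info

with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
    tmp.write(b"%PDF-1.4 sample")
file_doc = Document.from_file(tmp.name)
stream_doc = Document.from_stream(io.BytesIO(b"plain text"))

results = [get_doc_info(file_doc), get_doc_info(file_doc), get_doc_info(stream_doc)]
calls_for_file = file_doc.detect_calls
file_doc.close()
os.remove(tmp.name)
```

    The second call on the same document reuses the cached result, which is the kind of saving the notes anticipate for future versions.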

  • In the Filter C API, the fpFilterConfig() function has been replaced with the new fpSetConfig(). This function allows you to set the same configuration options, and it returns a KVErrorCode, rather than using extended errors. The fpFilterConfig() function is now deprecated.

  • In the Filter C API, KeyView now defines a KVFilterSession type which can be used instead of a void* for holding the session object provided by fpInit and passed to the other Filter API functions.

  • Error reporting has been simplified in the C API. In earlier versions of KeyView, some functions could return the error code KVERR_General. You could then call fpGetKvErrorCodeEx() to obtain an "extended" error code. In KeyView 23.2 the error codes have been unified such that all error codes are included in the KVErrorCode enumeration. If a function returns an error code, there is no need to call a second function to obtain more information. This makes it easier to handle errors when an operation fails.
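
    The benefit of a single enumeration can be sketched in Python. The member names below appear elsewhere in these notes, but the numeric values are placeholders, and filter_document and describe are hypothetical functions used only to show a caller branching on one returned code.

```python
from enum import Enum, auto

class KVErrorCode(Enum):
    """Illustrative subset; values are placeholders, not the SDK's values."""
    KVError_Success = 0
    KVError_General = auto()
    KVError_FormatNotSupported = auto()
    KVError_PasswordProtected = auto()

def filter_document(kind: str) -> KVErrorCode:
    """Hypothetical operation that returns a specific code directly."""
    outcomes = {
        "text": KVErrorCode.KVError_Success,
        "encrypted": KVErrorCode.KVError_PasswordProtected,
        "psd": KVErrorCode.KVError_FormatNotSupported,
    }
    return outcomes.get(kind, KVErrorCode.KVError_General)

def describe(code: KVErrorCode) -> str:
    # No second call is needed to find out what went wrong.
    if code is KVErrorCode.KVError_Success:
        return "ok"
    if code is KVErrorCode.KVError_PasswordProtected:
        return "supply a password and retry"
    if code is KVErrorCode.KVError_FormatNotSupported:
        return "skip this file"
    return "unexpected failure"

message = describe(filter_document("encrypted"))
```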

  • The fpGetXMPInfo* functions now return appropriate error codes on failure.

  • KeyView has been simplified so that it is much easier to map file formats to readers. File formats no longer have an associated "category". The KeyView configuration files such as formats.ini, formats_e.ini, and kvsdk.ini now identify file formats using the same file format numbers that are returned by format detection.

    For example, when KeyView detects an Adobe PDF file it returns format number 230. Imagine that you want to process PDF files with the reader pdf2sr.

    In previous versions of KeyView you had to find the associated format category (200) and use this to configure KeyView:

    200=pdf2

    In KeyView 23.2, this is no longer necessary and you instead use the same format number that is returned from format detection:

    230=pdf2
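
    The difference between the two schemes can be sketched in Python. The numbers 230 and 200 and the reader name pdf2 come from the example above; the dictionaries and helper functions are hypothetical and only model the lookup, not how KeyView reads its configuration files.

```python
# Earlier versions: detected format number -> category -> reader.
FORMAT_TO_CATEGORY = {230: 200}        # Adobe PDF belongs to category 200
CATEGORY_TO_READER = {200: "pdf2"}

def reader_old(format_number):
    category = FORMAT_TO_CATEGORY.get(format_number)
    return CATEGORY_TO_READER.get(category)

def parse_reader_map(text):
    """Parse 'number=reader' lines such as '230=pdf2' into a dictionary."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue                   # skip blanks, comments, section names
        key, _, value = line.partition("=")
        mapping[int(key)] = value.strip()
    return mapping

# KeyView 23.2 style: the detected format number is the configuration key.
FORMAT_TO_READER = parse_reader_map("230=pdf2\n")

def reader_new(format_number):
    return FORMAT_TO_READER.get(format_number)
```

    Both lookups return the same reader for format 230, but the newer scheme needs no intermediate category table.
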
  • You can now remove certain third-party libraries that handle file formats you do not need to process. For a full list of these optional third-party components, and the changes you must make to exclude them, refer to the KeyView Filter SDK Programming Guide.

  • KeyView can now extract all platform-specific embedded files from PDF_Fmt documents.

  • KeyView can now filter WordPerfect Graphics (WordPerfect_Graphics_Fmt) files of adVECTORGRAPHIC class.

  • When you enable 'show hidden text', KeyView can now output author names for comments in Rich Text Format (MS_RTF_Fmt) documents.

  • When you enable 'show hidden text', KeyView can now output the value of the href attribute in HTML (HTML_Fmt) files.

  • KeyView can now process certain OpenOffice Text files that it would previously reject, and it can process image alt text in OpenOffice Text as hidden text. KeyView can also now filter text from the master slide in OpenOffice Presentations as hidden text.

  • When getting mail metadata from EML subfiles, KeyView now reports the sent date as a date instead of a string.

  • When getting subfile information from EML subfiles, KeyView now converts the file time to UTC, rather than an unspecified time zone.
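
    To show why a date value in UTC is easier to work with than a raw string, the Python sketch below parses an email Date header and normalizes it to UTC using only the standard library. It is not KeyView code; sent_date_utc is a hypothetical helper and the header value is an invented example.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

def sent_date_utc(header_value: str):
    """Parse an RFC 5322 Date header and normalize it to UTC."""
    local = parsedate_to_datetime(header_value)
    return local.astimezone(timezone.utc)

# 09:30 at UTC-05:00 corresponds to 14:30 UTC on the same day.
sent = sent_date_utc("Tue, 14 Mar 2023 09:30:00 -0500")
```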

  • For Microsoft Visio 2013 (.VSDX) files, KeyView now reports solution properties in the metadata.

  • KeyView now supports TIFF (TIFF_Fmt) files that use WebP compression.

  • Handling of Arabic diacritics (tashkil) has been significantly improved when using the pdfsr reader to process PDFs.

  • Text ordering has been improved when using the pdfsr reader to process PDFs.

  • In the C++ API, Session::subfiles now throws a keyview::password_protected_error if the container is protected and the session has not been configured with the correct password.

  • For the .NET API, the FilterTestDotNet sample program now includes a C# project file, FilterTestDotNet.csproj, to make it easier to use.

  • In the C++ API, you can now enable tab delimiters for tables by using the tab_delimited and output_table_delimiters functions in the Configuration class.

  • The FreeType third-party library has been upgraded to version 2.12.1.

  • The ODA third-party library has been upgraded to version 2023.12.

  • The zlib third-party library has been upgraded to version 1.2.13.

  • The libxml2 third-party library has been upgraded to version 2.10.3.

  • The expat third-party library has been upgraded to version 2.5.0.

  • The ICU third-party library has been upgraded to version 72.1.

  • The openssl third-party library has been upgraded to version 3.0.8.

  • The libde265 third-party library has been upgraded to version 1.0.11.

  • The XMP-Toolkit third-party library has been upgraded to version 2022.06.

  • The wavpack third-party library has been upgraded to version 5.6.0.

  • The sqlite third-party library has been upgraded to version 3.41.0.

  • The third-party libical library was upgraded to version 3.0.16.

  • The third-party Apache Arrow library was upgraded to version 11.0.0. This change includes upgrades to the following dependencies: 

    • boost was upgraded to version 1.75.0

    • brotli was upgraded to version 1.0.9

    • jemalloc was upgraded to version 5.3.0

    • re2 was upgraded to version 2022-06-01

    • thrift was upgraded to version 0.16.0

    • utf8proc was upgraded to version 2.7.0

    • zStandard was upgraded to version 1.5.2

Resolved Issues

  • (Security update) The third-party libtiff library has been upgraded to version 4.5.0 to resolve known vulnerabilities, including CVE-2022-2056, CVE-2022-2057, CVE-2022-2058, CVE-2022-3452, CVE-2022-3570, CVE-2022-3597, CVE-2022-3598, CVE-2022-3599, CVE-2022-3626, and CVE-2022-3627.

  • (Security update) The third-party protobuf library has been upgraded to version 3.21.12 to resolve known vulnerabilities, including CVE-2022-1941.

  • (Security update) The libjpeg third-party library has been upgraded to version 9e to resolve potential vulnerabilities.

  • (Security update) The libwebp third-party library has been upgraded to version 1.3.0.

  • When running out-of-process in stream mode, KeyView used a lot of pipe operations when detecting the format of some files, which could negatively impact the performance of the system.

  • KeyView could pause for up to a minute while trying to shut down an out-of-process process.

  • For password protected OpenOffice files (ODS, ODT and ODP), fpOpenFile did not return KVERR_PasswordProtected. Continuing with extraction could then result in invalid extracted files.

  • When attempting to get summary info from some password protected Office formats (DOCX, PPTX, XLSX, ODS, ODT and ODP), KeyView could return KVERR_General instead of KVERR_PasswordProtected.

  • KeyView could truncate long sections of text in PDF_Fmt documents.

  • KeyView did not retrieve the Image Width, Image Height and Bits Per Pixel in summary information from Tagged Image File Format (TIFF) TIFF_Fmt files.

  • For some Microsoft Excel (XLSX) files with a lot of cells using Rich Data Types, KeyView output the names of those types incorrectly, using a number instead of a type name.

  • KeyView could skip some user defined properties in summary information for some OLE-based files like MS_Project_2007_Fmt.

  • KeyView did not perform OCR on animated PNG (APNG_Fmt) images.

  • KeyView could fail to extract some images from Rich Text Format (MS_RTF_Fmt) documents.

  • KeyView did not extract all the images from some Rich Text Format (MS_RTF_Fmt) documents.

  • The extraction API fpGetSubFileInfo function did not correctly report the sizes of subfiles when they were larger than 2GB.

  • Some PDF files took longer to process in version 12.13.0 of the SDK than in version 12.12.0.

  • HEIC and HEIF format documents could not be processed on macOS.

  • When Extract Images was enabled, filtering certain Word documents could cause KeyView to exit unexpectedly (in-process), or return an error (out-of-process).

  • The fpGetMainFileInfo function did not respect the source code detection option when KeyView was running out-of-process.

  • KeyView could return an error (out-of-process), or exit unexpectedly (in-process) when processing some Microsoft Visio (.vsd) files.

  • KeyView missed text from some Microsoft Visio (.vsd) files.

  • Some base-64 encoded attachments to ICS files were extracted incorrectly.

  • KeyView could fail to extract bzip2 files if Unexpected Zip Detection was enabled.

  • KeyView could report duplicate metadata from Tagged Image File Format (TIFF) files with multiple pages.

  • KeyView could output incorrect metadata names for some PDF files.

  • KeyView could omit metadata entries for some PDF files.

  • When using the pdfsr reader to process PDFs that contained right-to-left (RTL) text, some text at the top of the file was not included in the output.

  • KeyView could process some CSV files incorrectly, meaning fields were output in the wrong columns.

Notes

KeyView 23.2 is a new major version of IDOL, released in the second quarter of 2023. It is the first new major version since KeyView 12.0 was released in June 2018. KeyView 23.2 includes some changes that require you to update your license and application code. For more information about how to upgrade, see the KeyView upgrade guide.

Deprecated Features

The following features are deprecated and might be removed in a future release.

Each entry gives the API category, the release in which the feature was deprecated, and the deprecated feature.

  • C API (deprecated since 24.1.0)

    In KVFilterInitOptions, the KVF_OOPLOGON and KVF_OOPLOGOFF flags have been deprecated. Use the fpSetConfig() function to turn the out-of-process error log on or off.

  • C++ API (deprecated since 24.1.0)

    The Configuration::out_of_process_log function has been deprecated. Use Configuration::oop_error_log instead.

  • C++ API (deprecated since 23.3.0)

    The following functions have been deprecated:

    • Session::detect
    • Session::filter
    • Session::get_metadata
    • Session::get_restrictions
    • Session::metadata_map
    • Session::subfiles

    KeyView 23.3 introduces a Document class to the C++ API. OpenText recommends that you create a Document object for each document and use the methods on that object instead. For example, call Document::info rather than Session::detect.

  • C++ Metadata API (deprecated since 23.3.0)

    In KeyView 23.3, the Metadatum class was renamed MetadataElement, and some of its methods were renamed. The old names are deprecated. OpenText recommends using the new class and method names.

  • C API (deprecated since 23.2.0)

    The following functions have been deprecated. OpenText recommends that you create a KVDocument to represent each document, by calling fpOpenDocumentFromFile() or fpOpenDocumentFromStream(). You can then use the new Filter API functions that accept a KVDocument.

    • fpCanFilterFile()
    • fpCanFilterStream()
    • fpCloseStream()
    • fpFileToInputStreamCreate()
    • fpFileToInputStreamFree()
    • fpFilterFile()
    • fpFilterStream()
    • fpGetDocInfoFile()
    • fpGetDocInfoStream()
    • fpGetRestrictionsFile()
    • fpGetRestrictionsStream()
    • fpOpenStream()
    • fpOpenStreamEx2()

  • C API (deprecated since 23.2.0)

    As part of the improvements to simplify error handling, the following functions have been deprecated:

    • fpFilterConfig(). OpenText recommends that you use the function fpSetConfig() instead. This sets the same configuration options, and returns a KVErrorCode rather than a Boolean value.
    • fpGetKvErrorCodeEx(). You only need to call this if you use the deprecated functions fpGetDocInfoFile, fpGetDocInfoStream, and fpFilterConfig, which return FALSE to indicate an error, rather than returning an error code.

  • C API (deprecated since 23.2.0)

    The following functions have been deprecated. OpenText recommends that you access metadata through the new metadata API, by calling fpGetMetadataList().

    • fpFreeOLESummaryInfo()
    • fpFreeXmpInfo()
    • fpGetOLESummaryInfo()
    • fpGetOLESummaryInfoFile()
    • fpGetXmpInfo()
    • fpGetXmpInfoFile()

  • C++ API (deprecated since 23.2.0)

    The following have been deprecated:

    • The Session::get_summary_information function.
    • The Subfile::mail_metadata function.
    • The SummaryInfoItem class.
    • The SummaryInfoVisitorBase class.
    • The SummaryInfoType enumeration.

    OpenText recommends that you access metadata using the new metadata API.