Introduction

A file can contain other files, which we call subfiles. Examples of subfiles include e-mail attachments and embedded OLE objects. A file that contains subfiles is called a container file.

The following are examples of container files:

  • Archive files such as the ZIP, TAR, and RAR formats.
  • Mail messages such as Outlook (MSG) and Outlook Express (EML).
  • Mail stores such as Microsoft Outlook Personal Folders (PST), Mailbox (MBX), and Lotus Notes database (NSF).
  • PDF files that contain file attachments.
  • Compound documents with embedded OLE objects, such as a Microsoft Word document with an embedded Excel chart.

For information about the formats from which KeyView can extract subfiles, see the "Extract" column in the section Document Readers.

Filtering or exporting a container file with KeyView might produce little or no output, because a container might have little or no text content of its own. Its subfiles, however, can often be filtered or exported. Through the KeyView API you can determine whether a file is a container, find out how many subfiles it contains, and access or extract those subfiles for further processing.

To obtain all possible content from a file, you can filter it to obtain plain text, extract it to obtain its subfiles, and retrieve both the metadata the file stores about itself and the metadata it stores about its subfiles. You can then repeat this process for each extracted subfile.
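The recursive process described above can be sketched as follows. This is a minimal Python sketch that assumes a hypothetical in-memory `Document` model (its fields `text`, `metadata`, and `subfiles`, and the function `collect_content`, are invented for illustration). It does not call the KeyView API; in a real program, KeyView's filter, extract, and metadata functions would take the place of reading these fields. For simplicity, the metadata a container stores about a subfile is folded into that subfile's own `metadata` dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical in-memory model of a file. This is NOT the KeyView API;
# a real program would call KeyView to filter, extract, and read metadata.
@dataclass
class Document:
    name: str
    text: str = ""                                # plain text from filtering
    metadata: dict = field(default_factory=dict)  # file and subfile metadata
    subfiles: list[Document] = field(default_factory=list)


def collect_content(doc: Document, depth: int = 0,
                    results: list[dict] | None = None) -> list[dict]:
    """Record one file's text and metadata, then repeat for each subfile."""
    if results is None:
        results = []
    results.append({
        "name": doc.name,
        "depth": depth,            # 0 for the top-level file
        "text": doc.text,
        "metadata": doc.metadata,
        "subfile_count": len(doc.subfiles),
    })
    for child in doc.subfiles:     # "extract" each subfile and recurse into it
        collect_content(child, depth + 1, results)
    return results


# Usage: an EML message whose attached ZIP file holds one text file.
archive = Document("archive.zip", metadata={"size": 120},
                   subfiles=[Document("notes.txt", text="alpha")])
mail = Document("message.eml", text="Hello", metadata={"subject": "Hi"},
                subfiles=[archive])

# Print each file indented by its depth in the hierarchy.
for record in collect_content(mail):
    print("  " * record["depth"] + record["name"])
```

The traversal is depth-first, so each container is recorded before its subfiles, and the `depth` value shows how far down the hierarchy each file sits.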

Subfiles Can Be Containers

Subfiles might also be container files, creating a file hierarchy of multiple levels. For example, an MSG file might contain three attachments:

  • A Microsoft Word document that contains an embedded Microsoft Excel spreadsheet.
  • An AutoCAD drawing file (DWG).
  • An EML file with an attached ZIP file, which in turn contains four archived files.

NOTE: The MSG file has four first-level children, not three, because the body text of a mail message is treated as a subfile. For more information, see Extract Mail Files.
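To make the counts in this example concrete, the hierarchy can be modeled as a tree. This is a hypothetical Python sketch: the `Node` class, the `levels` function, and all file names are invented for illustration and are not part of KeyView. Following the note, the MSG body text is included as a first-level child; only the subfiles named in the example are modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node for one file in a container hierarchy."""
    name: str
    children: list[Node] = field(default_factory=list)


def levels(node: Node) -> int:
    """Count the levels in the hierarchy, with the root file as level 1."""
    return 1 + max((levels(child) for child in node.children), default=0)


# The example from the text; all file names are made up for illustration.
msg = Node("message.msg", [
    Node("body text"),  # the mail body is treated as a subfile
    Node("report.docx", [Node("embedded Excel spreadsheet")]),
    Node("drawing.dwg"),
    Node("forwarded.eml", [
        Node("attachment.zip", [Node(f"archived file {i}") for i in range(1, 5)]),
    ]),
])

print(len(msg.children))  # first-level children: 4
print(levels(msg))        # MSG > EML > ZIP > archived files: 4
```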