tstxtract

The tstxtract sample program demonstrates the File Extraction API. It opens a file, extracts subfiles from the file, and repeats the extraction process until all subfiles are extracted. It also demonstrates how to extract the default set of metadata and pass integer or string names to extract specific metadata. After the files are extracted, you can convert the files by using one of the conversion sample programs.
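The open, extract, and repeat cycle described above can be sketched in a few lines. The sketch below is only an illustration of the control flow: it uses Python's standard zipfile module as a stand-in container format and does not call the File Extraction API. The helper names (make_zip, extract_all) and the sample file names are hypothetical.

```python
import io
import zipfile
from collections import deque


def make_zip(members):
    """Build an in-memory ZIP archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def extract_all(container, recursive=False):
    """Return a (depth, name) pair for every subfile extracted from a container.

    With recursive=False only first-level subfiles are returned. With
    recursive=True, any extracted subfile that is itself a ZIP archive is
    queued and processed the same way until nothing is left to open.
    """
    extracted = []
    pending = deque([(container, 1)])  # containers still waiting to be opened
    while pending:
        data, depth = pending.popleft()
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                payload = zf.read(info)
                extracted.append((depth, info.filename))
                # Assumption for this sketch: a subfile is a container if it
                # starts with the ZIP local-file-header signature.
                if recursive and payload.startswith(b"PK\x03\x04"):
                    pending.append((payload, depth + 1))
    return extracted


# Hypothetical sample data: an archive that contains another archive.
inner = make_zip({"sheet.csv": b"a,b\n1,2\n"})
outer = make_zip({"report.txt": b"hello", "attachments.zip": inner})

print(extract_all(outer))                  # first-level subfiles only
print(extract_all(outer, recursive=True))  # also descends into attachments.zip
```

The queue keeps the loop flat: every container that is found is opened exactly once, and the loop ends when no unopened container remains.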

The source code for the tstxtract sample program is the same for the Filter and Export SDKs. A flag in the makefile specifies whether the program is compiled for Filter, HTML Export, or XML Export.

To run tstxtract, type the following at the command line:

tstxtract [options] input_file output_directory bin_directory

where:

  • options is one or more of the following:
    Option                        Description
    -c charset                    Specify the target character set, for example KVCS_SJIS. See Coded Character Sets for a full list of supported character sets.
    -cf keyfile1,keyfile2,...     Specify one or more credential files (private keys) to use to decrypt encrypted .EML, .MBX, .PST, or .MSG files.
    -l logfile                    Specify the path and file name of the log file to which metadata is written.
    -lm                           Retrieve metadata and write the data to the log file.
    -lms metaname1,metaname2,...  Retrieve metadata with string metanames and write the data to the log file for .MSG, .EML, .MBX, and .NSF files.
    -lmi metaint1,metaint2,...    Retrieve metadata with integer (hexadecimal) metanames and write the data to the log file for .PST files.
    -lma                          Retrieve all metadata from an .NSF file and write the data to the log file.
    -r                            Recursively extract second-level subfiles to the specified output directory. For example, if a .ZIP file contains a Microsoft Word file and the Word file contains an embedded Microsoft Excel file, set the -r option to extract both the Word and Excel files. If this option is not set, only first-level subfiles are extracted. In this case, only the Word file would be extracted.
    -msg                          Extract each mail message in a .PST file as an .MSG file, including all of its attachments. If this flag is not set, the mail message is extracted as text. This option sets the KVExtractionFlag_SaveAsMSG flag in the KVExtractSubFileArg structure.
    -f                            Extract the formatted version of the message body (HTML or RTF) from mail files when possible. If neither an HTML nor an RTF version of the message body exists in the mail file, the body is extracted as plain text. If you do not set this flag, the message body is extracted as plain text when possible.
    -t                            Preserve the timestamp of embedded files when possible.
    -h                            Extract hidden text.
  • input_file is the full path and file name of the source document.
  • output_directory is the directory to which the files are extracted.
  • bin_directory is the path to the Export bin directory. This is required if you do not run the program from the KeyView bin directory.
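As an illustration of the syntax above, the following sketch assembles a tstxtract command line with the options first and the three positional arguments last. It is a minimal sketch, not part of the SDK: the helper name build_tstxtract_command and all paths are hypothetical, only a subset of the documented options is covered, and combining -l with -lm is an assumption based on the option descriptions.

```python
import shlex


def build_tstxtract_command(input_file, output_dir, bin_dir, *,
                            charset=None, log_file=None, log_metadata=False,
                            recursive=False, save_as_msg=False,
                            formatted_body=False):
    """Assemble: tstxtract [options] input_file output_directory bin_directory."""
    args = ["tstxtract"]
    if charset:
        args += ["-c", charset]      # target character set, e.g. KVCS_SJIS
    if log_file:
        args += ["-l", log_file]     # log file that receives the metadata
    if log_metadata:
        args.append("-lm")           # retrieve metadata and log it
    if recursive:
        args.append("-r")            # also extract second-level subfiles
    if save_as_msg:
        args.append("-msg")          # extract .PST mail messages as .MSG
    if formatted_body:
        args.append("-f")            # prefer the HTML or RTF message body
    # Positional arguments always follow the options.
    args += [input_file, output_dir, bin_dir]
    return args


# Hypothetical paths, for illustration only.
cmd = build_tstxtract_command(
    "/data/in/archive.zip", "/data/out", "/opt/keyview/bin",
    log_file="/data/out/meta.log", log_metadata=True, recursive=True,
)
print(shlex.join(cmd))
```

The resulting list can be passed directly to subprocess.run(cmd), which avoids shell quoting problems when a path contains spaces.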