Extract Mail Metadata
You can extract metadata, such as subject, sender, and recipient, from subfiles of mail formats, by calling the fpGetSubFileMetaData() function. You can extract a predefined set of common metadata fields, a list of metadata fields by their names or MAPI properties, or, for some subfile types, all the metadata in the file.
Default Metadata Set
KeyView internally defines a set of common mail metadata fields that you can extract as a group from mail formats. This default metadata set is listed in the following table.
Because mail formats use different terms for the same fields, the format’s reader maps the default field name to the appropriate format-specific name. For example, when retrieving the default metadata set, the NSF field Importance is mapped to the name Priority and is returned.
You can also extract the default field names individually by passing the field name (such as From, To, and Subject); however, in this case, the string is not mapped to the format-specific name. For example, if you pass Priority in the call, you retrieve the contents of the Priority field from an MBX file, but do not retrieve the contents of the Importance field from an NSF file.
NOTE: You cannot pass the field names listed in the table individually for PST files. However, you can pass either the MAPI tag number or the MAPI tag name as integers. See Microsoft Personal Folders File (PST) Metadata.
Extract the Default Metadata Set
To extract the default metadata set, call the fpGetSubFileMetaData() function, and pass in 0 for metaArg->metaNameCount, and NULL for metaArg->metaNameArray.
    KVGetSubFileMetaArgRec metaArg;
    KVSubFileMetaData pMetaData = NULL;
    KVStructInit(&metaArg);
    metaArg.index = subFileIndex;
    metaArg.metaNameCount = 0;
    metaArg.metaNameArray = NULL;
    error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
    ...
    extractInterface->fpFreeStruct(pFile, pMetaData);
    pMetaData = NULL;
Extract All Metadata
KeyView can extract all metadata from MSG, EML, MBX, MIME, NSF, ICS, and DXL subfiles. The call is the same as for the default metadata set, except that when you call the fpGetSubFileMetaData() function, you pass in -1 for metaArg->metaNameCount and NULL for metaArg->metaNameArray.
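The sections above describe three ways to fill metaNameCount and metaNameArray: 0 with NULL for the default set, -1 with NULL for all metadata, and a positive count with an array of names for specific fields. The following is a minimal sketch of a helper that names those three cases so that calling code can check its own arguments before the call. The DemoRequestMode type and demo_request_mode() function are hypothetical and are not part of the KeyView API. Treating every other combination as invalid is an assumption of this sketch; the text does not say how fpGetSubFileMetaData() handles such input.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the KeyView API. It only classifies
 * the argument combinations described in the text above. */
typedef enum {
    DEMO_REQUEST_DEFAULT_SET, /* metaNameCount == 0,  metaNameArray == NULL */
    DEMO_REQUEST_ALL,         /* metaNameCount == -1, metaNameArray == NULL */
    DEMO_REQUEST_NAMED,       /* metaNameCount > 0,   metaNameArray != NULL */
    DEMO_REQUEST_INVALID      /* assumption: anything else is a caller error */
} DemoRequestMode;

DemoRequestMode demo_request_mode(int metaNameCount, const void *metaNameArray)
{
    if (metaNameCount == 0 && metaNameArray == NULL)
        return DEMO_REQUEST_DEFAULT_SET;
    if (metaNameCount == -1 && metaNameArray == NULL)
        return DEMO_REQUEST_ALL;
    if (metaNameCount > 0 && metaNameArray != NULL)
        return DEMO_REQUEST_NAMED;
    return DEMO_REQUEST_INVALID;
}
```

For example, demo_request_mode(-1, NULL) yields DEMO_REQUEST_ALL, which matches the argument values used to extract all metadata.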
Microsoft Outlook (MSG) Metadata
In addition to the default metadata set, you can extract the metadata fields listed in the following table for MSG files. You must pass the field name to metaNameArray in the call to the fpGetSubFileMetaData() function.
Field Name (string to specify) | Description |
---|---|
AttachFileName | An attachment's long file name and extension, excluding the path. |
ConversationTopic | The topic of the first message in a conversation thread. A conversation thread is a series of messages and replies. This is the first message’s subject with any prefix removed. |
CreationTime | The time that the message or attachment was created. This value is displayed in the Sent field in the message’s Properties dialog in Outlook. |
InternetMessageID | The identifier for messages that come in over the Internet. This is the MAPI property PR_INTERNET_MESSAGE_ID. This property is not in the MAPI headers or MAPI documentation. |
LastModificationTime | The time that the message or attachment was last modified. This value is displayed in the Modified field in the message’s Properties dialog in Outlook. |
Location | The physical location of the event specified in the Outlook calendar entry. |
MessageID | The message transfer system (MTS) identifier for the message transfer agent (MTA). This value is displayed on the Message ID tab in the message’s Properties dialog in Outlook. |
Received | The date and time a message was delivered. This value is displayed in the Received field in the message’s Properties dialog in Outlook. |
Sender | The name and email address of the message sender. This value is a concatenation of two MAPI properties. |
Sensitivity | The value indicating the message sender's opinion of the sensitivity of a message. For example, Personal, Private, or Confidential. This value is displayed in the Sensitivity field in the message’s Properties dialog in Outlook. |
TransportMsgHeaders | Transport-specific message envelope information. This value corresponds to the MAPI property PR_TRANSPORT_MESSAGE_HEADERS. |
StartDate | An appointment start date. This value corresponds to the PR_START_DATE MAPI property. |
EndDate | An appointment end date. This value corresponds to the PR_END_DATE MAPI property. |
Extract MSG-Specific Metadata
To extract specific metadata fields from an MSG file, call the fpGetSubFileMetaData() function, and pass the field name, as listed in the table above or in Default Metadata Set, to metaNameArray (the string is not case sensitive).
For example, the following code extracts the contents of the ConversationTopic and MessageID fields:
    KVGetSubFileMetaArgRec metaArg;
    KVSubFileMetaData pMetaData = NULL;
    KVStructInit(&metaArg);
    KVMetaNameRec names[2];
    KVMetaName pname[2];
    names[0].type = KVMetaNameType_String;
    names[0].name.sname = "conversationtopic";
    names[1].type = KVMetaNameType_String;
    names[1].name.sname = "MessageID";
    pname[0] = &names[0];
    pname[1] = &names[1];
    metaArg.metaNameCount = 2;
    metaArg.metaNameArray = pname;
    metaArg.index = subFileIndex;
    error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
    ...
    extractInterface->fpFreeStruct(pFile, pMetaData);
    pMetaData = NULL;
Microsoft Outlook Express (EML) and Mailbox (MBX) Metadata
In addition to the default metadata set, you can extract any metadata field that exists in the header of an EML or MBX file by passing the field’s name. If the name is a valid field in the file, the content of the field is returned. For example, to retrieve the name of the last mail server that received the message before it was delivered, you can pass the string "Received".
Extract EML- or MBX-Specific Metadata
To extract specific metadata fields from an EML or MBX file, call the fpGetSubFileMetaData() function, and pass the metadata name to metaNameArray (the string is not case sensitive).
For example, the following code extracts the contents of the Received and Mime-version fields:
    KVGetSubFileMetaArgRec metaArg;
    KVSubFileMetaData pMetaData = NULL;
    KVStructInit(&metaArg);
    KVMetaNameRec names[2];
    KVMetaName pname[2];
    names[0].type = KVMetaNameType_String;
    names[0].name.sname = "Received";
    names[1].type = KVMetaNameType_String;
    names[1].name.sname = "Mime-version";
    pname[0] = &names[0];
    pname[1] = &names[1];
    metaArg.metaNameCount = 2;
    metaArg.metaNameArray = pname;
    metaArg.index = subFileIndex;
    error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
    ...
    extractInterface->fpFreeStruct(pFile, pMetaData);
    pMetaData = NULL;
Lotus Notes Database (NSF) Metadata
In addition to the default metadata set, you can extract any Lotus field name that exists in an NSF file by passing the field’s name. (You can extract fields from mail NSF files and non-mail NSF files.) If the name is a valid field in the file, the field is returned. For example, to retrieve the date when a document in an NSF file was last accessed, you would pass the string "$LastAccessedDB".
NOTE: A complete list of NSF fields is provided in the Lotus Notes file stdnames.h. This header file is available in the Lotus API Toolkit.
Extract NSF-Specific Metadata
To extract specific metadata fields from an NSF file, call the fpGetSubFileMetaData() function, and pass the metadata name to metaNameArray (the string is not case sensitive).
For example, the following code extracts the contents of the Description and Categories fields:
    KVGetSubFileMetaArgRec metaArg;
    KVSubFileMetaData pMetaData = NULL;
    KVStructInit(&metaArg);
    KVMetaNameRec names[2];
    KVMetaName pname[2];
    names[0].type = KVMetaNameType_String;
    names[0].name.sname = "description";
    names[1].type = KVMetaNameType_String;
    names[1].name.sname = "Categories";
    pname[0] = &names[0];
    pname[1] = &names[1];
    metaArg.metaNameCount = 2;
    metaArg.metaNameArray = pname;
    metaArg.index = subFileIndex;
    error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
    ...
    extractInterface->fpFreeStruct(pFile, pMetaData);
    pMetaData = NULL;
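The MSG, EML/MBX, and NSF examples fill the name records in the same way: each record gets a string type and a name, and each pointer slot is set to the address of its record. The following sketch shows one way to factor that repetition into a helper. The Demo* types are local stand-ins that mirror only the fields visible in the examples (type, name.sname, name.iname); they are not the KeyView definitions, and the helper demo_fill_string_names() is hypothetical. Real code would use the types from the KeyView headers.

```c
#include <stddef.h>

/* Stand-in types for illustration only. They mirror the fields used in
 * the examples above and are NOT the KeyView definitions. */
typedef enum { DemoMetaNameType_Integer, DemoMetaNameType_String } DemoMetaNameType;

typedef struct {
    DemoMetaNameType type;
    union {
        int         iname; /* assumption: integer names fit in an int here */
        const char *sname;
    } name;
} DemoMetaNameRec;

/* Hypothetical helper: fills `count` records with string names and points
 * ptrs[i] at recs[i]. Returns 1 on success, 0 if an argument is unusable. */
int demo_fill_string_names(const char *const *fieldNames, int count,
                           DemoMetaNameRec *recs, DemoMetaNameRec **ptrs)
{
    int i;

    if (fieldNames == NULL || recs == NULL || ptrs == NULL || count <= 0)
        return 0;

    for (i = 0; i < count; i++) {
        recs[i].type = DemoMetaNameType_String;
        recs[i].name.sname = fieldNames[i]; /* the string is borrowed, not copied */
        ptrs[i] = &recs[i];
    }
    return 1;
}
```

After a successful call, the caller would assign the pointer array and the count to the argument structure, as the examples do with metaNameArray and metaNameCount. The records and the strings must remain valid until the metadata call returns, because the helper stores pointers rather than copies.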
Microsoft Personal Folders File (PST) Metadata
In addition to the default metadata set, you can extract Messaging Application Programming Interface (MAPI) properties from a PST file. These properties describe all elements of an Outlook item in a PST file (such as subject, sender, recipient, and message text). Because the properties are stored in the PST file itself, you can retrieve them before you extract the contents of the PST. This enables you to determine whether an Outlook item should be extracted based on its attributes. Some MAPI properties are also stored for Outlook attachments that are not mail messages (such as an attached Microsoft Word document or Lotus 1-2-3 file).
NOTE: Because all elements of a message (except non-mail attachments) are represented by MAPI properties, you can extract all components of a subfile, including the header and message text, by calling the fpGetSubFileMetaData() function.
MAPI Properties
Each MAPI property is identified by a property tag, which is a constant that contains the property type and a unique identifier. For example, the property that indicates whether a message has attachments has the following components:
Component | Value |
---|---|
Property | PR_HASATTACH |
Identifier | 0x0E1B |
Property type | PT_BOOLEAN (000B) |
Property tag | 0x0E1B000B |
The Microsoft MAPI documentation on the Microsoft Developer Network website lists all available MAPI properties, their tags, and types.
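In the PR_HASATTACH example, the property tag 0x0E1B000B is the identifier 0x0E1B in the upper 16 bits followed by the property type 0x000B in the lower 16 bits. The following sketch shows that arithmetic. The demo_* functions are hypothetical helpers written for this illustration; they are not part of KeyView or of the MAPI headers.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* Combine a 16-bit property identifier and a 16-bit property type
 * into a 32-bit property tag (identifier in the high half). */
uint32_t demo_make_prop_tag(uint16_t identifier, uint16_t type)
{
    return ((uint32_t)identifier << 16) | (uint32_t)type;
}

/* Extract the identifier (upper 16 bits) from a property tag. */
uint16_t demo_prop_id(uint32_t tag)
{
    return (uint16_t)(tag >> 16);
}

/* Extract the property type (lower 16 bits) from a property tag. */
uint16_t demo_prop_type(uint32_t tag)
{
    return (uint16_t)(tag & 0xFFFFu);
}
```

With these helpers, demo_make_prop_tag(0x0E1B, 0x000B) produces 0x0E1B000B, and splitting that tag returns the original identifier and type.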
You can retrieve any MAPI property that is of one of the MAPI property types listed below:
PT_I2 | PT_DOUBLE | PT_STRING8 |
PT_I4 | PT_FLOAT | PT_TSTRING |
PT_BINARY | PT_LONG | PT_SYSTIME |
PT_BOOLEAN | PT_SHORT | PT_UNICODE |
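Because the property type occupies the lower 16 bits of a tag, a caller can check whether a tag has one of the listed types before requesting it. The sketch below is one way to do that. The numeric type codes are the values I understand mapidefs.h to assign (PT_I2 and PT_SHORT share a code, as do PT_I4 and PT_LONG); verify them against your copy of the header. PT_TSTRING has no code of its own here because it resolves to PT_STRING8 or PT_UNICODE. The DEMO_ names and demo_is_retrievable_type() are hypothetical and exist only to keep the example self-contained.

```c
#include <stdint.h>

/* Type codes as understood from mapidefs.h; DEMO_ prefixes avoid
 * clashing with the real macros. Verify against the actual header. */
enum {
    DEMO_PT_I2      = 0x0002, /* also PT_SHORT */
    DEMO_PT_I4      = 0x0003, /* also PT_LONG  */
    DEMO_PT_FLOAT   = 0x0004,
    DEMO_PT_DOUBLE  = 0x0005,
    DEMO_PT_BOOLEAN = 0x000B,
    DEMO_PT_STRING8 = 0x001E,
    DEMO_PT_UNICODE = 0x001F,
    DEMO_PT_SYSTIME = 0x0040,
    DEMO_PT_BINARY  = 0x0102
};

/* Hypothetical helper: returns 1 if the tag's type is in the list above,
 * 0 otherwise. */
int demo_is_retrievable_type(uint32_t tag)
{
    switch (tag & 0xFFFFu) {
    case DEMO_PT_I2:
    case DEMO_PT_I4:
    case DEMO_PT_FLOAT:
    case DEMO_PT_DOUBLE:
    case DEMO_PT_BOOLEAN:
    case DEMO_PT_STRING8:
    case DEMO_PT_UNICODE:
    case DEMO_PT_SYSTIME:
    case DEMO_PT_BINARY:
        return 1;
    default:
        return 0;
    }
}
```

For instance, the PR_HASATTACH tag 0x0E1B000B passes this check because its type is PT_BOOLEAN.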
NOTE: Properties with a PT_TSTRING type have the property type recompiled to either a Unicode string (PT_UNICODE) or an ANSI string (PT_STRING8), depending on the operating system’s character set. To retrieve the Unicode property, pass in the Unicode version of the tag. For example, the property tag for PR_SUBJECT is either 0x0037001E for an ANSI string, or 0x0037001F for a Unicode string.
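The ANSI and Unicode versions of a string tag differ only in the type half of the tag: 001E for an ANSI string and 001F for a Unicode string. A minimal sketch of deriving the Unicode version from an ANSI tag follows. The function demo_unicode_variant() and the DEMO_ constants are hypothetical, introduced only for this illustration.

```c
#include <stdint.h>

#define DEMO_PT_STRING8 0x001Eu /* ANSI string type code    */
#define DEMO_PT_UNICODE 0x001Fu /* Unicode string type code */

/* Hypothetical helper: if the tag has the ANSI string type, return the
 * same identifier with the Unicode string type. Other tags are returned
 * unchanged. */
uint32_t demo_unicode_variant(uint32_t tag)
{
    if ((tag & 0xFFFFu) == DEMO_PT_STRING8)
        return (tag & 0xFFFF0000u) | DEMO_PT_UNICODE;
    return tag;
}
```

Applied to the ANSI PR_SUBJECT tag 0x0037001E, this returns 0x0037001F, the Unicode tag given in the note.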
Extract PST-Specific Metadata
In the call to extract subfile metadata, you can pass either the MAPI tag number (such as 0x0070001e) or the MAPI tag name (such as PR_CONVERSATION_TOPIC). If you specify the MAPI tag name, you must include the mapitags.h and mapidefs.h Windows header files, in which the MAPI tag name is defined as a tag number.
To extract specific MAPI properties from a PST file, call the fpGetSubFileMetaData() function, and pass the property tag to metaNameArray. The tag is passed as an integer.
For example, the following code extracts the MAPI properties PR_SUBJECT and PR_ALTERNATE_RECIPIENT:
    KVGetSubFileMetaArgRec metaArg;
    KVSubFileMetaData pMetaData = NULL;
    KVMetaNameRec names[2];
    KVMetaName pName[2];
    names[0].type = KVMetaNameType_Integer;
    names[0].name.iname = PR_SUBJECT;
    names[1].type = KVMetaNameType_Integer;
    names[1].name.iname = 0x3A010102;
    pName[0] = &names[0];
    pName[1] = &names[1];
    KVStructInit(&metaArg);
    metaArg.metaNameCount = 2;
    metaArg.metaNameArray = pName;
    metaArg.index = subFileIndex;
    error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
    ...
    extractInterface->fpFreeStruct(pFile, pMetaData);
    pMetaData = NULL;
NOTE: You must include the mapitags.h and mapidefs.h Windows header files, in which PR_SUBJECT is defined as 0x0037001E.
Exclude Metadata from the Extracted Text File
When you extract a mail message, the message text and header information (To, From, Sent, and so on) are both extracted. You can prevent the header information from appearing in the text file.
To exclude the header information, set extractFlag to KVExtractionFlag_ExcludeMailHeader in the call to fpExtractSubFile().