Field Standardization
Common metadata fields such as "Title", "Author", and "Subject" exist in many different file formats, but can be stored in different ways. For instance, one raster image format may store the image width as a key-value pair with key Width. Another format may store the image width in bytes 16-19 of the file. You might want to access the width of all raster images in the same way, regardless of the source format. Field standardization provides standard keys for some common metadata so that you can use the same code to handle many different file formats.
Many metadata field values are scalar values, and can be understood without reference to units – for example, page count. However, some values can be expressed in multiple units, and different file formats might store the same type of information in different units – for example, image width could be stored in pixels, twips, or another graphic unit. When a field is standardized, KeyView converts its value to the documented units, so all standardized fields that have the same key have a value that uses the same units.
The exception to this is string values: KeyView outputs a string as it appears in the document, without normalizing its form. For example, depending on how the author field is stored in the document, it might be output as “John Smith”, “Smith, John”, “J. Smith”, or another form.
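The idea of mapping differently stored values onto one key and one unit can be sketched as follows. This is an illustration only and does not use the KeyView API: the NativeField type, the "width" key, and the choice of points as the target unit are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical conversion table: unit -> (numerator, denominator) to reach points.
# 20 twips make one point, and 72 points make one inch.
TO_POINTS = {
    "point": (1, 1),
    "twip": (1, 20),
    "inch": (72, 1),
}


@dataclass
class NativeField:
    """A width as stored by some format (hypothetical type, not a KeyView type)."""
    key: str
    value: float
    unit: str


def standardize_width(field):
    """Return a (standard key, width in points) pair for a native width field."""
    numerator, denominator = TO_POINTS[field.unit]
    return ("width", field.value * numerator / denominator)


# Two formats store the same physical width under different keys and units.
from_twips = standardize_width(NativeField("Width", 1440, "twip"))
from_inches = standardize_width(NativeField("ImgW", 1.0, "inch"))
```

After standardization both results carry the same key and the same value in the same unit, so downstream code does not need to know which format each came from.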
Duplicate Fields
The same logical piece of metadata can appear multiple times in the metadata that KeyView outputs for a document:
- The same information might occur multiple times in the source document. For example, when a raster image stores its width in both EXIF and XMP metadata, KeyView can output an element for each.
- KeyView can output both a standardized element, and a non-standard element with its original key and value. The KeyView API provides a way to distinguish between standard and non-standard elements, and provides a way to identify non-standard elements that have a standardized alternative.
- KeyView can output multiple non-standard elements for the same piece of metadata, with different keys or value types, to maintain backwards compatibility. Elements that are present to maintain backwards compatibility are identified as "superseded".
The KeyView API therefore provides a way to access common fields, and a way to process native fields that were not standardized, without having to process the same logical piece of metadata more than once.
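A minimal sketch of that filtering logic is shown below. The Element type and its three flags are hypothetical stand-ins for whatever the KeyView API actually exposes; they are not real KeyView names. The sketch covers only the second and third cases in the list above, and does not merge values that the source document itself stores more than once.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical metadata element; the flag names are assumptions."""
    key: str
    value: str
    is_standard: bool = False
    has_standard_alternative: bool = False
    is_superseded: bool = False


def select_unique(elements):
    """Split elements into standardized ones and native ones worth processing.

    A native element is skipped when a standardized alternative exists or when
    it is kept only for backwards compatibility (superseded).
    """
    standard = [e for e in elements if e.is_standard]
    native = [
        e for e in elements
        if not e.is_standard
        and not e.has_standard_alternative
        and not e.is_superseded
    ]
    return standard, native


elements = [
    Element("title", "Report", is_standard=True),
    Element("dc:title", "Report", has_standard_alternative=True),
    Element("LegacyTitle", "Report", is_superseded=True),
    Element("CustomProp", "42"),
]
standard, native = select_unique(elements)
```

Here the title is processed once through its standardized element, and only CustomProp remains in the native list.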
Absence of Fields
The absence of a field in KeyView's metadata output does not imply a value for that field. For instance, if the standardized field KVMetadataKey_CharacterCount is not present, that does not imply that the document contains zero characters: instead, it indicates either that the document does not contain a "character count" metadata field, or that KeyView does not support that field for that format.
If a field is not present in the document, KeyView does not attempt to construct that field, even where the value could be calculated from other information. For example, if the word count is missing from a word processing document, KeyView does not attempt to calculate it by counting the number of words in the file.
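In application code, this suggests keeping "missing" distinct from "zero". The sketch below assumes the metadata has already been collected into a plain dictionary keyed by the standardized key name; that representation is an assumption for illustration, not how the KeyView API returns metadata.

```python
from typing import Mapping, Optional


def character_count(metadata: Mapping[str, int]) -> Optional[int]:
    """Return the stored character count, or None when the field is absent."""
    return metadata.get("KVMetadataKey_CharacterCount")


def describe(count: Optional[int]) -> str:
    """Report an absent count as unknown instead of defaulting it to zero."""
    if count is None:
        return "character count unknown"
    return f"{count} characters"


missing = describe(character_count({}))
empty_doc = describe(character_count({"KVMetadataKey_CharacterCount": 0}))
```

Returning None for an absent field keeps a genuinely empty document distinguishable from one that has no character count recorded.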
Similar Standardized Fields
Some standardized fields have similar meanings. For example, KVMetadataKey_Author and KVMetadataKey_Artist both represent people who in some way created the content of the document. KeyView does not attempt to interpret or define the meaning of these fields. KeyView standardizes an author field as KVMetadataKey_Author, and standardizes an artist field as KVMetadataKey_Artist. KeyView does not attempt to determine if an author is an artist or vice versa.
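If an application wants to treat these fields as equivalent, it has to make that decision itself. One possible approach is sketched below, again assuming the metadata is held in a plain dictionary that maps each key name to a list of string values; the creators function and the dictionary layout are hypothetical.

```python
def creators(metadata):
    """Combine author and artist values into one list, preserving order.

    Treating the two fields as interchangeable is an application-level choice.
    """
    names = []
    for key in ("KVMetadataKey_Author", "KVMetadataKey_Artist"):
        for name in metadata.get(key, []):
            if name not in names:
                names.append(name)
    return names


combined = creators({
    "KVMetadataKey_Author": ["John Smith"],
    "KVMetadataKey_Artist": ["Jane Doe", "John Smith"],
})
```

Because string values are output as they appear in the document, the exact-match comparison used here would not recognize "John Smith" and "Smith, John" as the same person.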
Metadata is not Validated
KeyView outputs metadata fields based on metadata stored within the document, but does not attempt to validate the information. For example, KeyView does not:
- check that the word count stored in the metadata matches the actual number of words present in the document.
- validate that the MIP Label stored in a document can be used to decrypt the document.
- check that the signature in a signed executable is authentic.
- check that a PDF marked as being PDF/A conformant actually conforms.
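An application that needs such a guarantee must perform the check itself. As a rough sketch of the first case, the function below compares a stored word count with a count taken from extracted text. The function name and the whitespace-based word counting are assumptions; authoring applications count words in different ways, so a mismatch does not necessarily mean the stored value is wrong.

```python
from typing import Optional


def word_count_matches(stored: Optional[int], text: str) -> Optional[bool]:
    """Compare a stored word count with a whitespace-based count of the text.

    Returns None when no word count was stored, so that an absent field is
    not reported as a mismatch.
    """
    if stored is None:
        return None
    return stored == len(text.split())


consistent = word_count_matches(3, "one two three")
inconsistent = word_count_matches(5, "one two three")
unknown = word_count_matches(None, "one two three")
```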