Obtain Format Information
The file format detection module (kwad) detects a file's format and reports the information to your application.
When detecting the file format, File Content Extraction uses the content of the file rather than the file extension. In some cases, the file extension can be an unreliable marker because it might refer to many different versions of an application, or files from different pieces of software. In other cases, a file might be incorrectly labeled by accident or by a malicious actor.
File Content Extraction ignores the file extension and examines the content of a file to identify it correctly. Many formats use a ‘magic number’ at the start of the file, which is useful for identification. However, magic numbers can be ambiguous and are sometimes insufficient, so File Content Extraction examines the file more deeply to check its basic validity before it determines the format, which increases confidence in the result.
In all cases, File Content Extraction does the minimum amount of work required to be confident of the file format, so it can detect formats as quickly as possible.
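The general idea of content-based detection can be illustrated with a minimal sketch. This is not how File Content Extraction is implemented; it is only an illustration, using the Python standard library, of why a magic number alone can be ambiguous (several formats share the ZIP signature) and why a deeper check helps. The function names, the signature table, and the returned labels are all hypothetical.

```python
import io
import zipfile

# A few well-known leading signatures. Several formats share the ZIP signature,
# so a match on it is only a first hint.
MAGIC_NUMBERS = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"PK\x03\x04", "zip"),
]


def refine_zip(data: bytes) -> str:
    """Look inside a ZIP container, because the signature alone is ambiguous."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = set(archive.namelist())
    except zipfile.BadZipFile:
        # The header matched, but the content is not a valid archive.
        return "unknown"
    if "word/document.xml" in names:
        return "docx"
    if "xl/workbook.xml" in names:
        return "xlsx"
    return "zip"


def detect_format(data: bytes) -> str:
    """Guess a format from the content only; no file name or extension is used."""
    for signature, name in MAGIC_NUMBERS:
        if data.startswith(signature):
            # Only do the extra work when the signature is not conclusive.
            return refine_zip(data) if name == "zip" else name
    return "unknown"
```

In this sketch the extra inspection runs only for the ambiguous signature, which mirrors the principle of doing the minimum work needed to be confident of the result.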
You can obtain format information through the info attribute of a document object.
import keyview.filter as kv
import argparse
# Add your KeyView license here
KEYVIEW_LICENSE = "..."
parser = argparse.ArgumentParser(description = "IDOL KeyView Python API - file format detection")
parser.add_argument("file_path", help="The path of the input file")
parser.add_argument("--bin-path", help="The path to the KeyView bin directory", default='.')
args = parser.parse_args()
try:
    # Create a new KeyView session
    with kv.FilterSession(args.bin_path, KEYVIEW_LICENSE) as session:
        # Open a document
        with session.open(args.file_path) as doc:
            # Print information about the document
            print(f"Format Name: \t {doc.info.doc_format.name}")
            print(f"Version: \t {doc.info.version}")
            print(f"Category Name:\t {doc.info.doc_class.name}")
            print(f"Encrypted: \t {doc.info.encrypted}")
except kv.KeyViewError as e:
    print(e)
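If you want to log or serialize the format information rather than print it, one option is to collect the same fields into a dictionary. The following is a sketch only: the helper name summarize_info is hypothetical, it reads just the attributes used in the example above, and the stand-in object with made-up values exists only so that the sketch runs without a KeyView installation.

```python
from types import SimpleNamespace


def summarize_info(info) -> dict:
    """Collect the fields printed in the example above into a plain dictionary."""
    return {
        "format": info.doc_format.name,
        "version": info.version,
        "category": info.doc_class.name,
        "encrypted": info.encrypted,
    }


# Stand-in for doc.info with made-up values (not real KeyView output).
fake_info = SimpleNamespace(
    doc_format=SimpleNamespace(name="EXAMPLE_FORMAT"),
    version=0,
    doc_class=SimpleNamespace(name="EXAMPLE_CLASS"),
    encrypted=False,
)
print(summarize_info(fake_info))
```

In a real session, you would pass doc.info to the helper inside the with session.open(...) block.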
For information about mapping detected formats to document readers, see File Formats and Document Readers.