Obtain Format Information
The file format detection module (kwad) detects a file's format and reports the information to your application.
When detecting the file format, File Content Extraction uses the content of the file rather than the file extension. In some cases, the file extension can be an unreliable marker because it might refer to many different versions of an application, or files from different pieces of software. In other cases, a file might be incorrectly labeled by accident or by a malicious actor.
File Content Extraction ignores the file extension and examines the content of a file to identify it correctly. Many formats begin with a ‘magic number’, a fixed byte sequence that is useful for identification. However, magic numbers can be ambiguous and are sometimes insufficient, so File Content Extraction examines the file more deeply to check its basic validity before it determines the format, which increases confidence in the result.
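As an illustration of the general idea only, the following minimal sketch matches the leading bytes of a buffer against a small table of well-known signatures. It is not how File Content Extraction is implemented; the names sniff_format, Signature, and SIGNATURES are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature table entry, for illustration only.
   None of these names come from the File Content Extraction SDK. */
typedef struct {
    const char *name;            /* short format label */
    const unsigned char *magic;  /* expected leading bytes */
    size_t len;                  /* number of leading bytes to compare */
} Signature;

static const unsigned char PDF_MAGIC[] = { '%', 'P', 'D', 'F', '-' };
static const unsigned char PNG_MAGIC[] = { 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A };
static const unsigned char ZIP_MAGIC[] = { 'P', 'K', 0x03, 0x04 };

static const Signature SIGNATURES[] = {
    { "pdf", PDF_MAGIC, sizeof PDF_MAGIC },
    { "png", PNG_MAGIC, sizeof PNG_MAGIC },
    { "zip", ZIP_MAGIC, sizeof ZIP_MAGIC },
};

/* Return the label of the first signature that matches the start of buf,
   or "unknown" if none matches. The file name is never consulted. */
const char *sniff_format(const unsigned char *buf, size_t n)
{
    if (buf == NULL) {
        return "unknown";
    }
    for (size_t i = 0; i < sizeof SIGNATURES / sizeof SIGNATURES[0]; i++) {
        const Signature *s = &SIGNATURES[i];
        if (n >= s->len && memcmp(buf, s->magic, s->len) == 0) {
            return s->name;
        }
    }
    return "unknown";
}
```

The "zip" entry shows why a magic number alone can be ambiguous: the same four leading bytes also begin other ZIP-based containers such as DOCX, XLSX, and JAR files, so a detector must look further into the file to tell them apart.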
In all cases, File Content Extraction does the minimum amount of work required to be confident of the file format, so it can detect formats as quickly as possible.
You can obtain format information from a document by using the fpGetDocInfo() function. This function extracts the file format, file class, version, and document attributes, and populates an ADDOCINFO structure. The structure and its values are defined in the header file adinfo.h.
For information about mapping detected formats to document readers, see File Formats and Document Readers.