Determine Format Support
After the file format is extracted, the detection module uses the formats_e.ini file to determine whether the format is supported by KeyView, and the appropriate structured access layer and reader to load.
The formats_e.ini file is in the directory install\OS\bin, where install is the path name of the Export installation directory and OS is the name of the operating system. It contains the following information:
- Coded format information. To translate this information, see Translate Format Information.
- The reader associated with each format. See Determine a Document Reader.
- Configuration parameters for out-of-process conversions.
- Locale settings for internal use.
Below are some entries from the formats_e.ini file:
123=mw
152=xyw
178=wp6
189=mw6
2=af
200=pdf
205=mb
210=htm
251=htm
NOTE: The formats_e.ini file applies to all formats except graphics. Detection of graphics formats is handled by an internal module named KeyView Picture Interchange Format (KPIF).
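The sample entries above can be read into a lookup table with a few lines of Python. This is only a sketch for inspecting the file: the function name is hypothetical, and it assumes that the left side of each entry is a numeric format code and the right side is the short name associated with it.

```python
def parse_format_entries(text):
    """Parse whitespace-separated 'code=value' tokens into a dict keyed by integer code."""
    entries = {}
    for token in text.split():
        code, sep, value = token.partition("=")
        if not sep or not code.isdigit():
            continue  # skip anything that is not a 'number=value' pair
        entries[int(code)] = value
    return entries


# The entries quoted in this section, in the same order.
sample = "123=mw 152=xyw 178=wp6 189=mw6 2=af 200=pdf 205=mb 210=htm 251=htm"
entries = parse_format_entries(sample)
print(entries[200])  # pdf
```

Keying the table by integer code makes it easy to look up the entry for a code returned by detection, under the assumption stated above.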
Refine Detection of Text Files
During text detection, KeyView analyzes the first 1 kB and last 1 kB of data in a document; if less than 10% of that data consists of non-ASCII characters, KeyView detects the document as a text file.
However, depending on the type of documents you are working with, the default settings might not provide the desired level of accuracy. Configuration flags allow you to change the amount of data to read at the end of a file, the percentage of non-ASCII characters permitted in a text file, and whether to use or ignore the file extension to determine the document format.
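The default rule described above can be sketched in Python. This is a minimal illustration, not KeyView's implementation: the function names are hypothetical, a byte is counted as non-ASCII when its value is above 0x7F, and the handling of short files and empty input is an assumption.

```python
def sample_blocks(data: bytes, head_size: int = 1024, tail_size: int = 1024) -> bytes:
    """Return the first head_size bytes and the last tail_size bytes of data."""
    if len(data) <= head_size + tail_size:
        return data  # assumption: short files are analyzed whole, without double counting
    return data[:head_size] + data[len(data) - tail_size:]


def looks_like_text(data: bytes, max_non_ascii_percent: float = 10.0) -> bool:
    """Apply the 'less than 10% non-ASCII' rule to the sampled bytes."""
    sample = sample_blocks(data)
    if not sample:
        return True  # assumption: an empty sample contains no non-ASCII bytes
    non_ascii = sum(b > 0x7F for b in sample)
    return 100.0 * non_ascii / len(sample) < max_non_ascii_percent


print(looks_like_text(b"plain ASCII line\n" * 500))  # True
print(looks_like_text(bytes(range(256)) * 20))       # False
```

The second input is half non-ASCII bytes in both sampled blocks, so it falls well above the 10% limit.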
Change the Amount of File Data to Read
During file detection, KeyView reads characters from the beginning and end of a file. By default, it reads the first and last 1,024 bytes of data. Large text files might contain many irrelevant characters at the end of the file, which can prevent KeyView from detecting the file format accurately. You can set a configuration flag to increase the amount of data to read from the end of a file during detection.
To change the amount of data to read during detection
- In the formats_e.ini file, set the following flag in the detection_flags section:
  [detection_flags]
  non_ascii_chars_end_block_size=kB
  where kB is the number of kilobytes to read from the end of the file, from 0 to 10. The default value is 1.
  NOTE: The file size must be greater than the value specified in the flag. If the flag value is greater than the file size, KeyView does not use the flag.
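Before deploying a changed value, it can help to check that it is in the documented range. The following Python sketch is a hypothetical pre-flight check, not part of KeyView: it reads the flag from INI text with the standard configparser module, falls back to the documented default of 1, and rejects values outside 0 to 10. The helper flag_applies assumes that 1 kB means 1,024 bytes.

```python
import configparser

DEFAULT_END_BLOCK_KB = 1  # documented default


def read_end_block_size(ini_text: str) -> int:
    """Return non_ascii_chars_end_block_size from INI text, validated to 0..10."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    kb = parser.getint(
        "detection_flags",
        "non_ascii_chars_end_block_size",
        fallback=DEFAULT_END_BLOCK_KB,
    )
    if not 0 <= kb <= 10:
        raise ValueError(f"end block size must be between 0 and 10, got {kb}")
    return kb


def flag_applies(file_size_bytes: int, kb: int) -> bool:
    """The flag is used only when the file is larger than the configured block."""
    return file_size_bytes > kb * 1024


fragment = "[detection_flags]\nnon_ascii_chars_end_block_size=4\n"
kb = read_end_block_size(fragment)
print(kb)                      # 4
print(flag_applies(2048, kb))  # False
```

In this example a 2,048-byte file is smaller than the 4 kB block, so the configured value would not be used for it.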
Change the Percentage of Allowed Non-ASCII Characters
By default, if less than 10% of the analyzed data in a document consists of non-ASCII characters, it is detected as a text file. Depending on the type of files you are working with, changing the default percentage might increase detection accuracy.
To change the percentage of non-ASCII characters allowed in text files
- In the formats_e.ini file, set the following flag in the detection_flags section:
  [detection_flags]
  non_ascii_chars_in_text=N
  where N is the percentage of non-ASCII characters to allow in text files. Files that contain a lower percentage of non-ASCII characters than N are detected as text files. The default value is 10.
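One way to choose N is to measure the non-ASCII percentage of byte samples taken from files that you know are text. The Python sketch below does this under stated assumptions: the function names are hypothetical, bytes above 0x7F are counted as non-ASCII, and each sample is expected to contain the same regions of the file that are analyzed during detection. Because a file must have a lower percentage than N to be detected as text, the helper returns the smallest whole number strictly greater than the worst sample.

```python
import math


def non_ascii_percent(sample: bytes) -> float:
    """Percentage of bytes in sample with a value above 0x7F."""
    if not sample:
        return 0.0
    return 100.0 * sum(b > 0x7F for b in sample) / len(sample)


def smallest_passing_threshold(samples) -> int:
    """Smallest integer N such that every sample is strictly below N percent."""
    worst = max(non_ascii_percent(s) for s in samples)
    return math.floor(worst) + 1


# 88 ASCII bytes plus six two-byte UTF-8 characters: 12 of 100 bytes are non-ASCII.
accented = b"a" * 88 + "é".encode("utf-8") * 6
print(non_ascii_percent(accented))                           # 12.0
print(smallest_passing_threshold([b"plain text", accented])) # 13
```

With the default value of 10, this sample would not qualify as text; a value of 13 or higher would admit it.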
Use the File Extension for Detection
Sometimes KeyView detects certain file formats (such as CSV) as ASCII because of the content of the documents. In such cases, you can configure KeyView to use the file extension to determine the document format. Using the file extension can improve detection of formats such as CSV, but might not detect text files successfully if they have incorrect file extensions.
To use the file extension for ASCII files during detection
- In the formats_e.ini file, set the following flag in the detection_flags section:
  [detection_flags]
  use_extension_for_ascii=1
  The default is 0 (do not use the file extension).
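The trade-off described above can be illustrated with a small Python sketch. It is not KeyView's logic: the function name, the format labels, and the one-entry extension table are all hypothetical, and the sketch assumes that the extension is consulted only when content-based detection has already reported ASCII.

```python
import os

# Hypothetical extension table, for illustration only.
EXTENSION_FORMATS = {".csv": "CSV"}


def resolve_format(detected: str, filename: str, use_extension_for_ascii: bool = False) -> str:
    """Override an ASCII result with the format implied by the file extension, if enabled."""
    if detected != "ASCII" or not use_extension_for_ascii:
        return detected
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_FORMATS.get(extension, detected)


print(resolve_format("ASCII", "report.csv", use_extension_for_ascii=True))   # CSV
print(resolve_format("ASCII", "report.csv", use_extension_for_ascii=False))  # ASCII
print(resolve_format("ASCII", "notes.txt", use_extension_for_ascii=True))    # ASCII
```

The first call also shows the risk: a plain text file that was incorrectly named report.csv would be reported as CSV once the flag is enabled.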
Allow Consecutive NULL Bytes in a Text File
By default, if a document contains consecutive NULL bytes, it is not detected as text. Depending on the type of files you are working with, changing the default might increase detection accuracy.
To allow consecutive NULL bytes in text files
- In the formats_e.ini file, set the following flag in the detection_flags section:
  [detection_flags]
  ascii_allow_null_bytes=1
  The default value is 0 (do not allow consecutive NULL bytes).
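To see which of your files this flag would affect, you can scan them for consecutive NULL bytes. The Python sketch below uses hypothetical function names and is not KeyView's implementation; in particular, it checks whatever bytes you pass in, and which part of a file KeyView inspects for this check is not stated here.

```python
def has_consecutive_nulls(data: bytes) -> bool:
    """True if data contains at least two NULL bytes in a row."""
    return b"\x00\x00" in data


def passes_null_check(data: bytes, ascii_allow_null_bytes: bool = False) -> bool:
    """Mirror the flag: consecutive NULL bytes are rejected unless explicitly allowed."""
    return ascii_allow_null_bytes or not has_consecutive_nulls(data)


padded_record = b"NAME\x00\x00\x00\x00AGE\n"  # text padded with NULL bytes
print(passes_null_check(padded_record))                               # False
print(passes_null_check(padded_record, ascii_allow_null_bytes=True))  # True
```

Files such as the padded record in this example are the ones whose detection result could change when the flag is set to 1.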