Character Encoding
To ensure that all filtered text is output in the same character encoding, KeyView performs character encoding conversion. In most cases, if your license includes advanced character set detection, KeyView can detect the character encoding used in a source file, and automatically outputs filtered text in the encoding you choose. OpenText recommends that you specify your preferred target encoding. In the rare cases where KeyView cannot detect the character encoding used in a source file, you can also specify the source encoding.
Specify a Target Character Encoding
OpenText recommends that you specify a target character encoding when you initialize KeyView, and recommends using UTF-8 or UTF-16 because these are widely supported and can encode a diverse range of characters. To see which encodings you can use as the target encoding, see Coded Character Sets.
To specify a target character encoding
- In the C++ API, set the property target_encoding on the configuration object. Set the target encoding to one of the values in the enumerated list Encoding in Keyview_Enumerations.hpp.
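The following is a minimal sketch of this step, not the KeyView API itself. It uses stand-in types in a hypothetical `sketch` namespace so that it compiles without the SDK. Only the names target_encoding and Encoding come from this guide; the enumerator names, the struct layout, and the helper make_utf8_configuration are assumptions for illustration. The real enumerator names are listed in Keyview_Enumerations.hpp, and the real property syntax might differ.

```cpp
#include <cassert>

// Stand-in types for illustration only. These are NOT the KeyView headers.
namespace sketch {

// Hypothetical subset of an encoding enumeration. The real values are in the
// Encoding enumeration in Keyview_Enumerations.hpp.
enum class Encoding { Unknown, UTF8, UTF16 };

// Hypothetical configuration object that exposes the property named in the
// text as a plain data member.
struct Configuration {
    Encoding target_encoding = Encoding::Unknown;
};

// Choose the target encoding once, at initialization, so that all filtered
// text is produced in the same encoding.
inline Configuration make_utf8_configuration() {
    Configuration config;
    config.target_encoding = Encoding::UTF8;
    return config;
}

// Usage: build the configuration and confirm the chosen target encoding.
inline void example() {
    const Configuration config = make_utf8_configuration();
    assert(config.target_encoding == Encoding::UTF8);
}

} // namespace sketch
```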
Performance Considerations
When a file format does not specify a character encoding, KeyView attempts to detect the encoding automatically. Some character encodings, including UTF-8 and UTF-16, can be detected by core KeyView functionality but others can be detected only if your license includes advanced character set detection. Advanced character set detection is enabled by default (if it is included in your license), but can increase the time required to filter some documents.
You can disable advanced character set detection on a file-by-file basis. Before doing this, be aware that KeyView cannot output filtered text in your chosen encoding unless it detects the encoding of the source file, or you specify the source encoding yourself through the API.
To disable advanced character set detection
- In the C++ API, set character_set_detection to FALSE.
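The trade-off described above can be modeled as a small decision function. This is an illustrative sketch of the rule as stated in this section, not KeyView code. All type and function names are hypothetical, and it assumes for simplicity that advanced detection always succeeds when it is enabled and licensed, which the text notes is not guaranteed in rare cases.

```cpp
#include <cassert>

// Illustrative model only. None of these names belong to the KeyView API.
namespace sketch {

enum class Encoding { UTF8, UTF16, Other };

// What is known about one input file.
struct Document {
    bool format_declares_encoding;  // the file format states its own encoding
    Encoding actual;                // the encoding the bytes really use
};

// The per-file choices discussed in this section.
struct Options {
    bool character_set_detection;    // advanced detection enabled (and licensed)
    bool source_encoding_specified;  // caller supplied the source encoding
};

// The text names UTF-8 and UTF-16 as examples that core functionality can
// detect. Other encodings are treated here as needing advanced detection.
inline bool core_detectable(Encoding e) {
    return e == Encoding::UTF8 || e == Encoding::UTF16;
}

// Conversion to the target encoding needs a known source encoding.
inline bool can_convert_to_target(const Document& doc, const Options& opt) {
    if (doc.format_declares_encoding) return true;
    if (opt.source_encoding_specified) return true;
    if (core_detectable(doc.actual)) return true;
    return opt.character_set_detection;  // assumed to succeed when enabled
}

// Usage: a file in another encoding, with detection disabled and no source
// encoding supplied, cannot be converted.
inline void example() {
    const Document doc = {false, Encoding::Other};
    const Options no_help = {false, false};
    assert(!can_convert_to_target(doc, no_help));
}

} // namespace sketch
```

In this model, disabling detection is safe only for files whose encoding is declared by the format, detectable by core functionality, or supplied by the caller.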
Specify a Source Character Encoding
In most cases, KeyView can automatically detect the character encoding of an input file, so you do not need to specify a source encoding. You might need to specify the source character encoding if you have disabled advanced character set detection.
To specify the source character encoding
- In the C++ API, set source_encoding on a Configuration object, before creating a session, using any value in the enumerated list Encoding in Keyview_Enumerations.hpp. See The Configuration Class for more information.
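The ordering requirement (configure first, then create the session) can be sketched as follows. The types below are stand-ins in a hypothetical `sketch` namespace, not the KeyView classes. The property names source_encoding, target_encoding, and character_set_detection come from this guide, but the enumerator names, the Session constructor, and the assumption that a session copies its configuration at creation are illustrative only. See The Configuration Class for the real interface.

```cpp
#include <cassert>

// Stand-in types for illustration only. These are NOT the KeyView headers.
namespace sketch {

// Hypothetical enumerators. The real list is the Encoding enumeration in
// Keyview_Enumerations.hpp.
enum class Encoding { Unknown, UTF8, WindowsLatin1 };

struct Configuration {
    bool character_set_detection = true;          // enabled by default
    Encoding source_encoding = Encoding::Unknown; // not specified by default
    Encoding target_encoding = Encoding::Unknown;
};

// Hypothetical session that copies the configuration when it is created,
// which is why the source encoding is set beforehand.
class Session {
public:
    explicit Session(const Configuration& config) : config_(config) {}
    const Configuration& config() const { return config_; }
private:
    Configuration config_;
};

// Disable advanced detection and state the source encoding explicitly,
// then create the session from the finished configuration.
inline Session open_session_for_known_source() {
    Configuration config;
    config.character_set_detection = false;
    config.source_encoding = Encoding::WindowsLatin1;
    config.target_encoding = Encoding::UTF8;
    return Session(config);
}

// Usage: the session sees the source encoding that was set before creation.
inline void example() {
    const Session session = open_session_for_known_source();
    assert(session.config().source_encoding == Encoding::WindowsLatin1);
}

} // namespace sketch
```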
Disable Character Encoding Conversion
You can completely disable character encoding conversion, and specify that KeyView retain the original character encoding of the document.
To disable character encoding conversion
- In the C++ API, set no_encoding_conversion to TRUE.
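The effect of this setting on the output encoding can be summarized in a short model. This is a sketch of the behavior described in this section, using hypothetical stand-in types rather than the KeyView API. Only the names no_encoding_conversion and target_encoding come from this guide.

```cpp
#include <cassert>

// Illustrative model only. None of these types belong to the KeyView API.
namespace sketch {

enum class Encoding { UTF8, UTF16, Other };

struct Configuration {
    Encoding target_encoding;     // encoding requested for filtered text
    bool no_encoding_conversion;  // true: keep the document's own encoding
};

// With conversion disabled, filtered text keeps the source encoding.
// Otherwise it is converted to the target encoding.
inline Encoding output_encoding(const Configuration& config, Encoding source) {
    return config.no_encoding_conversion ? source : config.target_encoding;
}

// Usage: the same document with conversion enabled and disabled.
inline void example() {
    const Configuration convert = {Encoding::UTF8, false};
    const Configuration retain = {Encoding::UTF8, true};
    assert(output_encoding(convert, Encoding::Other) == Encoding::UTF8);
    assert(output_encoding(retain, Encoding::Other) == Encoding::Other);
}

} // namespace sketch
```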