Skip Embedded Fonts

Text in PDF files sometimes contains embedded fonts. If you experience difficulties filtering text that uses embedded fonts, you can set options in the API, the formats.ini file, or the FilterTest sample program to skip this type of text.

NOTE: If you choose to skip embedded fonts, none of the content that contains embedded fonts is included in the output.

Use the formats.ini File

To skip embedded fonts using the formats.ini file

  • Set the following parameters:

    [pdf_flags]
    skipembeddedfont=TRUE
    embedded_font_threshold=threshold

    where threshold is a value between 0 and 100. A threshold of 100 skips all embedded font text; a threshold of 0 retains all embedded font text. Set skipembeddedfont to TRUE to enable the embedded_font_threshold parameter.

    The default value of embedded_font_threshold is 100. If you set skipembeddedfont to TRUE and do not specify the embedded_font_threshold parameter, Filter skips all embedded font text.
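As an illustration of the settings above, the following Java sketch builds the [pdf_flags] section as text and rejects a threshold outside the documented range of 0 to 100. The class PdfFlagsSketch and the method pdfFlagsSection are hypothetical names used only for this example; they are not part of the Filter API, and the sketch does not read or write formats.ini itself.

```java
public class PdfFlagsSketch {

    // Hypothetical helper (not part of the Filter API): returns the text of a
    // [pdf_flags] section that enables embedded font skipping with the given threshold.
    public static String pdfFlagsSection(int threshold) {
        // The documented range for embedded_font_threshold is 0 to 100.
        if (threshold < 0 || threshold > 100) {
            throw new IllegalArgumentException("threshold must be between 0 and 100");
        }
        // skipembeddedfont must be TRUE for the threshold to take effect.
        return "[pdf_flags]\n"
             + "skipembeddedfont=TRUE\n"
             + "embedded_font_threshold=" + threshold + "\n";
    }

    public static void main(String[] args) {
        // Prints the three lines of the section with a threshold of 75.
        System.out.print(pdfFlagsSection(75));
    }
}
```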

When you use formats.ini to skip embedded fonts, you can also specify an embedded font threshold. The threshold is a percentage that you choose, and it represents the minimum probability that a glyph in the embedded font text maps to the correct character value in the output character set (ASCII, UTF-8, and so on).

For example, if you specify a threshold of 75, embedded text glyphs that have a 75% or greater probability of correctly matching the character in the output character set are included in the output; glyphs that have a probability of less than 75% of matching the output character set are omitted from the output.
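The rule in the example above can be sketched in Java as follows. This is only an illustration of the comparison described for a threshold of 75, not Filter's actual implementation: the class EmbeddedFontThresholdSketch, the method applyThreshold, and the per-glyph probability values are all hypothetical, and the way Filter computes those probabilities internally is not covered here.

```java
public class EmbeddedFontThresholdSketch {

    // Hypothetical helper (not part of the Filter API). Keeps each character whose
    // mapping probability is greater than or equal to the threshold and drops the rest.
    public static String applyThreshold(char[] chars, int[] probabilities, int threshold) {
        if (chars.length != probabilities.length) {
            throw new IllegalArgumentException("each character needs one probability");
        }
        if (threshold < 0 || threshold > 100) {
            throw new IllegalArgumentException("threshold must be between 0 and 100");
        }
        StringBuilder output = new StringBuilder();
        for (int i = 0; i < chars.length; i++) {
            // Assumed rule, taken from the threshold-of-75 example:
            // include the glyph when its probability meets or exceeds the threshold.
            if (probabilities[i] >= threshold) {
                output.append(chars[i]);
            }
        }
        return output.toString();
    }

    public static void main(String[] args) {
        char[] chars = {'P', 'D', 'F', '?'};
        int[] probabilities = {98, 75, 74, 10}; // made-up values for illustration
        // With a threshold of 75, only the first two glyphs are kept.
        System.out.println(applyThreshold(chars, probabilities, 75)); // prints "PD"
    }
}
```

In this sketch, a glyph with a probability of exactly 75 is kept and one with 74 is omitted, which matches the "75% or greater" wording of the example.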

Use the Java API

To skip embedded fonts using the Java API, call the setSkipEmbeddedFont(boolean) method with an argument of true. For example:

objFilter.setSkipEmbeddedFont(true);

The FilterTest sample program demonstrates this method. See FilterTest.