DetectAlphabet

If you do not know the language of the input text before processing, you might specify multiple Languages. However, OCR requires more processing time for each additional language, especially when the languages span multiple alphabets (Latin, Cyrillic, Chinese, Arabic, and so on).

This parameter specifies whether to detect the alphabet of each image (or each page of a document) before running OCR. You can choose one of the following values:

  • Off. Media Server does not detect the alphabet. Use this option when you have specified a single language or multiple languages that use the same alphabet. OpenText also recommends this option when you expect a single image or document page to use multiple alphabets (for example, when there is English and Arabic text on the same page).
  • Listed. Media Server detects the alphabet, but only considers alphabets that are represented in the list of Languages. This option can reduce the time required to recognize characters, because languages that do not match the detected alphabet are ignored. For example, if you set Languages=en,ja,ko (English, Japanese, and Korean) and Media Server detects the Latin alphabet, OCR ignores the Japanese and Korean languages. OpenText recommends using this option when each source image (or each page of the source document) uses a single alphabet, and the list of possible Languages is known but spans multiple alphabets.
  • Any. Media Server detects the alphabet, and considers all alphabets, not only those represented in the list of Languages. This option can reduce the time required to recognize characters, because languages that do not match the detected alphabet are ignored. If none of the Languages match the detected alphabet, Media Server does not recognize characters and there is no output. OpenText recommends using this option instead of Listed when you want to reject images or pages that do not match any of the specified languages.
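
The following fragment is a minimal sketch of the Listed scenario described above, not a complete task configuration. The section name OCRTask is a hypothetical task name, and any other parameters that the task requires are omitted.

```ini
// Hypothetical task section; replace OCRTask with the name of your OCR task.
[OCRTask]
// The candidate languages span more than one alphabet.
Languages=en,ja,ko
// Detect the alphabet of each image or page, considering only the
// alphabets of the languages listed above.
DetectAlphabet=Listed
```

With these settings, an image in which Media Server detects the Latin alphabet is processed using English only. Setting DetectAlphabet=Any instead would produce no output for an image or page whose alphabet matches none of the three languages.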

If your input contains Chinese, Japanese, or Korean text mixed with some ASCII characters, you can safely use any of the three values (Off, Listed, or Any), because Media Server includes ASCII characters for those languages.

This parameter applies only when you ingest images or documents using the Image Ingest Engine, with FontType=auto (the default value).

Type: String
Default: Off
Required: No
Configuration Section: TaskName
Example: DetectAlphabet=Any
See Also:
  • Languages
  • OcrMode
  • FontType