DetectAlphabet
If you do not know the language of the input text before processing, you might specify multiple Languages. OCR requires more processing time for each additional language, especially when the languages span multiple alphabets (Latin, Cyrillic, Chinese, Arabic, and so on).
This parameter specifies whether to detect the alphabet for each image (or page of a document), before running OCR. You can choose one of the following values:
Off. Media Server does not detect the alphabet. Use this option when you have specified a single language or multiple languages that use the same alphabet. OpenText also recommends this option when you expect a single image or document page to use multiple alphabets (for example, when there is English and Arabic text on the same page).
Listed. Media Server detects the alphabet, but only considers alphabets that are represented in the list of Languages. This option can reduce the time required to recognize characters, because languages that do not match the detected alphabet are ignored. For example, if you set Languages=en,ja,ko (English, Japanese, and Korean) and Media Server detects the Latin alphabet, OCR ignores the Japanese and Korean languages. OpenText recommends using this option when each source image (or each page of the source document) uses a single alphabet, and the list of possible Languages is known but spans multiple alphabets.
Any. Media Server detects the alphabet that is used, and considers all alphabets. This option can reduce the time required to recognize characters, because languages that do not match the detected alphabet are ignored. If none of the Languages match the detected alphabet, Media Server does not recognize characters and there is no output. OpenText recommends using this option instead of Listed when you want to reject images or pages that do not match any of the specified languages.
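As an illustration of the Listed scenario described above, a configuration fragment might look like the following minimal sketch. The section name MyOcrTask is a hypothetical placeholder for your own task name, the // comment syntax is assumed, and the fragment omits the other parameters that a complete task needs. FontType=auto is the default value and is shown only to make the dependency explicit.

```
// Hypothetical task section; replace MyOcrTask with your own task name.
[MyOcrTask]
Languages=en,ja,ko
DetectAlphabet=Listed
FontType=auto
```

With these settings, a page detected as Latin would be processed using English only. Changing the value to DetectAlphabet=Any would additionally cause pages in an alphabet outside the three listed languages (for example, Cyrillic) to produce no output.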
If your input contains Chinese, Japanese, or Korean text with some ASCII characters, you can safely set this parameter to any of the available options, because Media Server includes ASCII characters for those languages.
This parameter applies only when you ingest images or documents using the Image Ingest Engine, with FontType=auto (the default value).
Type: | String |
Default: | Off |
Required: | No |
Configuration Section: | TaskName |
Example: | DetectAlphabet=Any |