ProcessTextElements
Documents such as PDF files can contain text in the form of an image and in the form of embedded text (text elements). When you ingest a PDF file, Media Server uses KeyView to generate a raster image of each page. Any visible text elements are rendered as part of the image. This parameter specifies how to use the text elements.
- By default, OCR does not process any part of the image that is covered by a text element, and instead returns the text contained in that text element. Because the text is read directly rather than recognized, the result should match the embedded text exactly and require almost no processing time. Media Server then runs OCR on the remaining parts of the image.
- If you set ProcessTextElements=FALSE, Media Server uses OCR to process the whole image and does not use the text elements that are embedded in the document.
Sometimes, text elements are added to scanned documents so that users can search the PDF, or highlight and copy text from the document. This can be done by adding invisible text elements over the image of the text. If you ingest one of these documents, Media Server by default uses the text elements rather than running OCR. If the text elements are not accurate, you might prefer to set ProcessTextElements=FALSE so that Media Server runs OCR on the original image.
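As a minimal sketch of the scanned-document case described above, the following fragment shows where the parameter might sit in a task section. The section name OCRText is hypothetical, and the Type=ocr line is an assumption about how the OCR task is declared in your configuration. Only ProcessTextElements itself is taken from this page.

```
// Hypothetical OCR task section; the name "OCRText" is illustrative only.
[OCRText]
// Assumed task type for OCR analysis.
Type=ocr
// Ignore embedded text elements and run OCR on the whole page image.
ProcessTextElements=False
```

With this setting, any invisible text layer in a scanned PDF is disregarded and all returned text comes from OCR of the rendered page image.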
This parameter has no effect when you process image files and video.
Type: | Boolean |
Default: | True |
Required: | No |
Configuration Section: | TaskName |
Example: | ProcessTextElements=False |
See Also: | Region |