This section describes the format of the results produced by an OCR analysis task.
The following XML shows records from the Result track of an OCR task. The analysis engine produces one record for each line of text in the analyzed image or video frame.
If you are processing a document, then unless you have set ProcessTextElements=FALSE, some of the records in the Result track could represent text that has been extracted from text elements that were present in the document.
<record>
  ...
  <trackname>ocr.Result</trackname>
  <OCRResult>
    <id>14565401-b521-4135-94c8-b30f02264f38</id>
    <text>rover discovers life on Mars</text>
    <region>
      <left>240</left>
      <top>31</top>
      <width>194</width>
      <height>12</height>
    </region>
    <confidence>89</confidence>
    <angle>0</angle>
    <source>image</source>
  </OCRResult>
</record>
<record>
  ...
  <trackname>ocr.Result</trackname>
  <OCRResult>
    <id>59dad245-c268-4506-ac42-5752dd123576</id>
    <text>discovery confirmed yesterday and announced to world press</text>
    <region>
      <left>120</left>
      <top>62</top>
      <width>434</width>
      <height>15</height>
    </region>
    <confidence>88</confidence>
    <angle>0</angle>
    <source>image</source>
  </OCRResult>
</record>
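As an illustration of how a client might read such a record, the following Python sketch parses the first example record with the standard library. It is a minimal sketch under stated assumptions: the OcrLine class and parse_record function are hypothetical names invented for this example, the elided "..." content is left out so that the sample is well-formed, and the record is assumed to use no XML namespaces.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# First record from the example above. The elided "..." content is omitted
# so that this sample is well-formed XML.
SAMPLE = """
<record>
  <trackname>ocr.Result</trackname>
  <OCRResult>
    <id>14565401-b521-4135-94c8-b30f02264f38</id>
    <text>rover discovers life on Mars</text>
    <region>
      <left>240</left>
      <top>31</top>
      <width>194</width>
      <height>12</height>
    </region>
    <confidence>89</confidence>
    <angle>0</angle>
    <source>image</source>
  </OCRResult>
</record>
"""


@dataclass
class OcrLine:
    """Hypothetical container for one OCR record (not part of the product)."""
    id: str
    text: str
    left: int
    top: int
    width: int
    height: int
    confidence: float
    angle: float
    source: str


def parse_record(record: ET.Element) -> OcrLine:
    """Convert one <record> element into an OcrLine."""
    result = record.find("OCRResult")
    region = result.find("region")
    return OcrLine(
        id=result.findtext("id"),
        text=result.findtext("text"),
        left=int(region.findtext("left")),
        top=int(region.findtext("top")),
        width=int(region.findtext("width")),
        height=int(region.findtext("height")),
        # Parsed as floats in case a value is not a whole number (assumption).
        confidence=float(result.findtext("confidence")),
        angle=float(result.findtext("angle")),
        source=result.findtext("source"),
    )


line = parse_record(ET.fromstring(SAMPLE.strip()))
print(line.text, line.confidence)
```

The right and bottom edges of the region are not part of the record, but can be derived as left + width and top + height if a consumer needs them.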
Each record contains the following information:
The id element provides a unique identifier for the line of text. The OCR analysis engine issues an ID for each detected appearance of a line of text. If you are running OCR on video and consecutive frames show the same text, all records related to that appearance will have the same ID.
For example, if text appears in the same location for fifty consecutive video frames, the engine uses the same ID for each record in the data track and produces a single record in the result track. The record in the result track will have a timestamp that covers all fifty frames.
If the text moves to a different location on the screen, or disappears and then reappears, the engine treats this as a new detection and produces a new ID and a new record in the result track.
The text element contains the text recognized by OCR.
The region element describes the position of the text in the ingested media. If the record represents a text element that has been extracted from a document, the region is accurate only if the source media was a PDF file. Position information is not extracted from other document formats.
The confidence element provides the confidence score for the OCR process (from 0 to 100). For text that was extracted from a text element in a document, the confidence score is always 100.
The angle element gives the orientation of the text (rotated clockwise in degrees from upright).
The source element specifies the origin of the text. The possible values are:
image - static text from an image or video.
scroller, left - text from video of a news ticker, with text scrolling from right to left.
text - text that originated from a text element in a document.
An OCR analysis task that analyzes an image or document (but not video) also produces a WordResult output track. To this track the OCR analysis engine adds a record for each word. Text that is extracted from a text element in a document is not output to the WordResult track. The following XML shows an example record.
<record>
  ...
  <trackname>ocr.WordResult</trackname>
  <OCRResult>
    <id>cdbca09b-c289-40af-b6e6-02427fafad91</id>
    <text>rover</text>
    <region>
      <left>240</left>
      <top>31</top>
      <width>194</width>
      <height>12</height>
    </region>
    <confidence>89</confidence>
    <angle>0</angle>
    <source>image</source>
  </OCRResult>
</record>
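Because the Result and WordResult tracks share the same record layout, a consumer may want to separate records by track name and discard low-confidence text. The Python sketch below shows one possible way to do this. The wrapping results element and the texts_by_track function are assumptions made for this example only; they are not part of the output described here. The sample is trimmed to the elements that the function reads.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# One line record and one word record taken from the examples in this
# section, trimmed to the elements used below. The <results> wrapper is
# hypothetical and only makes the sample a single well-formed document.
SAMPLE = """
<results>
  <record>
    <trackname>ocr.Result</trackname>
    <OCRResult>
      <text>rover discovers life on Mars</text>
      <confidence>89</confidence>
      <source>image</source>
    </OCRResult>
  </record>
  <record>
    <trackname>ocr.WordResult</trackname>
    <OCRResult>
      <text>rover</text>
      <confidence>89</confidence>
      <source>image</source>
    </OCRResult>
  </record>
</results>
"""


def texts_by_track(root, min_confidence=0.0):
    """Group recognized text by track name, keeping only records whose
    confidence score is at least min_confidence (0 to 100)."""
    grouped = defaultdict(list)
    for record in root.iter("record"):
        result = record.find("OCRResult")
        if float(result.findtext("confidence")) >= min_confidence:
            grouped[record.findtext("trackname")].append(result.findtext("text"))
    return dict(grouped)


root = ET.fromstring(SAMPLE.strip())
by_track = texts_by_track(root)
print(by_track)
```

Raising min_confidence above 89 would drop both sample records. Note that a threshold has no effect on text extracted from document text elements, because their confidence score is always 100.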