This section describes the format of the results produced by an OCR analysis task.
The analysis engine produces one record in the Result track for each line of text in the analyzed image or video frame. If you are processing a document, then unless you have set ProcessTextElements=FALSE, some of the records in the Result track could represent text that has been extracted from text elements present in the document.
The following XML shows records from the Result track of an OCR task:
<record>
  ...
  <trackname>OCR.Result</trackname>
  <OCRResult>
    <id>c0cf6d75-ad43-4fce-8589-e2a297923996</id>
    <text>New rover discovers life on Mars</text>
    <region>
      <left>35</left>
      <top>21</top>
      <width>290</width>
      <height>15</height>
    </region>
    <confidence>99</confidence>
    <angle>0</angle>
    <source>image</source>
  </OCRResult>
</record>
<record>
  ...
  <trackname>OCR.Result</trackname>
  <OCRResult>
    <id>e17ee583-e980-4d07-92c1-579657f46c3e</id>
    <text>Some more text</text>
    <region>
      <left>89</left>
      <top>66</top>
      <width>140</width>
      <height>15</height>
    </region>
    <confidence>99</confidence>
    <angle>0</angle>
    <source>image</source>
  </OCRResult>
</record>
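As a minimal sketch of consuming this track, the following Python uses the standard library's xml.etree.ElementTree to read OCR.Result records into objects. It assumes the records have been wrapped in a single hypothetical <results> root element (the two records above are not one XML document on their own) and leaves out the record content elided as "..." above. OcrLine, parse_result_records, and the 90 confidence threshold are illustrative choices, not part of Media Server; confidence and angle are parsed as floats in case non-integer values occur, which is also an assumption.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class OcrLine:
    """One line of recognized text from the OCR.Result track."""
    id: str
    text: str
    left: int
    top: int
    width: int
    height: int
    confidence: float
    angle: float
    source: str


def parse_result_records(xml_text: str) -> list[OcrLine]:
    """Collect every OCR.Result record under the (hypothetical) root element."""
    root = ET.fromstring(xml_text)
    lines = []
    for record in root.iter("record"):
        if record.findtext("trackname") != "OCR.Result":
            continue  # ignore records from other tracks
        result = record.find("OCRResult")
        region = result.find("region")
        lines.append(OcrLine(
            id=result.findtext("id"),
            text=result.findtext("text"),
            left=int(region.findtext("left")),
            top=int(region.findtext("top")),
            width=int(region.findtext("width")),
            height=int(region.findtext("height")),
            confidence=float(result.findtext("confidence")),
            angle=float(result.findtext("angle")),
            source=result.findtext("source"),
        ))
    return lines


# The two records from the example, wrapped in an assumed <results> root.
SAMPLE = """<results>
<record><trackname>OCR.Result</trackname><OCRResult>
  <id>c0cf6d75-ad43-4fce-8589-e2a297923996</id><text>New rover discovers life on Mars</text>
  <region><left>35</left><top>21</top><width>290</width><height>15</height></region>
  <confidence>99</confidence><angle>0</angle><source>image</source>
</OCRResult></record>
<record><trackname>OCR.Result</trackname><OCRResult>
  <id>e17ee583-e980-4d07-92c1-579657f46c3e</id><text>Some more text</text>
  <region><left>89</left><top>66</top><width>140</width><height>15</height></region>
  <confidence>99</confidence><angle>0</angle><source>image</source>
</OCRResult></record>
</results>"""

lines = parse_result_records(SAMPLE)
# Keep only text recognized in the image itself (not text from document text elements).
image_text = [line.text for line in lines if line.source == "image" and line.confidence >= 90]
print(image_text)
```

Filtering on source is a simple way to separate OCR output from text that was copied out of document text elements, which always reports a confidence of 100.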
Each record contains the following information:
The id element provides a unique identifier for the line of text. The OCR analysis engine issues an ID for each detected appearance of a line of text. If you are running OCR on video and consecutive frames show the same text, all records related to that appearance have the same ID.
For example, if text appears in the same location for fifty consecutive video frames, the engine uses the same ID for each record in the data track and produces a single record in the Result track. The record in the Result track has a timestamp that covers all fifty frames.
If the text moves to a different location on the screen, or disappears and then reappears, the engine treats this as a new detection and produces a new ID and a new record in the Result track.
The text element contains the text recognized by OCR.
The region element describes the position of the text in the ingested media. If the record represents a text element that has been extracted from a document, the region is accurate only if the source media was a PDF file. Position information is not extracted from other document formats.
The confidence element provides the confidence score for the OCR process (from 0 to 100). For text that was extracted from a text element in a document, the confidence score is always 100.
The angle element gives the orientation of the text, in degrees, rotated clockwise from upright.
The source element specifies the origin of the text. The possible values are:
image - static text from an image or video.
scroller, left - text from video of a news ticker, with text scrolling from right to left.
text - the text originated from a text element in a document.
When you analyze an image or document (but not video), OCR produces a WordResult output track. This track contains a record for each recognized word. The following XML shows an example record.
NOTE: Text that is extracted from a text element in a document is not output to the WordResult track.
<record>
  ...
  <trackname>OCR.WordResult</trackname>
  <OCRResult>
    <id>c0cf6d75-ad43-4fce-8589-e2a297923996</id>
    <text>New</text>
    <region>
      <left>35</left>
      <top>21</top>
      <width>39</width>
      <height>15</height>
    </region>
    <confidence>99</confidence>
    <angle>0</angle>
    <source>image</source>
  </OCRResult>
</record>
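Because every WordResult record carries the ID of the line it belongs to, words can be regrouped into lines and matched against the Result track. A minimal Python sketch follows. It assumes the records are wrapped in a hypothetical <results> root and that sorting by each word's region left coordinate gives reading order, which holds for horizontal left-to-right text but is an assumption here; joining with single spaces does not reproduce the original spacing. words_by_line is an illustrative name, and the word "rover" and its coordinates are invented for the sample.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def words_by_line(xml_text: str) -> dict[str, str]:
    """Group OCR.WordResult records by line ID and join each line's words."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)  # line ID -> [(left, word), ...]
    for record in root.iter("record"):
        if record.findtext("trackname") != "OCR.WordResult":
            continue  # ignore records from other tracks
        result = record.find("OCRResult")
        left = int(result.find("region").findtext("left"))
        grouped[result.findtext("id")].append((left, result.findtext("text")))
    # Assumption: horizontal, left-to-right text, so sorting by left gives word order.
    return {line_id: " ".join(word for _, word in sorted(words))
            for line_id, words in grouped.items()}


# Hypothetical words: "New" is from the example; "rover" and its position are made up.
# The records are deliberately out of order to show the sort.
SAMPLE = """<results>
<record><trackname>OCR.WordResult</trackname><OCRResult>
  <id>c0cf6d75-ad43-4fce-8589-e2a297923996</id><text>rover</text>
  <region><left>80</left><top>21</top><width>50</width><height>15</height></region>
  <confidence>99</confidence><angle>0</angle><source>image</source>
</OCRResult></record>
<record><trackname>OCR.WordResult</trackname><OCRResult>
  <id>c0cf6d75-ad43-4fce-8589-e2a297923996</id><text>New</text>
  <region><left>35</left><top>21</top><width>39</width><height>15</height></region>
  <confidence>99</confidence><angle>0</angle><source>image</source>
</OCRResult></record>
</results>"""

print(words_by_line(SAMPLE))
```

The resulting dictionary is keyed by the same IDs as the Result track, so each rebuilt line can be looked up next to its Result record.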
Each record contains the following information:
The id element identifies the line of text that the word belongs to. Records in the WordResult track have the same ID if they refer to the same line of text, and the ID for a word matches the ID for the corresponding line of text in the Result track.
The text element contains the recognized word.
The region element describes the position of the word.
The confidence, angle, and source elements provide the same information as described for the Result track.
When you analyze an image or document (but not video), OCR produces a CharResult output track. This track contains a record for each line of text. However, the records in this track provide detail about individual characters rather than the whole line. The following XML shows an example record.
NOTE: Text that is extracted from a text element in a document is not output to the CharResult track.
<record>
  ...
  <trackname>OCR.CharResult</trackname>
  <OCRDetail>
    <id>c0cf6d75-ad43-4fce-8589-e2a297923996</id>
    <character>
      <text>N</text>
      <region>
        <left>35</left>
        <top>21</top>
        <width>12</width>
        <height>15</height>
      </region>
    </character>
    <character>
      <text>e</text>
      <region>
        <left>49</left>
        <top>25</top>
        <width>10</width>
        <height>11</height>
      </region>
    </character>
    ...
  </OCRDetail>
</record>
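The following Python sketch rebuilds each line's text from the character elements of CharResult records, substituting a space wherever a character's text element is empty (as explained in the list that follows). It assumes the records are wrapped in a hypothetical <results> root; char_lines, the ID line-1, the characters, and every coordinate in the sample are made up for illustration.

```python
import xml.etree.ElementTree as ET


def char_lines(xml_text: str) -> dict[str, str]:
    """Rebuild each line's text from the character elements of OCR.CharResult records."""
    root = ET.fromstring(xml_text)
    lines = {}
    for record in root.iter("record"):
        if record.findtext("trackname") != "OCR.CharResult":
            continue  # ignore records from other tracks
        detail = record.find("OCRDetail")
        # findtext returns "" for an empty <text> element, which stands for a space.
        chars = [ch.findtext("text") or " " for ch in detail.findall("character")]
        lines[detail.findtext("id")] = "".join(chars)
    return lines


# Hypothetical record: the ID "line-1", the characters, and all coordinates are made up.
SAMPLE = """<results>
<record><trackname>OCR.CharResult</trackname><OCRDetail>
  <id>line-1</id>
  <character><text>N</text><region><left>35</left><top>21</top><width>12</width><height>15</height></region></character>
  <character><text>o</text><region><left>49</left><top>25</top><width>10</width><height>11</height></region></character>
  <character><text></text><region><left>61</left><top>21</top><width>5</width><height>15</height></region></character>
  <character><text>G</text><region><left>68</left><top>21</top><width>12</width><height>15</height></region></character>
  <character><text>o</text><region><left>82</left><top>25</top><width>10</width><height>11</height></region></character>
</OCRDetail></record>
</results>"""

print(char_lines(SAMPLE))
```

The same loop is the natural place to read each character's region if you need per-character positions, for example to highlight individual characters in the source image.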
Each record includes the following information:
The id element provides a unique identifier for the line of text. Each record in the CharResult track describes a different line, so every record has a different id. The ID matches the ID of the corresponding line in the Result track.
There is a character element for each character on the line, including spaces. This element includes the following information:
text - the character that was recognized. This element is empty if the character is a space.
region - the location of the character in the source media.
OCR can identify tables that occur in images. When OCR processes a table, the record IDs in the Result, WordResult, and CharResult tracks represent table cells rather than lines of text.
When OCR recognizes text that appears to be arranged in a table, it also produces a TableResult track. This track contains a record for each table that was identified. Each record includes enough structural information to reconstruct the table. The records in the TableResult track do not include the recognized text; instead, they include record IDs that match the records in the Result, WordResult, and CharResult tracks. For example:
<record>
  ...
  <trackname>OCR.TableResult</trackname>
  <OCRTableResult>
    <id>6596a664-b69a-4a33-b9fc-8adb2be6c37f</id>
    <region>
      <left>256</left>
      <top>166</top>
      <width>1213</width>
      <height>362</height>
    </region>
    <columnCount>9</columnCount>
    <rowCount>10</rowCount>
    <row>
      <cell>
        <columnSpan>1</columnSpan>
      </cell>
      <cell>
        <columnSpan>2</columnSpan>
        <OCRResultID>2240914c-440c-40cc-9254-c3c59727953e</OCRResultID>
      </cell>
      <cell>
        <columnSpan>3</columnSpan>
      </cell>
      <cell>
        <columnSpan>3</columnSpan>
        <OCRResultID>9ea804cc-a1a2-4d31-99f8-d1d96a3a1c9e</OCRResultID>
      </cell>
    </row>
    ...
  </OCRTableResult>
</record>
Each record contains the following elements:
id - a unique identifier for the table.
region - the position and size of the table in the media source.
columnCount - the total number of columns.
rowCount - the total number of rows.
row - contains the information for a single row. Each row element contains cell elements. Usually the number of cells in a row matches the value of columnCount, but there can be fewer when cells span multiple columns. The number of columns spanned by a cell is given by the columnSpan element. The OCRResultID element provides the ID of an OCR result. This ID matches the ID of the relevant records in the Result, WordResult, and CharResult tracks, so that you can obtain the recognized text. If the cell is empty, the OCRResultID element is omitted.
NOTE: Tables contained in a text element in a document are not output to the TableResult track. OCR only detects tables that are included as images.
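Because TableResult records carry only IDs, laying a table out as text means joining them with the recognized text. The Python sketch below shows one way to do that under these assumptions: the text for each OCRResultID is looked up in a dictionary built from the id and text elements of the Result track, the columnSpan values in each row add up to columnCount, and a spanning cell's text is repeated in every column it covers. table_grid and the small sample table (its IDs and text) are hypothetical, and this is not how the bundled Media Server files work.

```python
import xml.etree.ElementTree as ET


def table_grid(table_xml: str, text_by_id: dict[str, str]) -> list[list[str]]:
    """Lay out one <OCRTableResult> element as rows of cell strings."""
    table = ET.fromstring(table_xml)
    column_count = int(table.findtext("columnCount"))
    grid = []
    for row in table.findall("row"):
        cells = []
        for cell in row.findall("cell"):
            span = int(cell.findtext("columnSpan", default="1"))
            result_id = cell.findtext("OCRResultID")  # None when the cell is empty
            text = text_by_id.get(result_id, "") if result_id else ""
            # Rendering choice: repeat a spanning cell's text in every column it covers.
            cells.extend([text] * span)
        # Sanity check under the assumption that the spans in a row add up to columnCount.
        if len(cells) != column_count:
            raise ValueError(f"row covers {len(cells)} columns, expected {column_count}")
        grid.append(cells)
    return grid


# Hypothetical 3-column table; the IDs and the text they map to are made up.
TABLE = """<OCRTableResult>
  <id>table-1</id>
  <region><left>0</left><top>0</top><width>300</width><height>40</height></region>
  <columnCount>3</columnCount><rowCount>2</rowCount>
  <row>
    <cell><columnSpan>1</columnSpan><OCRResultID>r1</OCRResultID></cell>
    <cell><columnSpan>2</columnSpan><OCRResultID>r2</OCRResultID></cell>
  </row>
  <row>
    <cell><columnSpan>1</columnSpan><OCRResultID>r3</OCRResultID></cell>
    <cell><columnSpan>1</columnSpan></cell>
    <cell><columnSpan>1</columnSpan><OCRResultID>r4</OCRResultID></cell>
  </row>
</OCRTableResult>"""

# In practice this mapping would be built from the Result track's id and text elements.
TEXT_BY_ID = {"r1": "Team", "r2": "Score", "r3": "Red", "r4": "4"}

for row in table_grid(TABLE, TEXT_BY_ID):
    print(row)
```

Repeating spanned text keeps every row the same width, which suits spreadsheet-style output; for HTML you would instead emit one cell per element with a colspan attribute equal to columnSpan.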
Media Server includes an example session configuration and XSL transform, named Table.cfg and toHTMLTable.xsl, that use the information in the TableResult track to output HTML tables.