Text Detection Results

The following XML shows a single record produced by text detection.

<record>
   ...
   <trackname>DetectText.Result</trackname>
   <TextDetectionResult>
      <id>aa056c5e-3cea-4cfa-a090-713384270424</id>
      <region>
         <left>264</left>
         <top>280</top>
         <width>106</width>
         <height>18</height>
      </region>
      <confidence>100</confidence>
      <parentID>4d69390f-a8c4-4c5d-a0b0-705a3f98aa9b</parentID>
   </TextDetectionResult>
</record>
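If you consume these records programmatically, you might read them with a standard XML parser. The following Python sketch parses a record like the one above into a small typed object. It assumes the record arrives as a standalone XML string, and it omits the elided "..." fields from the sample. The `TextDetection` class and the `parse_record` function are hypothetical helpers for illustration, not part of any Media Server API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Sample record copied from the example above (the elided "..." fields are omitted).
SAMPLE = """<record>
   <trackname>DetectText.Result</trackname>
   <TextDetectionResult>
      <id>aa056c5e-3cea-4cfa-a090-713384270424</id>
      <region>
         <left>264</left>
         <top>280</top>
         <width>106</width>
         <height>18</height>
      </region>
      <confidence>100</confidence>
      <parentID>4d69390f-a8c4-4c5d-a0b0-705a3f98aa9b</parentID>
   </TextDetectionResult>
</record>"""


@dataclass
class TextDetection:
    """Hypothetical container for the fields of one TextDetectionResult."""
    id: str
    left: int
    top: int
    width: int
    height: int
    confidence: float          # parsed as float to be safe; the sample shows an integer
    parent_id: Optional[str]   # None when the parentID element is empty


def parse_record(xml_text: str) -> TextDetection:
    """Parse one <record> string into a TextDetection (sketch, minimal validation)."""
    record = ET.fromstring(xml_text)
    result = record.find("TextDetectionResult")
    if result is None:
        raise ValueError("record has no TextDetectionResult element")
    region = result.find("region")
    if region is None:
        raise ValueError("TextDetectionResult has no region element")
    # An empty <parentID/> yields None or "", which we normalize to None.
    parent = (result.findtext("parentID") or "").strip()
    return TextDetection(
        id=(result.findtext("id") or "").strip(),
        left=int(region.findtext("left")),
        top=int(region.findtext("top")),
        width=int(region.findtext("width")),
        height=int(region.findtext("height")),
        confidence=float(result.findtext("confidence")),
        parent_id=parent or None,
    )


if __name__ == "__main__":
    print(parse_record(SAMPLE))
```

Normalizing an empty parentID to `None` makes it easy to distinguish results that were produced with a region input from those that were not.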

The record contains the following information:

  • The id element provides a unique identifier for each instance of detected text. The text detection engine does not track text across video frames, so if you process video and the same text appears in consecutive frames, the result track contains a separate record, with a different identifier, for each frame. A single frame can also produce multiple records if text is detected in more than one region.
  • The region element describes the position of the detected text in the image or video frame as a rectangle, measured in pixels. left is the distance from the left edge of the image to the left edge of the region, and top is the distance from the top edge of the image to the top edge of the region. width and height give the width and height of the region.
  • The confidence element describes the confidence score for the detection, from 0 to 100, where higher values represent greater confidence.
  • The parentID element is empty unless you configure the analysis engine with Region=Input, in which case it contains the UUID of the input record. You can use this UUID to link the result to the record, produced by another analysis task, that supplied the region to analyze. To generate a single record that combines the information, you can use the Combine ESP engine with the example Lua script parentuuidMatch.lua.
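The following Python sketch illustrates how a client might use these fields after the records have been parsed: computing the far edges of the region rectangle, discarding low-confidence detections, and grouping results by parentID so that each group can be matched to the input record that supplied the region. It is a client-side illustration only, not a replacement for the Combine ESP engine. The `Detection` type, the helper functions, and the `min_confidence` threshold of 80 are hypothetical choices, not Media Server settings.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Detection(NamedTuple):
    """Hypothetical, already-parsed text detection result."""
    id: str
    left: int
    top: int
    width: int
    height: int
    confidence: float
    parent_id: Optional[str] = None  # None when parentID is empty


def bounding_box(d: Detection) -> Tuple[int, int, int, int]:
    """Return (x0, y0, x1, y1): the region spans from left to left + width
    horizontally and from top to top + height vertically, in pixels."""
    return (d.left, d.top, d.left + d.width, d.top + d.height)


def confident(dets: Iterable[Detection], min_confidence: float = 80.0) -> List[Detection]:
    """Keep detections whose 0-100 confidence score meets a hypothetical threshold."""
    return [d for d in dets if d.confidence >= min_confidence]


def group_by_parent(dets: Iterable[Detection]) -> Dict[Optional[str], List[Detection]]:
    """Group detections by the UUID of the input record that supplied the region.
    Detections with an empty parentID end up under the key None."""
    groups: Dict[Optional[str], List[Detection]] = defaultdict(list)
    for d in dets:
        groups[d.parent_id].append(d)
    return dict(groups)


if __name__ == "__main__":
    parent = "4d69390f-a8c4-4c5d-a0b0-705a3f98aa9b"
    a = Detection("aa056c5e-3cea-4cfa-a090-713384270424", 264, 280, 106, 18, 100, parent)
    b = Detection("hypothetical-second-id", 10, 20, 30, 12, 55, parent)
    print(bounding_box(a))
    print([d.id for d in confident([a, b])])
    print({k: [d.id for d in v] for k, v in group_by_parent([a, b]).items()})
```

For the example record, the region spans x from 264 to 370 and y from 280 to 298.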