Output Records
This section describes the records that are produced when you ingest an image or document file.
Image Data
The image ingest engine writes image data to a track named TaskName.Image_1. In your session configurations you can refer to this track using the alias Default_Image.
The following sample XML shows a record produced when you ingest a single-page image, a multi-page image such as a TIFF file, or a presentation file (.PPT, .PPTX, .ODP).
<record>
  <pageNumber>1</pageNumber>
  <trackname>Source.Image_1</trackname>
  <Page>
    <image>
      <imagedata>...</imagedata>
      <width>222</width>
      <height>140</height>
      <pixelAspectRatio>1:1</pixelAspectRatio>
      <format>PNG</format>
    </image>
    <pagetext/>
  </Page>
</record>
The record contains the following information:
- The pageNumber element describes the page that the record is associated with. Most image files have a single page, but formats such as TIFF support multiple pages.
- The image element contains information about the image.
  - The imagedata element contains the image data, base-64 encoded.
  - The width and height elements provide the size of the image.
  - The pixelAspectRatio element describes the shape of the pixels that make up the image. For example, 1:1 pixels are square.
  - The format element describes the format of the image data contained in the imagedata element. For images in the Default_Image track, this value is always PNG.
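As an illustration of how a consumer might read these fields, the following is a minimal Python sketch that parses a record shaped like the sample above and decodes the base-64 image data. The sample record, the placeholder payload, and the function name read_image_record are assumptions made for this example and are not part of Media Server.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical record for illustration. The payload is placeholder bytes,
# not real PNG data.
SAMPLE_RECORD = """<record>
  <pageNumber>1</pageNumber>
  <trackname>Source.Image_1</trackname>
  <Page>
    <image>
      <imagedata>{data}</imagedata>
      <width>222</width>
      <height>140</height>
      <pixelAspectRatio>1:1</pixelAspectRatio>
      <format>PNG</format>
    </image>
    <pagetext/>
  </Page>
</record>""".format(data=base64.b64encode(b"placeholder bytes").decode("ascii"))


def read_image_record(xml_text):
    """Return the page number, image properties, and decoded image bytes."""
    record = ET.fromstring(xml_text)
    image = record.find("Page/image")
    return {
        "page": int(record.findtext("pageNumber")),
        "width": int(image.findtext("width")),
        "height": int(image.findtext("height")),
        "pixel_aspect_ratio": image.findtext("pixelAspectRatio"),
        "format": image.findtext("format"),
        # imagedata is base-64 encoded, so decode it to raw bytes.
        "data": base64.b64decode(image.findtext("imagedata")),
    }


info = read_image_record(SAMPLE_RECORD)
print(info["page"], info["width"], info["height"], info["format"])
```

With a real record, the decoded bytes could be written to a file whose extension matches the format element.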
If you ingest a document such as a PDF file, the output might also include the text extracted from text elements:
<record>
  <pageNumber>1</pageNumber>
  <trackname>Source.Image_1</trackname>
  <Page>
    <image>
      <imagedata>...</imagedata>
      <width>892</width>
      <height>1260</height>
      <pixelAspectRatio>1:1</pixelAspectRatio>
      <format>PNG</format>
    </image>
    <pagetext>
      <element>
        <text>Some text</text>
        <region>
          <left>115</left>
          <top>503</top>
          <width>460</width>
          <height>41</height>
        </region>
        <angle>0</angle>
      </element>
      ...
    </pagetext>
  </Page>
</record>
The pagetext element contains information about associated text elements. If the ingested media was a PDF file, each record represents a page. If the ingested media was another type of document, each record represents an embedded image and the text that follows it, up to the next embedded image.
Each element entry inside pagetext describes a text element and contains the following data:
- The text element contains the text from the text element.
- The region element provides the position of the text element on the page.
  NOTE: The region information is accurate only if the ingested document was an Adobe PDF file.
- The angle element provides the orientation of the text.
Information about text elements is used by the OCR analysis engine, which automatically combines the text elements with the text extracted from images, to produce a complete transcript of the text that appears on the page.
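To show how the text elements described above might be read by your own tooling, the following minimal Python sketch collects the text, region, and angle from each element entry in a record. The sample record and the function name read_text_elements are assumptions for this example. The angle is parsed as a floating-point number because its units and precision are not stated here.

```python
import xml.etree.ElementTree as ET

# Hypothetical record for illustration, trimmed to the pagetext content.
SAMPLE_PAGETEXT_RECORD = """<record>
  <pageNumber>1</pageNumber>
  <trackname>Source.Image_1</trackname>
  <Page>
    <pagetext>
      <element>
        <text>Some text</text>
        <region>
          <left>115</left>
          <top>503</top>
          <width>460</width>
          <height>41</height>
        </region>
        <angle>0</angle>
      </element>
    </pagetext>
  </Page>
</record>"""


def read_text_elements(xml_text):
    """Return one dictionary per text element found in the record."""
    record = ET.fromstring(xml_text)
    elements = []
    for element in record.iterfind("Page/pagetext/element"):
        region = element.find("region")
        elements.append({
            "text": element.findtext("text"),
            # Region values are only reliable for Adobe PDF sources.
            "left": int(region.findtext("left")),
            "top": int(region.findtext("top")),
            "width": int(region.findtext("width")),
            "height": int(region.findtext("height")),
            "angle": float(element.findtext("angle")),
        })
    return elements


text_elements = read_text_elements(SAMPLE_PAGETEXT_RECORD)
for item in text_elements:
    print(item["text"], item["left"], item["top"])
```

A record with an empty pagetext element, as produced for plain images, yields an empty list.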
Source Information
The image ingest engine produces a proxy track named taskName.proxy, where taskName is the name of your ingest task. The proxy track contains information about the ingested source. The engine produces one record in this track for each page in the ingested image or document.
The following XML shows a sample record:
<record>
  <pageNumber>1</pageNumber>
  <trackname>ImageIngestTask.Proxy</trackname>
  <proxyData name="file.jpg" path="file.jpg" url="file:///C:/MediaServer/file.jpg" mimeType="image/jpeg" pages="1">
    <streams>
      <videoStream id="0" width="2592" height="1936" horizontalDPI="300" verticalDPI="300" displayWidth="2592" displayHeight="1936" sar="1:1" codec=""/>
    </streams>
    <metadata>
      <tag name="Author">A Name</tag>
      <tag name="Creation Date">2014-04-09T09:15:19Z</tag>
      <tag name="Flash">Flash did not fire</tag>
      <tag name="GPS Latitude">52° 13' 10.69"</tag>
      <tag name="GPS Longitude">0° 8' 49.23"</tag>
      ...
    </metadata>
  </proxyData>
</record>
The proxyData element can include the following attributes:
- name - the name of the media source, equivalent to path for files and url for streams. This attribute is always present, regardless of the source type.
- path - the path of the media source.
- url - the URL of the media source.
- mimeType - the MIME type of the media source.
- pages - the total number of pages in the media source.
The videoStream element can include the following attributes:
- width and height - the width and height of the media source, in pixels.
- horizontalDPI and verticalDPI - the horizontal and vertical resolution of the media source, in dots per inch (DPI). These attributes are present only if the source provides this information. Some image formats do not provide this information.
- displayWidth, displayHeight, sar - unless the media source has non-square pixels, the displayWidth and displayHeight match the width and height of the media, and the sar attribute is 1:1. If the media source has non-square pixels, these values are different.
The metadata element contains any metadata that Media Server was able to extract from the source. The information present in this element varies based on the format of the source file and the information present in the source.
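The following minimal Python sketch shows one way to read a proxy record: it collects the proxyData attributes, the videoStream dimensions, and the metadata tags. The sample record is a trimmed, hypothetical version of the one above, and the function name read_proxy_record is an assumption for this example. Because the DPI attributes are present only when the source provides them, the sketch treats them as optional.

```python
import xml.etree.ElementTree as ET

# Hypothetical proxy record for illustration, trimmed to two metadata tags.
SAMPLE_PROXY_RECORD = """<record>
  <pageNumber>1</pageNumber>
  <trackname>ImageIngestTask.Proxy</trackname>
  <proxyData name="file.jpg" path="file.jpg" url="file:///C:/MediaServer/file.jpg" mimeType="image/jpeg" pages="1">
    <streams>
      <videoStream id="0" width="2592" height="1936" horizontalDPI="300" verticalDPI="300" displayWidth="2592" displayHeight="1936" sar="1:1" codec=""/>
    </streams>
    <metadata>
      <tag name="Author">A Name</tag>
      <tag name="Creation Date">2014-04-09T09:15:19Z</tag>
    </metadata>
  </proxyData>
</record>"""


def optional_int(value):
    """Convert an attribute value to int, or return None if it is absent."""
    return int(value) if value is not None else None


def read_proxy_record(xml_text):
    """Return source details, stream dimensions, and metadata tags."""
    record = ET.fromstring(xml_text)
    proxy = record.find("proxyData")
    stream = proxy.find("streams/videoStream")
    return {
        "name": proxy.get("name"),
        "mime_type": proxy.get("mimeType"),
        "pages": int(proxy.get("pages")),
        "size": (int(stream.get("width")), int(stream.get("height"))),
        # DPI attributes may be missing for some image formats.
        "horizontal_dpi": optional_int(stream.get("horizontalDPI")),
        "vertical_dpi": optional_int(stream.get("verticalDPI")),
        # A sar of 1:1 indicates square pixels.
        "square_pixels": stream.get("sar") == "1:1",
        "metadata": {tag.get("name"): tag.text
                     for tag in proxy.iterfind("metadata/tag")},
    }


source = read_proxy_record(SAMPLE_PROXY_RECORD)
print(source["name"], source["mime_type"], source["size"])
```

Because the metadata tags vary by source format, the sketch keeps them as a plain name-to-value dictionary rather than assuming any particular tag exists.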