The events that occur in video usually span many frames. For example, a person, object, or logo might appear on screen and remain there for several minutes. Media Server analyzes video frame by frame, but many analysis engines track events across frames because analyzing multiple frames can improve accuracy.
Analysis tasks can produce many different output tracks but, regardless of which track they belong to, records that relate to the same event always have the same ID.
Result tracks contain records that summarize the analysis results for a complete event. Each record can span many video frames and has a start time, a peak time, an end time, a duration, and an ID. You can use the ID to find other records that relate to the same event. The purpose of a result track is to provide a summary of the analysis results that is suitable for output from Media Server. Media Server does not generate a record in a result track until an event has finished, because these records represent an entire event from beginning to end.
Example: A face detection result track contains a single record for each detected face. Each record has a different ID.
Example: A face recognition result track contains zero or more records for each detected face (there can be multiple recognition results when there are several matches that exceed the recognition threshold). Face recognition results inherit their ID from the detected face, so all of the recognition results for the same detected face have the same ID.
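To make the record structure concrete, here is a minimal Python sketch of the relationship described above. This is an illustrative model only, not Media Server's actual output format (which is XML); the class, field names, and values are made up. Recognition records share the ID of the detection they relate to, so the ID correlates records across tracks:

```python
from dataclasses import dataclass

@dataclass
class ResultRecord:
    track: str       # e.g. a detection or recognition result track (names illustrative)
    event_id: str    # shared by all records that relate to the same event
    start_ms: int
    peak_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms

# One detection record per detected face; recognition records inherit
# the detected face's ID, so several matches share one ID.
records = [
    ResultRecord("FaceDetect.Result", "face-1", 1000, 2500, 6000),
    ResultRecord("FaceRecognize.Result", "face-1", 1000, 2500, 6000),
    ResultRecord("FaceRecognize.Result", "face-1", 1000, 2500, 6000),
    ResultRecord("FaceDetect.Result", "face-2", 3000, 3500, 4000),
]

# Use the ID to find all records that relate to the same event.
related = [r for r in records if r.event_id == "face-1"]
print(len(related))  # one detection plus two recognition matches
```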
Data tracks contain records that correspond to a single analyzed frame. A data track can contain hundreds of records that relate to the same event. A data track can also contain multiple records that relate to the same video frame, because multiple events can occur at the same time.
Example: A face detection data track contains at least one record for every analyzed frame in which a face appears. If a person remains in the scene for several seconds, this track could contain hundreds of records that identify the same face and have the same ID. If a video frame contains three faces, the face detection data track will contain three records with timestamps matching that frame, each with a different ID.
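The per-frame behaviour of a data track can be sketched as follows. This is a simplified model under assumed frame numbers and IDs, not Media Server code: one record per analyzed frame per event, so a single ID recurs across many frames and a single frame can yield several records:

```python
# Illustrative sketch: build per-frame "data track" records for two faces
# whose appearances overlap in time (frame ranges and IDs are made up).
def data_track_records(events):
    """Return one {frame, id} record per analyzed frame per event."""
    records = []
    for event_id, (start_frame, end_frame) in events.items():
        for frame in range(start_frame, end_frame + 1):
            records.append({"frame": frame, "id": event_id})
    return records

events = {"face-1": (0, 99), "face-2": (50, 74)}  # face-2 overlaps face-1
records = data_track_records(events)

# Many records share one ID (one per analyzed frame)...
print(sum(1 for r in records if r["id"] == "face-1"))
# ...and one frame can have multiple records, one per concurrent event.
print(sum(1 for r in records if r["frame"] == 60))
```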
Data and DataWithSource tracks contain a lot of information, usually more than you want to output from Media Server. These tracks are intended to provide data for subsequent analysis tasks. For example, you can use the DataWithSource track from face detection as the input for face recognition, so that face recognition can analyze each face across multiple video frames.
Start and End tracks contain records that describe the beginning or end of an event in the video.
Example: With face detection, the start track contains a record when a face appears in the scene, and the end track contains a record when the face disappears.
Example: Face recognition does not produce a start or end track, because information about events (detected faces) is provided by face detection.
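The relationship between per-frame detections and the Start, End, and Result tracks can be sketched in Python. This is an illustrative model, not Media Server output; the timestamps, IDs, and field names are made up. The first frame in which an ID appears yields its Start record, the last yields its End record, and the Result record spans the two:

```python
# Derive Start, End, and Result records from a stream of
# (timestamp_ms, event_id) per-frame detections.
def build_tracks(frames):
    first_seen, last_seen = {}, {}
    for ts, event_id in frames:
        first_seen.setdefault(event_id, ts)  # first frame -> Start record
        last_seen[event_id] = ts             # last frame  -> End record
    start = [{"id": i, "time": t} for i, t in first_seen.items()]
    end = [{"id": i, "time": t} for i, t in last_seen.items()]
    result = [{"id": i, "start": first_seen[i], "end": last_seen[i]}
              for i in first_seen]
    return start, end, result

frames = [(0, "face-1"), (40, "face-1"), (40, "face-2"),
          (80, "face-1"), (80, "face-2"), (120, "face-1")]
start, end, result = build_tracks(frames)
print(result)
# [{'id': 'face-1', 'start': 0, 'end': 120}, {'id': 'face-2', 'start': 40, 'end': 80}]
```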
SegmentedResult tracks are similar to result tracks, except that each record has a maximum duration, which you can configure by setting the SegmentDuration parameter. When a record reaches the maximum duration, Media Server outputs the record and begins a new one with the same ID. This means that for every record in the result track that exceeds the maximum duration, there will be two or more records in the SegmentedResult track. Segmented results are useful when you need to obtain information about an event before it finishes.

SegmentedResultWithSource tracks are similar to SegmentedResult tracks. The records are the same, except that each record also includes the best source frame that was available at the time the record was generated.

The following diagram shows how face detection creates records (represented by rectangles) when a face appears in a video.
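The segmentation behaviour can be sketched as follows. SegmentDuration is the configuration parameter named in the text, but the function below is an illustrative model with made-up values, not Media Server code. A long event is emitted as several consecutive records, all sharing the event's ID, so information is available before the event finishes:

```python
def segment_event(start_ms, end_ms, event_id, segment_duration_ms):
    """Split one long event into SegmentedResult-style records of at most
    segment_duration_ms each; every segment keeps the same event ID."""
    segments = []
    t = start_ms
    while t < end_ms:
        seg_end = min(t + segment_duration_ms, end_ms)
        segments.append({"id": event_id, "start": t, "end": seg_end})
        t = seg_end  # the next record begins where this one ended
    return segments

# A 25-second event with a 10-second maximum duration produces three
# records, output one by one instead of only when the event finishes.
segs = segment_event(0, 25000, "face-1", 10000)
print(len(segs))  # 3
print(segs[-1])   # {'id': 'face-1', 'start': 20000, 'end': 25000}
```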
The following diagram shows how face detection creates records (represented by rectangles) when two faces appear in a video. All of the records related to the same detected face (the same event) have the same ID. So, in the following example, all of the blue records (1) would have the same ID and all of the green records (2) would have the same ID.
In both of the previous examples:
- Media Server creates a single record in the Result and ResultWithSource tracks for each event (in this example, a detected face). These records span the event and summarize the analysis results. When there are multiple people in the scene at the same time, the records overlap chronologically.
- Records in the Data and DataWithSource tracks correspond to a single analyzed frame. This means that there can be many records for each event. When there are multiple people in the scene, there are multiple records with timestamps matching the same video frame.
- Media Server creates a record in the Start track when a person appears in the scene.
- Media Server creates a record in the End track when a person leaves the scene.
- The event lasts longer than the SegmentDuration, so Media Server creates multiple records in the SegmentedResult and SegmentedResultWithSource tracks. Media Server starts a new record each time the SegmentDuration is reached.

Some analysis tasks process the output of other engines. Face recognition, for example, processes records that are produced by face detection. You can see from the examples above that the face detection DataWithSource track provides much more information than the ResultWithSource track. When you configure face recognition, you can choose which track to process. Processing the DataWithSource track can result in better accuracy, because face recognition processes multiple video frames for each detected face. However, processing all of these frames is more computationally intensive, so configure this only if your server has sufficient resources.
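A rough sense of the cost difference can be had with some simple arithmetic. The numbers below are made up for illustration: recognising from the per-frame track means processing one record per analyzed frame, whereas the result track yields one record per event:

```python
# Illustrative cost comparison (made-up figures, not measured values):
# a face visible for 8 seconds, with every frame analyzed at 25 fps.
seconds_visible = 8
analyzed_fps = 25

# One record per analyzed frame vs. one record per event.
records_from_data_track = seconds_visible * analyzed_fps
records_from_result_track = 1

print(records_from_data_track)    # frames to analyze per detected face
print(records_from_result_track)  # frames to analyze per detected face
```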
For information about the tracks that are produced by Media Server tasks, and the information contained in each track, refer to the Media Server Reference.