Deduplicate Records in a Track

Deduplication identifies duplicate records in a track and produces a new track with the duplicate records removed.

The Deduplicate ESP engine identifies two identical records that occur within a specified time interval and discards the second record. For example, face recognition can produce a record for every frame in which a person is recognized. The Deduplicate engine can remove the duplicate records so that the track contains a single record for each recognized person.
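The following Python sketch models the time-window behavior described above. It is an illustration, not the engine's implementation: the Record class and the deduplicate function are hypothetical, and the sketch assumes that each record is compared only with earlier records that were kept, and that a record exactly MinTimeInterval after a kept record is no longer a duplicate.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Record:
    """Hypothetical stand-in for a record in a Media Server track."""
    timestamp_ms: int                                  # record start time in milliseconds
    fields: Dict[str, object] = field(default_factory=dict)


def deduplicate(
    records: List[Record],
    min_time_interval_ms: int,
    is_identical: Callable[[Record, Record], bool],
) -> List[Record]:
    """Keep a record unless an identical record was kept less than
    min_time_interval_ms before it. Records are processed in time order."""
    kept: List[Record] = []
    for record in sorted(records, key=lambda r: r.timestamp_ms):
        is_duplicate = any(
            record.timestamp_ms - earlier.timestamp_ms < min_time_interval_ms
            and is_identical(earlier, record)
            for earlier in kept
        )
        if not is_duplicate:
            kept.append(record)
    return kept


if __name__ == "__main__":
    # OCR-style records; the text is the only field that is compared.
    track = [
        Record(0, {"text": "EXIT"}),
        Record(400, {"text": "EXIT"}),    # same text within 1000 ms of 0 ms: discarded
        Record(600, {"text": "ENTRY"}),   # different text: kept
        Record(1200, {"text": "EXIT"}),   # 1200 ms after the kept EXIT record: kept
    ]
    same_text = lambda a, b: a.fields["text"] == b.fields["text"]
    for r in deduplicate(track, 1000, same_text):
        print(r.timestamp_ms, r.fields["text"])
```

Run as a script, the sketch prints the records at 0, 600, and 1200 ms. Comparing each record with every kept record keeps the sketch short; an implementation for long-running streams would drop kept records once they fall outside the time interval.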

You can specify the conditions that make two records identical. There are several options:

  • treat all records as identical. Media Server discards any record that occurs within the minimum time interval of the first record.
  • use the default equivalence conditions for the track. Each type of track has its own default equivalence conditions; for example, OCR records are considered equivalent if the text is identical. The following table lists the equivalence conditions for each analysis engine and output track.

    Analysis engine          | Output tracks                                  | Equivalence conditions
    -------------------------+------------------------------------------------+-------------------------------------------------------------------
    Barcode                  | Data, DataWithSource, Result, ResultWithSource | Text field must be identical.
    Face detection           | Data, DataWithSource, Result, ResultWithSource | Rectangle field must be identical.
    Face demographics        | Result, ResultWithSource                       | All custom fields must be identical.
    Face recognition         | Result, ResultWithSource                       | Database and identifier fields must be identical.
    Face state               | Result, ResultWithSource                       | All custom fields must be identical.
    Numberplate              | Data, DataWithSource, Result, ResultWithSource | Text field must be identical.
    Numberplate              | PlateRegion                                    | Polygon field must be identical.
    Object class recognition | Result, ResultWithSource                       | The recognizer, classification identifier, and region must be identical.
    Object recognition       | Data, DataWithSource, Result, ResultWithSource | Database and identifier fields must be identical.
    Image classification     | Result, ResultWithSource                       | Classifier and identifier fields must be identical.
    OCR                      | Data, DataWithSource, Result, ResultWithSource | Text field must be identical.
    SceneAnalysis            | Data, DataWithSource, Result, ResultWithSource | All custom fields must be identical.
    SpeakerID                | Result                                         | Text field must be identical.
    SpeechToText             | Result                                         | Text field must be identical.
  • specify equivalence conditions using a Lua script. For example, you might want to treat two Face records as identical if they contain the same person, even if the face is in a different location in the frame. For more information about Lua scripts, see Write a Lua Script for an ESP Engine.
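Each option amounts to a comparison function that takes two records and returns whether they are identical. The following Python sketch shows the three options as such functions. The function names, the dictionary-based record shape, and the field names (text, rectangle, database, identifier, classifier) are assumptions for illustration; only a few rows of the table are included, and same_text_ignoring_case is a hypothetical example of the kind of rule you might write in Lua, not a representation of the Lua script API.

```python
from typing import Callable, Dict, Mapping, Optional

# Hypothetical record shape: a mapping from field name to value.
Record = Mapping[str, object]
Predicate = Callable[[Record, Record], bool]


def always(a: Record, b: Record) -> bool:
    """PredicateType=always: treat any two records as identical."""
    return True


def fields_equal(*names: str) -> Predicate:
    """Build a predicate that compares only the named fields."""
    def predicate(a: Record, b: Record) -> bool:
        return all(a.get(name) == b.get(name) for name in names)
    return predicate


# Some of the default conditions from the table. The field names are
# illustrative and are not taken from the real Media Server record schema.
DEFAULT_PREDICATES: Dict[str, Predicate] = {
    "Barcode": fields_equal("text"),
    "Face detection": fields_equal("rectangle"),
    "Face recognition": fields_equal("database", "identifier"),
    "Image classification": fields_equal("classifier", "identifier"),
    "OCR": fields_equal("text"),
}


def same_text_ignoring_case(a: Record, b: Record) -> bool:
    """Hypothetical custom rule: text matches after trimming and case folding."""
    text_a, text_b = a.get("text"), b.get("text")
    if not isinstance(text_a, str) or not isinstance(text_b, str):
        return False
    return text_a.strip().casefold() == text_b.strip().casefold()


def select_predicate(
    predicate_type: str, engine: str, custom: Optional[Predicate] = None
) -> Predicate:
    """Map a PredicateType value (always, default, lua) to a comparison function."""
    if predicate_type == "always":
        return always
    if predicate_type == "default":
        return DEFAULT_PREDICATES[engine]
    if predicate_type == "lua":
        if custom is None:
            raise ValueError("predicate type 'lua' needs a custom rule")
        return custom
    raise ValueError(f"unknown predicate type: {predicate_type}")
```

For example, two OCR records with the same text but different rectangles are identical under the OCR default but not under the Face detection default.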

To deduplicate a track

  1. Create a new configuration to send to Media Server with the process action, or open an existing configuration that you want to modify.

  2. In the [Session] section, add a new task by setting the EngineN parameter. You can give the task any name, for example:

    [Session]
    Engine0=Ingest
    ...
    Engine5=Deduplicate
  3. Create a new configuration section to contain the task settings, and set the following parameters:

    Type             The ESP engine to use. Set this parameter to deduplicate.
    Input            The name of the track to deduplicate. This must be an output track produced by another task.
    MinTimeInterval  The minimum time between records. When you process video, the engine discards only duplicate records that occur within this time interval. If you process images or documents, this parameter is ignored.
    PredicateType    (Optional) The conditions that determine whether two records are considered identical. Set one of the following values:
                       • always. Treat all records as identical.
                       • default. Use the default equivalence conditions for the track type.
                       • lua. Use the conditions defined in the Lua script that you specify in the LuaScript parameter.
    LuaScript        (Optional) The name of a Lua script that determines whether two records are considered identical. Set this parameter when PredicateType is lua. For more information, see Write a Lua Script for an ESP Engine.

    For more information about these parameters, including the values that they accept, see Deduplicate Engine.
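Before you send the configuration, you can check a Deduplicate section against the rules in this step. The following sketch uses Python's standard configparser module. The function name and messages are hypothetical, value comparisons are assumed to be case-insensitive, and the sketch checks only the rules stated above, not the full set of accepted values described in Deduplicate Engine.

```python
import configparser
from typing import List

VALID_PREDICATE_TYPES = {"always", "default", "lua"}


def check_deduplicate_section(config_text: str, section: str) -> List[str]:
    """Return the problems found in one Deduplicate task section (empty if none)."""
    # Interpolation is disabled so that values containing % are read literally.
    # Option names are matched case-insensitively by configparser.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section(section):
        return [f"missing section [{section}]"]
    params = parser[section]
    problems: List[str] = []
    if params.get("Type", "").strip().lower() != "deduplicate":
        problems.append("Type must be set to deduplicate")
    if not params.get("Input", "").strip():
        problems.append("Input must name an output track produced by another task")
    predicate = params.get("PredicateType", "").strip().lower()  # optional parameter
    if predicate and predicate not in VALID_PREDICATE_TYPES:
        problems.append(f"unknown PredicateType: {predicate}")
    if predicate == "lua" and not params.get("LuaScript", "").strip():
        problems.append("PredicateType=lua requires LuaScript")
    return problems
```

Pass complete sections to this function: configparser rejects lines that contain no key or value, such as the ... placeholder in the [Session] example.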

  4. Save and close the configuration file. OpenText recommends that you save your configuration files in the location specified by the ConfigDirectory parameter.

Example

The following example deduplicates the output track from an OCR task by discarding any identical record that occurs within 1 second (1000 ms) of an earlier record. Records are considered identical according to the default equivalence conditions for the OCR track (the text must be identical).

[DeduplicateOCR]
Type=deduplicate
Input=myocr.data
MinTimeInterval=1000ms
PredicateType=default