Deduplication identifies duplicate records in a track, and produces a new track that has the duplicate records removed.
The engine identifies two identical records that occur within a specific time interval and discards the second record. For example, face recognition can produce a record for each frame that a person is recognized in. The Deduplicate ESP engine can remove duplicate records so that the track contains a single record for each recognized person.
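The behavior described above can be sketched in a few lines of Python. This is only an illustration, not Media Server's implementation: the `Record` class, its `timestamp_ms` and `identity` fields, and the `deduplicate` function are hypothetical names. It also assumes that the interval is measured from the most recent record that was kept, and that a record exactly one interval later is kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    timestamp_ms: int  # hypothetical field: record time in milliseconds
    identity: str      # hypothetical field: what makes two records "identical"


def deduplicate(records, min_interval_ms):
    """Drop a record if an identical record was kept less than min_interval_ms earlier."""
    last_kept = {}  # identity -> timestamp of the most recent kept record
    output = []
    for rec in sorted(records, key=lambda r: r.timestamp_ms):
        previous = last_kept.get(rec.identity)
        if previous is not None and rec.timestamp_ms - previous < min_interval_ms:
            continue  # duplicate inside the interval: discard it
        last_kept[rec.identity] = rec.timestamp_ms
        output.append(rec)
    return output


# One face recognition record per frame, reduced to one record per person
# within each one-second interval.
frames = [
    Record(0, "alice"),
    Record(40, "alice"),
    Record(80, "bob"),
    Record(120, "alice"),
    Record(5000, "alice"),
]
kept = deduplicate(frames, min_interval_ms=1000)
```

In this sketch the records at 40 ms and 120 ms are discarded, while the record at 5000 ms is kept because it falls outside the interval.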
You can specify the conditions that make two records identical. There are several options:
Use the default equivalence conditions for the track. Each type of track has its own default equivalence conditions; for example, OCR records are considered equivalent if the text is identical. The following table lists the default equivalence conditions for each output track.
Analysis engine | Output tracks | Equivalence conditions |
---|---|---|
Barcode | Data, DataWithSource, Result, ResultWithSource | Text field must be identical. |
Face detection | Data, DataWithSource, Result, ResultWithSource | Rectangle field must be identical. |
Face demographics | Result, ResultWithSource | All custom fields must be identical. |
Face recognition | Result, ResultWithSource | Database and identifier fields must be identical. |
Face state | Result, ResultWithSource | All custom fields must be identical. |
Numberplate | Data, DataWithSource, Result, ResultWithSource | Text field must be identical. |
Numberplate | PlateRegion | Polygon field must be identical. |
Object class recognition | Result, ResultWithSource | The recognizer, classification identifier, and region must be identical. |
Object recognition | Data, DataWithSource, Result, ResultWithSource | Database and identifier fields must be identical. |
Image classification | Result, ResultWithSource | Classifier and identifier fields must be identical. |
OCR | Data, DataWithSource, Result, ResultWithSource | Text field must be identical. |
SceneAnalysis | Data, DataWithSource, Result, ResultWithSource | All custom fields must be identical. |
SpeakerID | Result | Text field must be identical. |
SpeechToText | Result | Text field must be identical. |
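One way to picture the default equivalence conditions is as a lookup from track type to the fields that must match. The following Python sketch assumes records are plain dictionaries; the track-type keys and field names (`text`, `rectangle`, `database`, `identifier`) are illustrative stand-ins for the fields named in the table, not Media Server's actual record schema.

```python
# Hypothetical mapping from track type to the fields that must be identical.
# Only a few rows of the table are shown.
EQUIVALENCE_FIELDS = {
    "ocr": ("text",),
    "barcode": ("text",),
    "facedetect": ("rectangle",),
    "facerecognize": ("database", "identifier"),
}


def records_equivalent(track_type, a, b):
    """Return True if every field listed for the track type matches in both records."""
    return all(a[field] == b[field] for field in EQUIVALENCE_FIELDS[track_type])


# Two OCR records with the same text are equivalent, whatever their timestamps.
same_text = records_equivalent(
    "ocr",
    {"text": "EXIT", "timestamp_ms": 0},
    {"text": "EXIT", "timestamp_ms": 500},
)

# Face recognition records must match on both database and identifier.
different_database = records_equivalent(
    "facerecognize",
    {"database": "staff", "identifier": "alice"},
    {"database": "visitors", "identifier": "alice"},
)
```

Here `same_text` is true and `different_database` is false, because the second pair differs in one of the two required fields.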
To deduplicate a track
Create a new configuration to send to Media Server with the process action, or open an existing configuration that you want to modify.
In the [Session] section, add a new task by setting the EngineN parameter. You can give the task any name, for example:
[Session]
Engine0=Ingest
...
Engine5=Deduplicate
Create a new configuration section to contain the task settings, and set the following parameters:
Parameter | Description |
---|---|
Type | The ESP engine to use. Set this parameter to deduplicate. |
Input | The name of the track to deduplicate. This must be an output track produced by another task. |
MinTimeInterval | The minimum time between records. When you process video, the engine only discards duplicate records that occur within this time interval. If you are processing images or documents, this parameter is ignored. |
PredicateType | (Optional) The conditions to use to determine whether two records are considered identical. You can set one of several values, for example default, which uses the default equivalence conditions for the track. |
LuaScript | (Optional) The name of a Lua script that determines whether two records are considered identical. For more information, see Write a Lua Script for an ESP Engine. |
For more information about these parameters, including the values that they accept, refer to the Media Server Reference.
Save and close the configuration file. Micro Focus recommends that you save your configuration files in the location specified by the ConfigDirectory parameter.
Example
The following example deduplicates the output track from an OCR task by discarding all identical records that occur within 1 second after a record. The records are judged to be identical based on the default equivalence conditions for the OCR track (the text is identical).
[DeduplicateOCR]
Type=deduplicate
Input=myocr.data
MinTimeInterval=1000ms
PredicateType=default