Perform Visual Clustering

To cluster your items, you must first add each item to the Media Server training database. The following diagram shows how Media Server stores information you provide during training.

The "Media Server database" represents the Media Server datastore file or a database on an external database server. For information on setting up this database, see Set up a Training Database.

You can create visual clustering databases to organize your items. When you run visual clustering, Media Server compares all of the items within a single visual clustering database, and divides those items into clusters.

Media Server does not store a complete video in its database. Instead, it takes samples of the video content and uses them to generate the training that is necessary for clustering. By default, the samples are stored in the database, so that the training can be regenerated if the visual clustering algorithm is improved in a future version. If you prefer, Media Server can discard the samples after training. This results in a smaller database, but if the visual clustering algorithm is enhanced in the future, you would need to provide the source media again.

To perform visual clustering

  1. Create a visual clustering database by running the action CreateVisualClusteringDatabase.

    curl http://localhost:14000/action=CreateVisualClusteringDatabase -F database=BroadcastClips -F numsamples=20

    The action parameter numsamples specifies the number of samples to take from videos. Media Server uses these samples to generate the training necessary for clustering. Media Server always takes the same number of samples from each video, regardless of the video duration. This number cannot be changed after creating the database. Using a greater number of samples might result in better accuracy if your videos contain more than one main theme, but training will take longer. OpenText recommends starting with the default value of 20.

  2. Add each item to the database by running the action TrainVisualClusteringItem.

    curl http://localhost:14000/action=TrainVisualClusteringItem -F database=BroadcastClips -F identifier=item1 -F sourcedata=@video1.mp4

    If you want to discard the samples taken from videos, set the action parameter nullsourcedata=true.

  3. Run the action ClusterVisualItems.

    curl http://localhost:14000/action=ClusterVisualItems -F database=BroadcastClips

    This action is asynchronous, so Media Server always returns success, accompanied by a token that you use to retrieve the results.

  4. Use the QueueInfo action to obtain the results:

    curl http://localhost:14000/action=QueueInfo -F queueaction=GetStatus -F queuename=ClusterVisualItems -F token=(from previous step)
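Because ClusterVisualItems is asynchronous, a client usually has to repeat the QueueInfo request until the action completes. The following Python sketch shows one possible polling loop. It is not taken from the product documentation: the helper names (find_text, fetch_status, wait_for_result) are hypothetical, and it assumes that responses are XML containing a token element and a status element whose value eventually becomes Finished or Error. It also assumes that the server accepts the QueueInfo parameters in the URL of a GET request. Check these assumptions against the responses from your own server.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Optional

BASE_URL = "http://localhost:14000"  # host and port used in the examples above


def find_text(xml_text: str, tag: str) -> Optional[str]:
    """Return the text of the first element named `tag`, ignoring namespace and case."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        local_name = elem.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
        if local_name.lower() == tag.lower() and elem.text and elem.text.strip():
            return elem.text.strip()
    return None


def fetch_status(token: str) -> str:
    """Request QueueInfo/GetStatus for `token` and return the raw response body.

    Assumption: the parameters may be passed in the URL instead of as form fields.
    """
    query = urllib.parse.urlencode({
        "queueaction": "GetStatus",
        "queuename": "ClusterVisualItems",
        "token": token,
    })
    with urllib.request.urlopen(f"{BASE_URL}/action=QueueInfo&{query}") as response:
        return response.read().decode("utf-8")


def wait_for_result(
    token: str,
    fetch: Callable[[str], str] = fetch_status,
    interval: float = 2.0,
    max_polls: int = 30,
) -> str:
    """Poll until the status is Finished or Error, then return that response body."""
    for _ in range(max_polls):
        body = fetch(token)
        status = find_text(body, "status")
        # Assumed terminal status values; adjust to match your server's responses.
        if status is not None and status.lower() in ("finished", "error"):
            return body
        time.sleep(interval)
    raise TimeoutError(f"no final status after {max_polls} polls")
```

To use the sketch, pass the response from ClusterVisualItems to find_text(response, "token") and give the resulting token to wait_for_result. The fetch parameter is there so that the loop can be exercised without a running server.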

You can run ClusterVisualItems as many times as necessary. For example, you could add more items to your database and then run the action again.
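When you have many videos, you can script the training step rather than typing one command per item. The following Python sketch builds the same curl commands that appear in the procedure above and runs them in turn. The function names are hypothetical, and using each file name (without its extension) as the identifier is an assumption made for illustration; it only works if those names are unique within the database.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List

BASE_URL = "http://localhost:14000"  # host and port used in the examples above


def train_command(database: str, video: Path) -> List[str]:
    """Build the curl arguments that add one video to a visual clustering database."""
    return [
        "curl", f"{BASE_URL}/action=TrainVisualClusteringItem",
        "-F", f"database={database}",
        "-F", f"identifier={video.stem}",  # assumption: file stem as identifier
        "-F", f"sourcedata=@{video.as_posix()}",
    ]


def cluster_command(database: str) -> List[str]:
    """Build the curl arguments that start clustering for the database."""
    return [
        "curl", f"{BASE_URL}/action=ClusterVisualItems",
        "-F", f"database={database}",
    ]


def train_and_cluster(database: str, videos: Iterable[str]) -> None:
    """Train every video, then request clustering; stops if curl itself fails."""
    for video in sorted(Path(v) for v in videos):
        subprocess.run(train_command(database, video), check=True)
    subprocess.run(cluster_command(database), check=True)
```

For example, train_and_cluster("BroadcastClips", ["clips/video1.mp4", "clips/video2.mp4"]) would train two items and then start clustering. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.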