Speaker Clustering Results

The following XML shows a single record produced by speaker clustering.

<output>
    <record>
        <timestamp>
            ...
        </timestamp>
        <trackname>SpeakerClustering.Result</trackname>
        <ClusterSpeechResult>
            <id>8fb22011-d041-4694-a925-5bf87df5d6b7</id>
            <label>Cluster_3</label>
        </ClusterSpeechResult>
    </record>
</output>

The record contains the following information:

  • The id element provides a unique identifier for the section of audio.
  • The label element identifies the speaker cluster that the section of audio was assigned to.

    The first speaker is named Cluster_1, the second is named Cluster_2, and so on. If you know the maximum number of speakers in your audio, OpenText recommends that you set the MaxSpeakers parameter.

    This element can also have a value of <SIL> or <s>. Both values indicate a period of audio without speech, such as silence or background noise. <SIL> indicates silence that probably has no linguistic role. <s> is more likely to mark the end of a chain of words, for example when a speaker begins a new sentence.
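
The fields described above can be read with a short script. The following is a minimal sketch in Python that uses only the standard library, not an official client. It assumes that the output contains one record element per section of audio, and that the <SIL> and <s> labels appear XML-escaped (for example &lt;SIL&gt;) in the raw output. The second record in the sample, including its id, is hypothetical and exists only to illustrate a non-speech label. The timestamp element is omitted because its content is not shown above.

```python
import xml.etree.ElementTree as ET

# Label values that the documentation describes as audio without speech.
NON_SPEECH_LABELS = {"<SIL>", "<s>"}

# Sample output. The first record is the one shown above; the second
# record is hypothetical and assumes labels are XML-escaped on the wire.
SAMPLE = """<output>
    <record>
        <trackname>SpeakerClustering.Result</trackname>
        <ClusterSpeechResult>
            <id>8fb22011-d041-4694-a925-5bf87df5d6b7</id>
            <label>Cluster_3</label>
        </ClusterSpeechResult>
    </record>
    <record>
        <trackname>SpeakerClustering.Result</trackname>
        <ClusterSpeechResult>
            <id>hypothetical-id-for-illustration</id>
            <label>&lt;SIL&gt;</label>
        </ClusterSpeechResult>
    </record>
</output>"""


def read_cluster_results(xml_text):
    """Return one (id, label, is_speech) tuple per ClusterSpeechResult."""
    root = ET.fromstring(xml_text)
    results = []
    for result in root.iter("ClusterSpeechResult"):
        segment_id = result.findtext("id", default="").strip()
        # The parser unescapes &lt;SIL&gt; to the literal string "<SIL>".
        label = result.findtext("label", default="").strip()
        results.append((segment_id, label, label not in NON_SPEECH_LABELS))
    return results


for segment_id, label, is_speech in read_cluster_results(SAMPLE):
    print(segment_id, label, "speech" if is_speech else "non-speech")
```

Treating <SIL> and <s> as one non-speech group keeps the sketch short. If your application needs to detect likely sentence boundaries, you might handle <s> separately.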