Estimate Speaker Score Thresholds

When identifying the most likely speaker in a section of audio, IDOL Speech Server scores how closely the acoustic properties of the segment match each of the speaker templates. For closed-set operation, the top-scoring speaker is taken as the true result.

However, for open-set identification, IDOL Speech Server needs to allow for unknown speakers in the audio. It does this by estimating a score threshold for each speaker; the hit is considered valid only if a template scores above this threshold. If for any audio segment the top-scoring template falls below the threshold associated with that speaker, the segment is assumed to be an unknown speaker.
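The decision rule described above can be sketched as follows. This is only an illustration of the logic, not IDOL Speech Server code: the function name, the score values, and the threshold values are all hypothetical.

```python
from typing import Dict, Optional


def identify_speaker(scores: Dict[str, float],
                     thresholds: Dict[str, float],
                     open_set: bool = True) -> Optional[str]:
    """Return the winning speaker label, or None for an unknown speaker.

    Hypothetical helper: `scores` maps each speaker label to its match
    score for one audio segment, and `thresholds` maps each label to the
    score threshold estimated for that speaker's template.
    """
    if not scores:
        return None
    # The top-scoring template is always the only candidate.
    best = max(scores, key=scores.get)
    if not open_set:
        # Closed set: the top-scoring speaker is taken as the result.
        return best
    # Open set: the hit is valid only if it scores above the threshold
    # associated with that same speaker.
    if scores[best] > thresholds[best]:
        return best
    return None


# Made-up scores and thresholds for one segment.
scores = {"Brown": 0.62, "Jones": 0.35, "Smith": 0.10}
thresholds = {"Brown": 0.70, "Jones": 0.30, "Smith": 0.40}

closed_result = identify_speaker(scores, thresholds, open_set=False)
open_result = identify_speaker(scores, thresholds, open_set=True)
```

In this made-up segment, closed-set operation reports Brown, while open-set operation treats the segment as an unknown speaker, because the top score falls below the threshold for Brown. The Jones score is above the Jones threshold, but that does not matter because Jones is not the top-scoring template.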

The score threshold for each speaker template is based on an analysis of the speaker match scores observed for that template against both matching speaker data (true examples), and non-matching speaker data (false examples).

The SpkIdDevelWav task takes a single audio file, along with the name of the speaker the file is associated with, and generates score statistics for one or more speaker templates. These statistics are then stored in an audio template development file (.atd).

Note: If the speaker is not in the set, you can set the speaker name to “Unknown”.

You must run the SpkIdDevelWav task once for each audio file to be used in threshold estimation. You can choose to append the scores for each audio file to a single .atd file (the default method), or to create a separate development file for each audio file. You can use one or more development files when estimating the threshold for each speaker template.

To append the scores to a common development file, you must ensure that the file does not exist before you run the first SpkIdDevelWav task.

The creation of an individual development file for each audio file ensures that the statistics do not get inadvertently appended to a file that already existed before running the first SpkIdDevelWav task (for example, from a previous IDOL Speech Server installation). You must specify a unique name for the development file each time that you run the task, to avoid overwriting files.

You can specify the method to use by using the DevAppend configuration parameter in the task's AudioModelDevel module, which you can set by using the Append parameter on the command line. For more information about this parameter, see the IDOL Speech Server Reference.

To generate a development score file

  1. Gather together the required audio files for testing, including:

  2. Create a list of the speaker templates. Each list entry must include the name of the speaker, and the name of their template file. Use the format:

    speakerLabel;templateFile

    For example:

    Brown;brown.atf
    Jones;jones.atf
    Smith;smith.atf
    

    For more information about IDOL Speech Server's list manager, see Create and Manage Lists.

  3. For each audio file, send an AddTask action to IDOL Speech Server, and set the following parameters:

    Type The task name. Set to SpkIdDevelWav.
    File The audio file that contains the speaker example speech.

    DataLabel The name of the speaker that the audio is associated with.
    TemplateList The list file that specifies the set of speaker templates to use.
    DevFile The development file (.atd) to create or update.

    For example:

    http://localhost:15000/action=AddTask&Type=SpkIdDevelWav&File=C:/Data/BrownSpeech4.wav&DataLabel=Brown&TemplateList=ListManager/speakers&DevFile=speakers.atd

    This action uses port 15000 to instruct IDOL Speech Server, which is located on the local machine, to generate match statistics based on audio from the speaker named Brown, in the audio file BrownSpeech4.wav, against all of the speaker templates specified in the speakers list. The results are written to the speakers.atd development file.

    You can set additional parameters. For details of the optional parameters, see the IDOL Speech Server Reference.

    This action returns a token. You can use the token to:

    To process streamed audio, use the StreamSidOptimize task. For more details about this standard task, see the IDOL Speech Server Reference.
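Steps 2 and 3 above can be scripted. The following minimal sketch formats the template list entries and builds one AddTask URL per audio file, using the parameter names and URL layout from the example above. The helper names and the second audio file are hypothetical, and the sketch only builds the URL strings; it does not send them to the server.

```python
from urllib.parse import urlencode


def template_list_entries(templates):
    """Format (speakerLabel, templateFile) pairs as list entries."""
    return [f"{label};{template_file}" for label, template_file in templates]


def devel_wav_action(audio_file, speaker, template_list, dev_file,
                     host="localhost", port=15000):
    """Build the AddTask URL for one SpkIdDevelWav run (hypothetical helper)."""
    params = {
        "Type": "SpkIdDevelWav",
        "File": audio_file,
        "DataLabel": speaker,
        "TemplateList": template_list,
        "DevFile": dev_file,
    }
    # safe="/:" keeps path separators and drive letters readable,
    # matching the example URL in this section.
    return f"http://{host}:{port}/action=AddTask&" + urlencode(params, safe="/:")


entries = template_list_entries(
    [("Brown", "brown.atf"), ("Jones", "jones.atf"), ("Smith", "smith.atf")]
)

# One task per audio file. The second file is a made-up example of
# audio from a speaker who is not in the set.
audio_files = [
    ("C:/Data/BrownSpeech4.wav", "Brown"),
    ("C:/Data/VisitorSpeech1.wav", "Unknown"),
]
urls = [
    devel_wav_action(path, label, "ListManager/speakers", "speakers.atd")
    for path, label in audio_files
]
```

Because every URL names the same DevFile, this sketch corresponds to the default method of appending all scores to a single development file.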

To estimate speaker template score thresholds

    After you gather both positive and negative score statistics for each of the speaker templates, you can calculate the threshold associated with each speaker. This threshold is stored within the speaker template file.

    You can do this for each speaker template individually, or across the whole set at once. The example given here shows the latter approach.

    You can specify multiple template development files in a list file, or just a single development file. Again, the latter approach is shown here.

    You can use the Bias parameter to bias the calculated threshold towards fewer false positives (at the likely cost of more misses), or the other way around. Increase the value of the Bias parameter to reduce false positives and increase precision; lower it to reduce misses and increase recall.
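The trade-off between false positives and misses can be illustrated with a small sketch. This is not the algorithm that IDOL Speech Server uses, and how a Bias value maps to a threshold is not described here; the score values and the function name are hypothetical. The sketch only shows why moving a threshold up or down trades one kind of error for the other.

```python
def error_counts(true_scores, false_scores, threshold):
    """Count misses and false positives for one candidate threshold.

    true_scores:  scores of the template against matching speaker data.
    false_scores: scores of the template against non-matching speaker data.
    """
    # A true example at or below the threshold is rejected: a miss.
    misses = sum(1 for s in true_scores if s <= threshold)
    # A false example above the threshold is accepted: a false positive.
    false_positives = sum(1 for s in false_scores if s > threshold)
    return misses, false_positives


# Made-up development scores for a single speaker template.
true_scores = [0.9, 0.8, 0.7, 0.6]
false_scores = [0.65, 0.5, 0.3, 0.2]

low = error_counts(true_scores, false_scores, threshold=0.55)
high = error_counts(true_scores, false_scores, threshold=0.75)
```

With these made-up scores, the lower threshold gives no misses and one false positive, while the higher threshold gives two misses and no false positives.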

    For details on other options associated with this task, see the IDOL Speech Server Reference.

For example:

http://localhost:15000/action=AddTask&Type=SpkIdDevelFinal&DevFile=speakers.atd&TemplateList=ListManager/speakers&Bias=0.2

This action uses port 15000 to instruct IDOL Speech Server, which is located on the local machine, to use the development scores in speakers.atd to calculate thresholds (using a Bias value of 0.2 when balancing recall against precision) for each speaker template specified in the speakers list. IDOL Speech Server updates the template files in place to contain the calculated threshold values.

You can set additional parameters. For details of the optional parameters, see the IDOL Speech Server Reference.

This action returns a token. You can use the token to:

