When identifying the most likely speaker in a section of audio, IDOL Speech Server scores how closely the acoustic properties of the segment match each of the speaker templates. For closed-set operation, the top-scoring speaker is simply taken as the true result.
However, for open-set identification, IDOL Speech Server needs to allow for unknown speakers in the audio. It does this by estimating a score threshold for each speaker; the hit is considered valid only if a template scores above this threshold. If for any audio segment the top-scoring template falls below the threshold associated with that speaker, the segment is assumed to come from an unknown speaker.
The score threshold for each speaker template is based on an analysis of the speaker match scores observed for that template against both matching speaker data (true examples), and non-matching speaker data (false examples).
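The closed-set and open-set decisions described above can be sketched as follows. This is only an illustration of the decision rule, not IDOL Speech Server code: the function name `identify_speaker`, the score values, and the thresholds are all made up for the example.

```python
from typing import Dict

UNKNOWN = "Unknown"

def identify_speaker(scores: Dict[str, float],
                     thresholds: Dict[str, float],
                     open_set: bool = True) -> str:
    """Return the best-matching speaker label, or UNKNOWN in open-set mode."""
    # Pick the template with the highest match score.
    best = max(scores, key=scores.get)
    if not open_set:
        # Closed set: the top-scoring speaker is always accepted.
        return best
    # Open set: accept the hit only if it scores above that speaker's threshold.
    return best if scores[best] > thresholds[best] else UNKNOWN

# Hypothetical scores and per-speaker thresholds for one audio segment.
scores = {"Brown": 0.9, "Jones": 1.4, "Smith": 0.2}
thresholds = {"Brown": 1.0, "Jones": 1.5, "Smith": 0.8}

closed_result = identify_speaker(scores, thresholds, open_set=False)
open_result = identify_speaker(scores, thresholds)
print(closed_result)  # Jones
print(open_result)    # Unknown
```

In the example, Jones is the top-scoring template, but the score falls below the Jones threshold, so the open-set result is an unknown speaker.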
The SpkIdDevelWav task takes a single audio file, along with the name of the speaker the file is associated with, and generates score statistics for one or more speaker templates. These statistics are then stored in an audio template development file (.atd).
Note: If the speaker is not in the set, you can set the speaker name to “Unknown”.
You must run the SpkIdDevelWav task once for each audio file to be used in threshold estimation. You can choose to append the scores for each audio file to a single .atd file (the default method), or to create a separate development file for each audio file. You can use one or more development files when estimating the threshold for each speaker template.
To append the scores to a common development file, you must ensure that the file does not exist before you run the first SpkIdDevelWav task.
The creation of an individual development file for each audio file ensures that the statistics do not get inadvertently appended to a file that already existed before running the first SpkIdDevelWav task (for example, from a previous IDOL Speech Server installation). You must specify a unique name for the development file each time that you run the task, to avoid overwriting files.
You can specify the method to use by using the DevAppend configuration parameter in the task's AudioModelDevel module, which you can set by using the Append parameter on the command line. For more information about this parameter, see the IDOL Speech Server Reference.
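When you use the append method, a check for a leftover development file before the first run could look like the following sketch. It assumes that the script can reach the directory where the development file is written, which might not be true for your installation; the function name `prepare_dev_file` and the `.bak` suffix are hypothetical.

```python
import tempfile
from pathlib import Path

def prepare_dev_file(dev_file: str) -> bool:
    """Move an existing development file aside; return True if one was found."""
    path = Path(dev_file)
    if not path.exists():
        # Nothing to clean up: the first task run creates the file.
        return False
    # Keep the old scores under a backup name rather than deleting them.
    path.replace(path.with_name(path.name + ".bak"))
    return True

# Demonstration in a temporary directory with a stale speakers.atd file.
with tempfile.TemporaryDirectory() as tmp:
    dev = Path(tmp) / "speakers.atd"
    dev.write_text("stale scores")
    moved = prepare_dev_file(str(dev))
    still_there = dev.exists()
    backup_exists = (Path(tmp) / "speakers.atd.bak").exists()

print(moved, still_there, backup_exists)  # True False True
```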
To generate a development score file
Gather together the required audio files for testing, including:
At least one file for each speaker that contains speech from that speaker only; aim to use a minimum of five minutes of speech for each speaker.
Note: Do not use the same audio that you used to create the speaker templates.
Files that contain unknown speakers (those not in the training set).
Note: It is important to use a substantial amount of unknown speaker data, from a wide range of speakers, to correctly tune the thresholds.
Create a list of the speaker templates. Each list entry must include the name of the speaker, and the name of their template file. Use the format:
speakerLabel;templateFile
For example:
Brown;brown.atf
Jones;jones.atf
Smith;smith.atf
For more information about IDOL Speech Server's list manager, see Create and Manage Lists.
For each audio file, send an AddTask action to IDOL Speech Server, and set the following parameters:
| Type | The task name. Set to SpkIdDevelWav. |
| File | The audio file that contains the speaker example speech. |
| DataLabel | The name of the speaker that the audio is associated with. |
| TemplateList | The list file that specifies the set of speaker templates to use. |
| DevFile | The development file (.atd) to create or update. |
For example:
http://localhost:15000/action=AddTask&Type=SpkIdDevelWav&File=C:/Data/BrownSpeech4.wav&DataLabel=Brown&TemplateList=ListManager/speakers&DevFile=speakers.atd
This action uses port 15000 to instruct IDOL Speech Server, which is located on the local machine, to generate match statistics based on audio from the speaker named Brown, in the audio file BrownSpeech4.wav, against all of the speaker templates specified in the speakers list. The results are written to the speakers.atd development file.
You can set additional parameters. For details of the optional parameters, see the IDOL Speech Server Reference.
This action returns a token that you can use to refer to the task.
To process streamed audio, use the StreamSidOptimize task. For more details about this standard task, see the IDOL Speech Server Reference.
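Because the SpkIdDevelWav task must run once for each audio file, it can help to generate the action URLs from a list of files. The following sketch only builds the URL strings in the form shown in the example above, and formats template list entries; it does not contact the server. The helper names (`add_task_url`, `devel_wav_urls`, `format_template_list`) and the file `Visitor1.wav` are hypothetical.

```python
from urllib.parse import urlencode

def format_template_list(templates):
    """Format {speakerLabel: templateFile} as one list entry per line."""
    return "\n".join(f"{label};{path}" for label, path in templates.items())

def add_task_url(host, port, **params):
    """Build an AddTask action URL in the form used by the examples."""
    # Keep '/' and ':' readable so that file paths look like the example.
    query = urlencode({"action": "AddTask", **params}, safe="/:")
    return f"http://{host}:{port}/{query}"

def devel_wav_urls(files, template_list, dev_file,
                   host="localhost", port=15000):
    """Return one SpkIdDevelWav URL for each (audio file, speaker) pair."""
    return [
        add_task_url(host, port, Type="SpkIdDevelWav", File=path,
                     DataLabel=label, TemplateList=template_list,
                     DevFile=dev_file)
        for path, label in files
    ]

entries = format_template_list(
    {"Brown": "brown.atf", "Jones": "jones.atf", "Smith": "smith.atf"})

# Visitor1.wav is a made-up file that contains a speaker outside the set.
files = [("C:/Data/BrownSpeech4.wav", "Brown"),
         ("C:/Data/Visitor1.wav", "Unknown")]
urls = devel_wav_urls(files, "ListManager/speakers", "speakers.atd")
for url in urls:
    print(url)
```

The first URL printed matches the example action above; the second uses the Unknown speaker name for audio from a speaker who is not in the set.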
To estimate speaker template score thresholds
After you gather both positive and negative score statistics for each of the speaker templates, you can calculate the threshold associated with each speaker. This threshold is stored within the speaker template file.
You can do this for each speaker template individually, or across the whole set at once. The example given here shows the latter approach.
You can specify multiple template development files in a list file, or just a single development file. Again, the latter approach is shown here.
You can use the Bias parameter to bias the calculated threshold towards fewer false positives (at the likely cost of more misses), or the other way around. Increase the value of the Bias parameter to reduce false positives and increase precision; lower it to reduce misses and increase recall.
For details on other options associated with this task, see the IDOL Speech Server Reference.
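The trade-off that the Bias parameter controls can be illustrated by counting errors at different thresholds. The following sketch is not the estimation method that IDOL Speech Server uses, and it does not model how a Bias value maps to a threshold; the scores are invented, and the function name `error_counts` is hypothetical. It shows only that a higher threshold produces fewer false positives and more misses.

```python
def error_counts(true_scores, false_scores, threshold):
    """Count misses and false positives for one candidate threshold."""
    # A miss: matching speaker data that does not score above the threshold.
    misses = sum(1 for s in true_scores if s <= threshold)
    # A false positive: non-matching data that scores above the threshold.
    false_positives = sum(1 for s in false_scores if s > threshold)
    return misses, false_positives

# Invented match scores for one speaker template.
true_scores = [1.8, 1.5, 1.2, 0.9]    # audio from the matching speaker
false_scores = [1.3, 0.7, 0.4, 0.1]   # audio from other speakers

results = {t: error_counts(true_scores, false_scores, t)
           for t in (0.5, 1.0, 1.4)}
for threshold, (misses, false_positives) in results.items():
    print(threshold, misses, false_positives)
# 0.5 0 2
# 1.0 1 1
# 1.4 2 0
```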
To estimate the thresholds for a set of speaker templates, given a single development score file, send an AddTask action to IDOL Speech Server, and set the following parameters:
| Type | The task name. Set to SpkIdDevelFinal. |
| DevFile | The input template development file. |
| TemplateList | A list file that specifies the templates to use. |
| Bias | The bias setting to use when calculating thresholds. |
| OutPath | The output path for the updated speaker templates. |
| OutExt | The file extension for output speaker templates. Note: If you do not set either the |
For example:
http://localhost:15000/action=AddTask&Type=SpkIdDevelFinal&DevFile=speakers.atd&TemplateList=ListManager/speakers&Bias=0.2
This action uses port 15000 to instruct IDOL Speech Server, which is located on the local machine, to use the development scores in speakers.atd to calculate thresholds (using a Bias value of 0.2 when balancing recall against precision) for each speaker template specified in the speakers list. IDOL Speech Server updates the template files in place to contain the calculated threshold values.
You can set additional parameters. For details of the optional parameters, see the IDOL Speech Server Reference.
This action returns a token that you can use to refer to the task.
To modify the threshold of a single template
You can use the SpkIdTmpEditThresh standard task to modify the threshold of a single template by specifying the template file (.atf).
Send an AddTask action to IDOL Speech Server, and set the following parameters:
| Type | The task name. Set to SpkIdTmpEditThresh. |
| TemplateFile | The name of the template to modify. |
| Thresh | The value to use for the threshold. |
For example:
http://localhost:15000/action=AddTask&Type=SpkIdTmpEditThresh&TemplateFile=speakers.atf&Thresh=0.5
This action uses port 15000 to instruct IDOL Speech Server, which is located on the local machine, to set the threshold of the speakers.atf template file to 0.5. IDOL Speech Server updates the template file in place to contain the new threshold value.
You can set additional parameters. For details of the optional parameters, see the IDOL Speech Server Reference.
This action returns a token that you can use to refer to the task.
To retrieve information on a single template
You can use the SpkIdTmpInfo standard task to write information on a specified template file to a log file.
Send an AddTask action to IDOL Speech Server, and set the following parameters:
| Type | The task name. Set to SpkIdTmpInfo. |
| TemplateFile | The name of the template file to retrieve information for. |
| Log | The log file to write the information to. |
For example:
http://localhost:15000/action=AddTask&Type=SpkIdTmpInfo&TemplateFile=speakers.atf&Log=speakers.log
This action uses port 15000 to instruct IDOL Speech Server, which is located on the local machine, to write information on the speakers.atf template file to the log file speakers.log.
You can set additional parameters. For details of the optional parameters, see the IDOL Speech Server Reference.
This action returns a token that you can use to refer to the task. The log file contains information similar to the following:
<TEMPLATE_0>
  <NAME> test.atf </NAME>
  <THRESH_ENABLED> Yes </THRESH_ENABLED>
  <THRESH_VALUE> 1.158 </THRESH_VALUE>
  <NCOMPS> 1023 </NCOMPS>
  <SHARE_ICOV> Yes </SHARE_ICOV>
  <SHARE_MEANS> Yes </SHARE_MEANS>
  <SHARE_MEANS_PERCENT> 23.4604 </SHARE_MEANS_PERCENT>
</TEMPLATE>
This file shows some information about how the template was trained and optimized, along with information about the template. The log file includes the following fields:
| <TEMPLATE_0> | The start of information on template 0, and so on. |
| <NAME> | The name associated with the template. |
| <THRESH_ENABLED> | Whether a score threshold is enabled for this template. |
| <THRESH_VALUE> | The score threshold that has been estimated for this template. |
| <NCOMPS> | The number of components used in this template. |
| <SHARE_ICOV> | Whether this template shares variance statistics with a base template. |
| <SHARE_MEANS> | Whether this template shares mean parameters with a base template. |
| <SHARE_MEANS_PERCENT> | The percentage of mean parameter components shared with the base template. |
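To read these fields in a script, one option is to extract them with a regular expression rather than an XML parser, because in the sample above the opening <TEMPLATE_0> tag and the closing </TEMPLATE> tag do not match. The following sketch assumes that the log contains a single template and that every field has the simple form shown in the sample; the function name `parse_template_info` is hypothetical.

```python
import re

# Matches a leaf field such as "<NAME> test.atf </NAME>".
FIELD = re.compile(r"<(\w+)>\s*([^<>]*?)\s*</\1>")

def parse_template_info(log_text):
    """Return a dict that maps each field name to its text value."""
    return {name: value for name, value in FIELD.findall(log_text)}

# Sample log content, copied from the example above.
sample = """<TEMPLATE_0>
  <NAME> test.atf </NAME>
  <THRESH_ENABLED> Yes </THRESH_ENABLED>
  <THRESH_VALUE> 1.158 </THRESH_VALUE>
  <NCOMPS> 1023 </NCOMPS>
  <SHARE_ICOV> Yes </SHARE_ICOV>
  <SHARE_MEANS> Yes </SHARE_MEANS>
  <SHARE_MEANS_PERCENT> 23.4604 </SHARE_MEANS_PERCENT>
</TEMPLATE>"""

info = parse_template_info(sample)
print(info["NAME"], float(info["THRESH_VALUE"]), int(info["NCOMPS"]))
# test.atf 1.158 1023
```

All values are returned as strings, so the caller converts numeric fields such as THRESH_VALUE as needed.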