WavSidOptimize

Deprecated: The WavSidOptimize task is deprecated for IDOL Server version 11.0.0. Use the SpkIdDevelWav task instead.

This task is still available for existing implementations, but it might be incompatible with new functionality. The task might be deleted in future.

The WavSidOptimize task generates statistics that are used to determine speaker template match thresholds. It derives these values by analyzing the match scores observed for each template against both matching speaker data (which yields true positives) and non-matching speaker data (which yields false positives).
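
To make the idea of a match threshold concrete, the following minimal sketch picks a threshold for a single template from two sets of scores. It is an illustration only: this page does not describe how IDOL Speech Server computes thresholds, so the function name, the sample scores, and the rule of minimizing the total number of errors are all assumptions.

```python
def pick_threshold(matching_scores, non_matching_scores):
    """Return (threshold, errors) for the candidate with the fewest total errors.

    Hypothetical illustration only; this is not IDOL Speech Server's algorithm.
    A score greater than or equal to the threshold counts as a match.
    """
    candidates = sorted(set(matching_scores) | set(non_matching_scores))
    best_threshold, best_errors = None, None
    for threshold in candidates:
        # Matching audio that scores below the threshold is a missed match.
        false_negatives = sum(1 for s in matching_scores if s < threshold)
        # Non-matching audio that scores at or above the threshold is a false positive.
        false_positives = sum(1 for s in non_matching_scores if s >= threshold)
        errors = false_negatives + false_positives
        # Keep the lowest threshold among equally good candidates.
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


# Made-up scores for one speaker template.
bob_scores = [0.82, 0.91, 0.77, 0.88]    # audio labeled as this speaker
other_scores = [0.35, 0.52, 0.61, 0.79]  # audio labeled as other or unknown speakers
threshold, errors = pick_threshold(bob_scores, other_scores)
print(threshold, errors)
```

With these made-up scores, one non-matching sample (0.79) outscores the weakest matching sample (0.77), so no threshold separates the two groups perfectly. This overlap is why statistics from both kinds of audio are useful.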

You build up the statistics by presenting IDOL Speech Server with audio files labeled as being from one of the known speakers or an unknown speaker. The WavSidOptimize task generates these statistics and stores them in a Speaker ID Optimization (.spo) file.

You must run the WavSidOptimize task once for each audio file. You can choose to append the scores for each audio file to a single .spo file (the default method), or to create a separate .spo file for each audio file and combine these at the packaging stage.
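
Because the task runs once per audio file, the calls lend themselves to scripting. The sketch below builds one AddTask action URL per labeled audio file, with every call writing to the same .spo file (the default append method, so SpoAppend is left unset). The helper name, the list of audio files, and the forward-slash paths are assumptions modeled on the example on this page. The script only builds the URLs and does not send them.

```python
from urllib.parse import quote


def build_optimize_url(audio_file, speaker_name, spo_file,
                       spk_list="ListManager/speakers",
                       spk_path="C:/training",
                       host="localhost", port=13000):
    """Build a WavSidOptimize AddTask URL for one audio file (hypothetical helper)."""
    params = [
        ("Type", "WavSidOptimize"),
        ("File", audio_file),
        ("SpkList", spk_list),
        ("SpkPath", spk_path),
        ("Spo", spo_file),
        ("SpeakerName", speaker_name),
    ]
    # Percent-encode values, keeping path separators and drive colons readable.
    query = "&".join(f"{name}={quote(value, safe='/:')}" for name, value in params)
    return f"http://{host}:{port}/action=AddTask&{query}"


# Hypothetical labeled audio files: (path, speaker label).
# Audio from a speaker with no template is labeled Unknown_.
labeled_files = [
    ("C:/Data/BobSpeech.wav", "/ENUK/Bob"),
    ("C:/Data/AliceSpeech.wav", "/ENUK/Alice"),
    ("C:/Data/Visitor.wav", "Unknown_"),
]

# One task per audio file, all targeting the same .spo file.
urls = [build_optimize_url(path, label, "speakers.spo") for path, label in labeled_files]
for url in urls:
    print(url)
```

To create a separate .spo file for each audio file instead, pass a different Spo value on each call and combine the files at the packaging stage.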

Parameters

| Parameter | Description | Required |
|---|---|---|
| Type | The task name. Set to WavSidOptimize. | Yes |
| Ast | The speaker classifier file. | See Comments. |
| CompSelect | The components to use for scoring. | |
| Diag | Whether to generate diagnostic information. | |
| DiagFile | The file to write the diagnostic information to. | |
| DiscardShort | Exclude segments shorter than a specific duration from further analysis. | |
| File | The audio file that contains the speaker sample speech. | Yes |
| MinNonSpeech | The minimum size in seconds of non-speech segments. | |
| MinSpeech | The minimum size in seconds of speech segments. | |
| Sfreq | The sample frequency of the audio file to process. | |
| SidBase | The sid base pack resource to use to determine the base files to use. | |
| Sig | The .sig file to use for speaker identification. | |
| SpeakerName | The speaker label for the speaker in the audio. For unknown speakers, set to Unknown_. | Yes |
| SpkList | A list of speaker templates. | Yes |
| SpkPath | The path to the directory containing the speaker templates. | |
| SpkSegCoef | Applies a weight to bias the decision about where speaker boundaries occur. | |
| Spo | The .spo file to create or update. | Yes |
| SpoAppend | Whether to append match data scores to a common .spo file. | |
| SugdInputChannels | The channel layout of the input media file. | |
| SugdInputFrequency | The sampling rate of the input media file. | |
| USM | The USM to use. | |
| USMEnabled | Whether to use the USM for optimization. | |

Example

http://localhost:13000/action=AddTask&Type=WavSidOptimize&File=C:/Data/BobSpeech.wav&SpkList=ListManager/speakers&SpkPath=C:\training&Spo=speakers.spo&SpeakerName=/ENUK/Bob

This action uses port 13000 to send a request to IDOL Speech Server, which is located on the local machine. It instructs the server to generate match statistics for the speaker /ENUK/Bob by checking the example speech in BobSpeech.wav against the speaker templates specified in the speakers list, and to write the results to the speakers.spo file.

Comments

If you do not specify the Ast parameter, the action uses the base ast file, determined by the SidBase resource. This base file does not contain any speaker information, and cannot identify speakers, but it performs gender detection and speaker segmentation.

