WavSpeakerId

Deprecated: The WavSpeakerId task is deprecated for IDOL Server version 11.1.0. Use the SpkIdEvalWav task instead.

This task is still available for existing implementations, but it might be incompatible with new functionality. The task might be removed in a future release.

The WavSpeakerId task segments an audio file by speaker and identifies the speaker in each segment. If the speech does not match any speaker in the classifier file, IDOL Speech Server identifies it as coming from an unknown speaker. IDOL Speech Server also identifies periods of non-speech within the audio file.

Before you run the WavSpeakerId task, you must train IDOL Speech Server on the speakers that you want to identify.

Parameters

Parameter | Description | Required
Type | The task name. Set to WavSpeakerId. | Yes
Ast | The speaker classifier file. | Yes. See Comments.
CompSelect | The number of components to select for use in scoring. |
Diag | Whether to generate diagnostic information. |
DiagFile | The file to write the diagnostic information to. |
DiscardShort | Exclude segments shorter than a specific duration from further analysis. |
EndTime | The end of an audio section to process. |
File | The audio file to analyze. | Yes
MinNonSpeech | The minimum size in seconds of non-speech segments. |
MinSpeech | The minimum size in seconds of speech segments. |
Out | The file to write the speaker identification results to. | Yes
Sfreq | The sample frequency of the audio file to process. |
SidBase | The sid base pack resource to use to determine the base files to use. |
Sig | The .sig file to use for speaker identification. |
SpkSegCoef | Applies a weight to bias the decision about where speaker boundaries occur. |
SpkThreshCoef | A fixed value to use to adjust the speaker identification threshold, to trade off false acceptances against rejections. |
StartTime | The beginning of an audio section to process. |
SugdInputChannels | The channel layout of the input media file. |
SugdInputFrequency | The sampling rate of the input media file. |
USMEnabled | Whether to use the USM for speaker identification. |

Example

http://localhost:13000/action=AddTask&Type=WavSpeakerId&File=C:\Data\Speech.wav&Ast=C:\training\speakers.ast&Out=SpeakID.ctm

This action uses port 13000 to instruct IDOL Speech Server, which is located on the local machine, to search the Speech.wav file for speakers contained in the speakers.ast classifier file and to write the identification results to the SpeakID.ctm file.
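
If you build this action from a script rather than typing it by hand, the following Python sketch shows one way to assemble the same URL. The helper name build_add_task_url is hypothetical and not part of IDOL Speech Server. The sketch percent-encodes the parameter values, on the assumption that the server decodes them; it only constructs the URL and does not send it.

```python
from urllib.parse import quote, urlencode


def build_add_task_url(host, port, params):
    """Return an AddTask action URL for the given task parameters.

    Hypothetical helper for illustration; not part of IDOL Speech Server.
    """
    # Percent-encode each value so that characters such as ':' and '\'
    # in Windows paths are transported safely (assumption: the server
    # decodes percent-encoded values).
    query = urlencode(params, quote_via=quote)
    return f"http://{host}:{port}/action=AddTask&{query}"


# Same host, port, and parameter values as the example action above.
url = build_add_task_url(
    "localhost",
    13000,
    {
        "Type": "WavSpeakerId",
        "File": r"C:\Data\Speech.wav",
        "Ast": r"C:\training\speakers.ast",
        "Out": "SpeakID.ctm",
    },
)
print(url)
```

Keeping the parameters in a dictionary makes it straightforward to add optional parameters from the table above without editing the URL string directly.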

Comments

If you do not specify the Ast parameter, the action uses the base ast file, determined by the SidBase resource. This base file does not contain any speaker information, and cannot identify speakers, but it performs gender detection and speaker segmentation.
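
The behavior described above can be checked before you submit a task. The following Python sketch, which assumes the task parameters are held in a dictionary, reports missing required parameters and notes when Ast is omitted. The function name check_task_params and the message wording are hypothetical; the rules themselves come from the parameter table and this comment.

```python
# Parameters marked "Yes" in the table. Ast is handled separately because
# the task falls back to the base ast file when it is omitted.
REQUIRED = ("Type", "File", "Out")


def check_task_params(params):
    """Return a list of notes about a WavSpeakerId parameter set.

    Hypothetical helper for illustration; not part of IDOL Speech Server.
    """
    notes = [
        f"missing required parameter: {name}"
        for name in REQUIRED
        if not params.get(name)
    ]
    if not params.get("Ast"):
        notes.append(
            "no Ast given: the base ast file is used, so results are "
            "limited to gender detection and speaker segmentation"
        )
    return notes


notes = check_task_params(
    {"Type": "WavSpeakerId", "File": "Speech.wav", "Out": "SpeakID.ctm"}
)
for note in notes:
    print(note)
```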

