The WavSidOptimize task is deprecated for HPE IDOL Server version 11.3. Use the SpkIdDevelWav task instead.
This task is still available for existing implementations, but it might be incompatible with new functionality. The task might be deleted in future.
The WavSidOptimize task generates statistics that are used for determining speaker template match thresholds. These values are based on analyzing the speaker match scores observed for each template against both matching speaker data (leading to true positives) and non-matching speaker data (leading to false positives).
You build up the statistics by presenting HPE IDOL Speech Server with audio files labeled as being from one of the known speakers or an unknown speaker. The WavSidOptimize task generates these statistics and stores them in a Speaker ID Optimization (.spo) file.
You must run the WavSidOptimize task once for each audio file. You can choose to append the scores for each audio file to a single .spo file (the default method), or to create a separate .spo file for each audio file and combine these at the packaging stage.
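If you create a separate .spo file for each audio file, each run needs its own .spo file name. The following Python sketch shows one way to derive distinct names from a list of audio files. The function name `unique_spo_names` and the naming scheme (audio file stem plus a numeric suffix) are illustrative assumptions, not part of HPE IDOL Speech Server.

```python
from pathlib import PurePosixPath


def unique_spo_names(audio_files):
    """Pair each audio file with a distinct .spo file name.

    Hypothetical helper: the name is the audio file stem, with a
    numeric suffix added when two files would otherwise share a name.
    """
    used = set()
    pairs = []
    for audio in audio_files:
        stem = PurePosixPath(audio).stem
        candidate = f"{stem}.spo"
        counter = 1
        # Keep trying suffixes until the name has not been used yet.
        while candidate in used:
            counter += 1
            candidate = f"{stem}_{counter}.spo"
        used.add(candidate)
        pairs.append((audio, candidate))
    return pairs


pairs = unique_spo_names([
    "C:/Data/BobSpeech.wav",
    "C:/Other/BobSpeech.wav",
    "C:/Data/AliceSpeech.wav",
])
for audio, spo in pairs:
    print(audio, "->", spo)
```

Each pair can then supply the File and Spo parameters for one run of the task.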
Creating a separate .spo file for each audio file allows you to reuse .spo files generated by an earlier WavSidOptimize task (for example, from a previous HPE IDOL Speech Server installation). It also allows you to easily remove individual .spo files from the set based on information in the diagnostics files. You must specify a unique name for the .spo file each time you run the task, to avoid overwriting a file.

Parameter | Description | Required |
---|---|---|
Type | The task name. Set to WavSidOptimize. | Yes |
Ast | The speaker classifier file. | See Comments. |
CompSelect | The components to use for scoring. | |
Diag | Whether to generate diagnostic information. | |
DiagFile | The file to write the diagnostic information to. | |
DiscardShort | Exclude segments shorter than a specific duration from further analysis. | |
File | The audio file that contains the speaker sample speech. | Yes |
MinNonSpeech | The minimum size in seconds of non-speech segments. | |
MinSpeech | The minimum size in seconds of speech segments. | |
Sfreq | The sample frequency of the audio file to process. | |
SidBase | The SID base pack resource that determines which base files to use. | |
Sig | The .sig file to use for speaker identification. | |
SpeakerName | The speaker label for the speaker in the audio. For unknown speakers, set to Unknown_. | Yes |
SpkList | A list of speaker templates. | Yes |
SpkPath | The path to the directory containing the speaker templates. | |
SpkSegCoef | Applies a weight to bias the decision about where speaker boundaries occur. | |
Spo | The .spo file to create or update. | Yes |
SpoAppend | Whether to append match data scores to a common .spo file. | |
SugdInputChannels | The channel layout of the input media file. | |
SugdInputFrequency | The sampling rate of the input media file. | |
USM | The USM to use. | |
USMEnabled | Whether to use the USM for optimization. | |
http://localhost:13000/action=AddTask&Type=WavSidOptimize&File=C:/Data/BobSpeech.wav&SpkList=ListManager/speakers&SpkPath=C:\training&Spo=speakers.spo&SpeakerName=/ENUK/Bob
This action uses port 13000 to instruct HPE IDOL Speech Server, which is located on the local machine, to generate match statistics for the speaker /ENUK/Bob by checking the example speech in BobSpeech.wav against the speaker templates specified in the speakers list and writing the results to the speakers.spo file.
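Because you run the task once for each audio file, it can help to build the action URL from a set of parameters instead of typing it by hand. The following Python sketch reproduces the example action above. The helper name `build_add_task_url` is an assumption made for illustration, and the sketch only builds the URL string; it does not send the request.

```python
from urllib.parse import urlencode


def build_add_task_url(params, host="localhost", port=13000):
    """Build an AddTask action URL in the form used by the example above.

    Hypothetical helper. The characters '/', ':' and '\\' are left
    unescaped so that the result matches the documented example.
    """
    query = urlencode(params, safe="/:\\")
    return f"http://{host}:{port}/action=AddTask&{query}"


# Parameter names and values are taken from the example action.
params = {
    "Type": "WavSidOptimize",
    "File": "C:/Data/BobSpeech.wav",
    "SpkList": "ListManager/speakers",
    "SpkPath": "C:\\training",
    "Spo": "speakers.spo",
    "SpeakerName": "/ENUK/Bob",
}

url = build_add_task_url(params)
print(url)
```

To process several audio files, change the File and SpeakerName values (and, if you use separate files, the Spo value) for each run.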
If you do not specify the Ast parameter, the action uses the base ast file, determined by the SidBase resource. This base file does not contain any speaker information, and cannot identify speakers, but it performs gender detection and speaker segmentation.