The StreamSidOptimize task is deprecated for HPE IDOL Server version 11.3. Use the SpkIdDevelStream and SpkIdDevelFinal tasks instead.
This task is still available for existing implementations, but it might be incompatible with new functionality. The task might be deleted in future.
The StreamSidOptimize task generates statistics that are used for determining speaker template match thresholds. It is a version of the WavSidOptimize task that reads in audio from a binary stream.
The statistics are based on analyzing the speaker match scores observed for each template against both matching speaker data (leading to true positives), and non-matching speaker data (leading to false positives).
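The idea behind these statistics can be illustrated with a minimal sketch. It is not the calculation that the server performs; it only shows how, for one template, scores from matching and non-matching audio translate into true positive and false positive rates at a candidate threshold. The function name, the score values, and the convention that a higher score means a better match are all assumptions made for illustration.

```python
def rates_at_threshold(matching_scores, non_matching_scores, threshold):
    """Return (true_positive_rate, false_positive_rate) for one template.

    Illustrative only: assumes a higher score means a better match and
    that a score at or above the threshold is accepted as a match.
    """
    true_positives = sum(1 for s in matching_scores if s >= threshold)
    false_positives = sum(1 for s in non_matching_scores if s >= threshold)
    tpr = true_positives / len(matching_scores) if matching_scores else 0.0
    fpr = false_positives / len(non_matching_scores) if non_matching_scores else 0.0
    return tpr, fpr


# Hypothetical scores observed for a single speaker template.
same_speaker_scores = [0.9, 0.8, 0.75, 0.4]   # audio from the template's speaker
other_speaker_scores = [0.5, 0.3, 0.2, 0.1]   # audio from other speakers

print(rates_at_threshold(same_speaker_scores, other_speaker_scores, 0.45))
```

Raising the threshold in this sketch lowers the false positive rate at the cost of missing more genuine matches, which is the trade-off that the collected statistics are meant to inform.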
You build up the statistics by presenting HPE IDOL Speech Server with audio labeled as being from one of the known speakers or an unknown speaker. The StreamSidOptimize task generates these statistics and stores them in a Speaker ID Optimization (.spo) file.
You must run the StreamSidOptimize task once for each audio stream. You can choose to append the scores for each audio stream to a single .spo file (the default method), or to create a separate .spo file for each audio stream and combine these at the packaging stage.
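The two approaches differ mainly in how the .spo file is named for each run. The following sketch pairs each audio stream's speaker label with a value for the Spo parameter. The helper name and the numbered file-name scheme are hypothetical; only the Spo and SpeakerName parameters come from this page.

```python
def spo_names(speaker_labels, separate_files=False, base="speakers"):
    """Return one (speaker label, .spo file name) pair per audio stream.

    Hypothetical helper: the numbered naming scheme is an assumption,
    not something the server requires.
    """
    pairs = []
    for index, label in enumerate(speaker_labels, start=1):
        if separate_files:
            # One file per stream: each name must be unique so that a
            # later run does not overwrite an earlier file.
            name = f"{base}_{index:03d}.spo"
        else:
            # Default method: every run targets the same file.
            name = f"{base}.spo"
        pairs.append((label, name))
    return pairs


labels = ["/ENUK/Bob", "Unknown_"]
print(spo_names(labels))                       # same file for every stream
print(spo_names(labels, separate_files=True))  # unique file per stream
```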
StreamSidOptimize task. StreamSidOptimize task (for example, from a previous HPE IDOL Speech Server installation). It also allows you to easily remove individual .spo files from the set based on information in the diagnostics files. You must specify a unique name for the .spo file each time you run the task, to avoid overwriting a file.
Parameter | Description | Required |
---|---|---|
Type | The task name. Set to StreamSidOptimize. | Yes |
Ast | The speaker classifier file. | See Comments. |
CompSelect | The components to use for scoring. | |
Diag | Whether to generate diagnostic information. | |
DiagFile | The file to write the diagnostic information to. | |
DiscardShort | Exclude segments shorter than a specific duration from further analysis. | |
MinNonSpeech | The minimum size in seconds of non-speech segments. | |
MinSpeech | The minimum size in seconds of speech segments. | |
Sfreq | The sample frequency of the audio file to process. | |
SidBase | The sid base pack resource to use to determine the base files to use. | |
Sig | The .sig file to use for speaker identification. | |
SpeakerName | The speaker label for the speaker in the audio. For unknown speakers, set to Unknown_. | Yes |
SpkList | A list of speaker templates. | Yes |
SpkPath | The path to the directory containing the speaker templates. | |
SpkSegCoef | Applies a weight to bias the decision about where speaker boundaries occur. | |
Spo | The .spo file to create or update. | Yes |
SpoAppend | Whether to append match data scores to a common .spo file. | |
USM | The USM to use. | |
USMEnabled | Whether to use the USM for optimization. | |
http://localhost:13000/action=AddTask&Type=StreamSidOptimize&SpkList=ListManager/speakers&SpkPath=C:\training&Spo=speakers.spo&SpeakerName=/ENUK/Bob
This action uses port 13000 to instruct HPE IDOL Speech Server, which is located on the local machine, to generate match statistics for the speaker /ENUK/Bob by checking the sample speech in the audio stream against the speaker templates specified in the speakers list and writing the results to the speakers.spo file.
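If you generate such actions from a script, a small helper can assemble the URL from the parameters in the table above. This is a minimal sketch: the function name is hypothetical, and percent-encoding characters such as the colon and backslash in a Windows path is an assumption about what is safe to send, not a documented server requirement.

```python
from urllib.parse import quote


def add_task_url(host, port, params):
    """Build an AddTask action URL in the form used by the example above.

    Hypothetical helper: values are percent-encoded except for '/',
    which is kept readable in speaker labels and list names.
    """
    query = "&".join(
        f"{name}={quote(str(value), safe='/')}" for name, value in params.items()
    )
    return f"http://{host}:{port}/action=AddTask&{query}"


url = add_task_url(
    "localhost",
    13000,
    {
        "Type": "StreamSidOptimize",
        "SpkList": "ListManager/speakers",
        "SpkPath": "C:\\training",
        "Spo": "speakers.spo",
        "SpeakerName": "/ENUK/Bob",
    },
)
print(url)
```

Keeping the parameters in a dictionary makes it straightforward to vary SpeakerName and Spo for each audio stream while leaving the rest of the action unchanged.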
If you do not specify the Ast parameter, the action uses the base ast file, determined by the SidBase resource. This base file does not contain any speaker information, and cannot identify speakers, but it performs gender detection and speaker segmentation.