The ClusterSpeech task clusters wide-band speech into speaker segments. For example, if two speaker clusters are identified, the output labels are Cluster_0 and Cluster_1 respectively.
Parameter | Description | Required |
---|---|---|
Type | The task name. Set to ClusterSpeech. | Yes |
AppDnnBase | The location of the appResources directory, which contains the DNN and .ian files to use. | |
File | The input audio file. | |
FixTime | A fixed size for speaker clusters. | |
FrameDupl | The balance between performance and speed for audio preprocessing DNN classification. | |
Lang | The name of a language pack. | Yes |
MaxNumSpeakers | The final maximum number of speakers to produce. | |
MergeThresh | The threshold below which to merge clusters. | |
MinNumSpeakers | The final minimum number of speakers to produce. | |
Out | The file that HPE IDOL Speech Server writes task output to. | |
SilThresh | The threshold between what the task identifies as silence and non-silence. | |
SpeechThresh | The threshold between speech and non-speech (music or noise). | |
SugdInputChannels | The channel layout of the input media file. | |
SugdInputFrequency | The sampling rate of the input media file. | |
http://localhost:15000/a=AddTask&Type=ClusterSpeech&File=wide.wav&lang=ENUK&out=outWide
This action uses port 15000 to instruct HPE IDOL Speech Server, which is located on the local machine, to cluster the data in the wide.wav wide-band audio file into speaker segments, and to write the results to the outWide output file.
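The same request can also be assembled programmatically. The following is a minimal sketch in Python that builds only the action URL shown above. The helper name build_cluster_speech_url is hypothetical and is not part of HPE IDOL Speech Server; the host, port, and parameter values are taken from the example. The sketch does not send the request or make any assumption about the server's response format.

```python
from urllib.parse import urlencode


def build_cluster_speech_url(file, lang, out, host="localhost", port=15000, **optional):
    """Build an AddTask URL for the ClusterSpeech task.

    Hypothetical helper for illustration only. Additional task parameters
    from the table above can be passed as keyword arguments.
    """
    # Parameter names are written with the same casing as the documented example.
    params = {"Type": "ClusterSpeech", "File": file, "lang": lang, "out": out}
    params.update(optional)
    # urlencode percent-encodes any characters that are not URL-safe.
    return f"http://{host}:{port}/a=AddTask&{urlencode(params)}"


url = build_cluster_speech_url("wide.wav", "ENUK", "outWide")
print(url)
```

Optional parameters from the table can be appended in the same way, for example build_cluster_speech_url("wide.wav", "ENUK", "outWide", MaxNumSpeakers=4), where the value 4 is purely illustrative.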