You can set additional parameters in the tasks configuration file to improve the performance of speech-to-text.
If the audio data contains a lot of background noise or foreground music, you can enable speech detection to improve speech-to-text rates:
In the [frontend] module used by the speech-to-text task, set the DetectSpeech parameter to True to modify how the speech-to-text engine processes audio sections that are labeled as speech, which can improve recognition in these sections.
In the [normalizer] module used by the speech-to-text task, set ZeroSilFrames to True. The speech-to-text engine then skips over sections of audio that are identified as silence.

If many of the words in the audio do not appear in the transcript, the language model might be too strongly weighted. In the language pack section of the tasks configuration file, experiment with the following parameters:
The LmScale parameter (the recommended range is between 0.2 and 2.0).
The LmOffset parameter (the recommended range is between -0.5 and +0.5).

If speech-to-text is producing many more words in the transcript file than are spoken in the audio, the language model might be too weakly weighted. In the language pack section of the tasks configuration file, experiment with the following parameters:
The LmScale parameter (the recommended range is between 0.2 and 2.0).
The LmOffset parameter (the recommended range is between -0.5 and +0.5).

You can also tune the following speech-to-text parameters to improve general speech-to-text performance:
Mode
ModeValue
For more information about these parameters, see the HPE IDOL Speech Server Reference.
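As a rough illustration of where the parameters described above might sit in a tasks configuration file, the following fragment is a sketch only. The section name [MyLanguagePack] is a hypothetical placeholder for your language pack section, the [frontend] and [normalizer] section names should match the modules that your speech-to-text task actually uses, and the `//` comment style is an assumption. The LmScale and LmOffset values are arbitrary starting points inside the recommended ranges, not documented defaults. Mode and ModeValue are omitted because their valid values are listed in the reference rather than here.

```ini
// Sketch only: section names and values are illustrative assumptions.

[frontend]
// Process sections labeled as speech differently to improve recognition.
DetectSpeech=True

[normalizer]
// Skip sections of audio that are identified as silence.
ZeroSilFrames=True

// Hypothetical language pack section name.
[MyLanguagePack]
// Recommended range: 0.2 to 2.0
LmScale=1.0
// Recommended range: -0.5 to +0.5
LmOffset=0.0
```

When you experiment, changing one parameter at a time and comparing the resulting transcripts against the same audio makes it easier to tell which change helped.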