The audio preprocessing module allows you to categorize audio and analyze its quality before you use it in tasks. This module has options to perform clipping detection, Signal-to-Noise Ratio (SNR) calculation, and Dual-Tone Multi-Frequency (DTMF) dial tone identification.
Signal-to-Noise Ratio Calculation
Configure Audio Preprocessing Tasks
The 11.3 release of HPE IDOL Speech Server uses an implementation of audio preprocessing based on deep neural network (DNN) technology, which means that you do not need to tailor thresholds to specific audio types. The new implementation takes normalized feature vectors as input rather than audio samples, which requires updates to the task schemas.
Note that for tasks that combine audio preprocessing with speech-to-text, you must include separate frontend and normalizer calls for both audio preprocessing and speech-to-text, because the form of the frontend feature vectors needed for the two tasks might be different. The standard HPE IDOL Speech Server tasks configuration file (speechserver-tasks.cfg) includes several examples.
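The reason for keeping the two chains separate can be illustrated with a small conceptual sketch. It is written in Python purely to model the data flow. The functions frontend, normalizer, and build_branch, and the frame_size parameter, are hypothetical stand-ins and are not HPE IDOL Speech Server module calls or configuration syntax. See speechserver-tasks.cfg for the real task definitions.

```python
# Conceptual sketch only: every name here is hypothetical and models the
# data flow, not the actual IDOL Speech Server modules or their settings.

def frontend(samples, frame_size):
    """Turn raw samples into one toy feature (mean energy) per frame."""
    return [
        sum(x * x for x in samples[i:i + frame_size]) / frame_size
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]

def normalizer(features):
    """Scale features to zero mean and unit variance."""
    n = len(features)
    mean = sum(features) / n
    variance = sum((f - mean) ** 2 for f in features) / n
    std = variance ** 0.5 or 1.0  # avoid dividing by zero for flat input
    return [(f - mean) / std for f in features]

def build_branch(frame_size):
    """Pair a frontend with its own normalizer for one downstream task."""
    return lambda samples: normalizer(frontend(samples, frame_size))

audio = [0.0, 0.5, -0.5, 1.0, -1.0, 0.25, -0.25, 0.75]

# Each task gets its own frontend and normalizer, because the feature
# form each one needs (here, the assumed frame size) may differ.
preproc_features = build_branch(frame_size=2)(audio)
stt_features = build_branch(frame_size=4)(audio)
```

Both branches read the same audio, but they produce feature sequences of different shapes, which is why one shared frontend and normalizer pair could not serve both tasks in this model.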
For more information on working with the new algorithm, see the HPE IDOL Speech Server Reference. All tasks in the speechserver-tasks.cfg file use the new algorithm, but the old algorithm is still supported for backwards compatibility, and you can use it in exactly the same way as before.