IDOL Speech Server’s approach to speech-to-text is based on an information-breakdown view of speech, in which recognition is split into separate kinds of information, each handled by its own resource. Most leading speech technologists use this approach.
This approach means that the speech-to-text engine requires a language pack that contains:
The following diagram shows the inputs and resources that the speech-to-text engine receives.
Diagram labels: models of fundamental sound patterns; pronunciation dictionary with vocabulary; base language models and customized models.
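To make the division of labour between these resources concrete, the following toy sketch combines the three kinds of information to choose a transcript. It is an illustration only, not IDOL Speech Server's API or its decoding algorithm. Every name in it (`ACOUSTIC_SCORES`, `PRONUNCIATIONS`, `BIGRAM_LOGPROB`, `decode`), the phoneme strings, and all the scores are invented for the example, and an exhaustive search stands in for whatever search a real engine performs.

```python
from itertools import product

# Hypothetical output of the sound-pattern models: for each audio segment,
# candidate phoneme strings with log-likelihood scores (made-up numbers).
ACOUSTIC_SCORES = [
    {"R AY T": -0.2, "R AY D": -1.8},
    {"DH AH": -0.1},
    {"L EH T ER": -0.3},
]

# Hypothetical pronunciation dictionary: phoneme string -> vocabulary words.
PRONUNCIATIONS = {
    "R AY T": ["write", "right"],
    "R AY D": ["ride"],
    "DH AH": ["the"],
    "L EH T ER": ["letter"],
}

# Hypothetical bigram language model: log-probability of a word given the
# previous word. "<s>" marks the start of the utterance.
BIGRAM_LOGPROB = {
    ("<s>", "write"): -1.0,
    ("<s>", "right"): -1.5,
    ("<s>", "ride"): -2.0,
    ("write", "the"): -0.5,
    ("right", "the"): -2.0,
    ("ride", "the"): -1.0,
    ("the", "letter"): -0.7,
}
UNSEEN_BIGRAM = -5.0  # flat penalty for word pairs the toy model has not seen


def candidates(segment):
    """Expand one segment into (word, acoustic_score) pairs via the dictionary."""
    return [
        (word, score)
        for phones, score in segment.items()
        for word in PRONUNCIATIONS.get(phones, [])
    ]


def decode(segments):
    """Return the word sequence with the best combined acoustic + language score."""
    best_words, best_score = None, float("-inf")
    # Exhaustive search over every combination of candidate words.
    for path in product(*(candidates(s) for s in segments)):
        words = [word for word, _ in path]
        acoustic = sum(score for _, score in path)
        language = sum(
            BIGRAM_LOGPROB.get(pair, UNSEEN_BIGRAM)
            for pair in zip(["<s>"] + words, words)
        )
        total = acoustic + language
        if total > best_score:
            best_words, best_score = words, total
    return best_words, best_score


if __name__ == "__main__":
    words, score = decode(ACOUSTIC_SCORES)
    print(" ".join(words), round(score, 2))
```

In this sketch the sound models alone cannot separate "write" from "right", because both share one pronunciation. The language model settles the choice, which is why the engine needs all three resources rather than any one of them.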