Custom Language Models
Speech-to-text is supported out of the box and does not require training, but you can often improve accuracy by creating a custom language model. A language model is just one part of the training for a language: it describes the vocabulary and contains information about how sentences are composed from individual words. This section explains when you might want to use a custom language model and how to build one.
Using a custom language model can improve accuracy when:
- The audio contains specialized vocabulary. Any language model has a finite vocabulary, and your audio might contain words that are not in the standard vocabulary. For example, a recording of a lecture to medical professionals might include specialized medical terminology, and a recording of a telephone call to a customer support center might include product names.
- You have access to many transcripts that are representative of typical conversations, such as call center conversations.
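One rough way to judge whether the first condition applies is to measure how many words in your representative transcripts are missing from a known vocabulary. The following Python sketch illustrates the idea only. It is not part of Media Server: the `oov_report` function, the word list, and the sample transcripts are all hypothetical, and it assumes you have a word list to compare against.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def oov_report(transcripts, vocabulary):
    """Return (out-of-vocabulary rate, Counter of the missing words).

    Hypothetical helper: compares transcript tokens against a plain word list.
    """
    vocab = {word.lower() for word in vocabulary}
    tokens = [tok for text in transcripts for tok in tokenize(text)]
    missing = Counter(tok for tok in tokens if tok not in vocab)
    rate = sum(missing.values()) / len(tokens) if tokens else 0.0
    return rate, missing


# Hypothetical data: a tiny stand-in vocabulary and two sample transcripts.
standard_vocab = ["the", "patient", "was", "given", "a", "dose", "of", "and"]
transcripts = [
    "The patient was given a dose of amoxicillin",
    "Amoxicillin and ibuprofen",
]

rate, missing = oov_report(transcripts, standard_vocab)
print(f"Out-of-vocabulary rate: {rate:.1%}")
print(missing.most_common(5))
```

A high rate, or a list of missing words dominated by domain terms and product names, suggests that a custom language model is worth considering.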
Building a language model requires a large amount of text: millions or billions of words. The language models supplied with Media Server are trained with billions of words across a wide range of topics. Training at that scale is a significant burden, but you can instead build a small, focused language model and use it to supplement one of the standard models.
[Figure: Standard language model | Custom language model]