Custom Language Models

Speech-to-text works out of the box and does not require training, but you can often improve accuracy by creating a custom language model. A language model is one part of the training for a language: it describes the vocabulary and contains information about how sentences are composed from individual words. This section explains when you might want to use a custom language model and how to build one.

Using a custom language model can improve accuracy when:

  • The audio contains specialized vocabulary. Any language model has a finite vocabulary and your audio might contain words that are not in the standard vocabulary. For example, a recording of a lecture to medical professionals might include specialized medical terminology, and a recording of a telephone call to a customer support center might include product names.
  • You have access to many transcripts that are representative of typical conversations, such as call center conversations.
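
As a rough way to judge whether specialized vocabulary is an issue for your audio, you could measure how many words in some representative transcripts fall outside a known word list. The following is a minimal sketch, not a Media Server feature: the `oov_report` function, the simple tokenizer, and the sample word list and transcripts are all hypothetical and used only for illustration. It assumes you have some approximation of the vocabulary you want to compare against.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def oov_report(transcripts, vocabulary):
    """Return (out-of-vocabulary rate, Counter of out-of-vocabulary words)."""
    vocab = {word.lower() for word in vocabulary}
    total = 0
    missing = Counter()
    for text in transcripts:
        for token in tokenize(text):
            total += 1
            if token not in vocab:
                missing[token] += 1
    rate = sum(missing.values()) / total if total else 0.0
    return rate, missing


# Hypothetical sample data, for illustration only.
vocabulary = ["the", "patient", "was", "given", "a", "dose", "of"]
transcripts = [
    "The patient was given a dose of apixaban",
    "Apixaban was given",
]

rate, missing = oov_report(transcripts, vocabulary)
print(f"Out-of-vocabulary rate: {rate:.1%}")
print("Most frequent missing words:", missing.most_common(5))
```

A high rate, or a list of missing words dominated by product names or technical terms, suggests that a custom language model is worth considering.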

Building a language model from scratch requires a lot of text: millions or billions of words. The language models supplied with Media Server are trained with billions of words across a wide range of topics. Training at that scale is a significant burden, but you can instead build a small, focused language model and use it to supplement one of the standard models.

Standard language model:
  • Available out of the box.
  • Trained on billions of words.
  • Covers a wide range of topics.
  • Might not cover specialist vocabulary.

Custom language model:
  • You need to create it yourself.
  • Generally trained on a smaller number of words. The custom language model is combined with a standard language model to increase the coverage.
  • Focuses on particular topics, to increase accuracy.
  • Can cover specialist vocabulary, such as technical terms or product names.