Assess Language Models

Before running speech-to-text you can assess whether a language pack, optionally combined with a custom language model, is suitable for processing your audio. You can check:

  • whether the words that you want to recognize are included in the vocabulary. Media Server cannot recognize words unless they are in the vocabulary. If your audio contains many words that are not in the vocabulary (for example, product names), you can increase accuracy by training a custom language model.
  • the estimated branching factor for the words in the speech (known as the perplexity). A lower value is generally better. Call center conversations typically follow a set pattern and therefore have a lower perplexity than broadcast audio.

To check whether words are present in the vocabulary

  • Obtain sample text that closely resembles the speech you want to process. Then, submit the text to the action AssessSpeechLanguageModel. Media Server returns statistics and information about unknown words.
  • Compile a list of words that you know you need to recognize and use the QuerySpeechLanguageModel action to check whether the words are present in the vocabulary.
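Both steps above are ordinary Media Server actions sent over HTTP. The sketch below shows one way to build the request URLs; the host, port, and parameter names (LanguagePack, Text, Words) are assumptions for illustration only — consult the Media Server Reference for the exact parameters each action accepts.

```python
from urllib.parse import urlencode

MEDIA_SERVER = "http://localhost:14000"  # assumed host and ACI port

def action_url(action: str, **params: str) -> str:
    """Return a Media Server action URL with the given query parameters."""
    return f"{MEDIA_SERVER}/?{urlencode({'action': action, **params})}"

# Submit sample text to check vocabulary coverage
# (LanguagePack and Text are assumed parameter names).
assess_url = action_url(
    "AssessSpeechLanguageModel",
    LanguagePack="ENUK",
    Text="quarterly results for our flagship product",
)

# Check a specific word list against the vocabulary
# (Words is an assumed parameter name).
query_url = action_url(
    "QuerySpeechLanguageModel",
    LanguagePack="ENUK",
    Words="flagship,widgetco,teleporter",
)
```

Sending a GET request to each URL returns the statistics described above; unknown words reported by either action are candidates for custom language model training.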

To measure perplexity for a language model

  • Obtain sample text that closely resembles the speech that you want to process. Then, submit the text to the action AssessSpeechLanguageModel. Media Server returns a perplexity value; a lower value is generally better. Perplexity values around or below 100 are acceptable for processing call center conversations, and values around or below 250 are acceptable for television news and broadcast audio. If the AssessSpeechLanguageModel action returns a much higher perplexity value, consider training a custom language model.
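The guideline thresholds above (around 100 for call center conversations, around 250 for broadcast audio) can be expressed as a simple check. The thresholds come from the documentation; the function itself is an illustrative sketch, and the values are rules of thumb rather than hard limits.

```python
# Rough perplexity guidelines taken from the text above.
PERPLEXITY_GUIDELINES = {
    "call_center": 100,  # conversations follow a set pattern
    "broadcast": 250,    # television news / broadcast audio
}

def perplexity_acceptable(perplexity: float, audio_type: str) -> bool:
    """Return True if the perplexity is at or below the guideline value."""
    return perplexity <= PERPLEXITY_GUIDELINES[audio_type]
```

If this check fails by a wide margin for your sample text, that is a signal to train a custom language model before running speech-to-text.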

For more information about the AssessSpeechLanguageModel and QuerySpeechLanguageModel actions, refer to the Media Server Reference.