Select Text for Training
To create an effective custom language model, you need sample text that strongly resembles the speech you want to process. For example, if you use speech-to-text for news monitoring, you could train a language model using recent news articles gathered from the web.
To process a recording of a lecture to medical professionals, you could source text from:
- Transcripts of similar lectures.
- Articles written by the speaker who delivered the lecture.
- Slides used in delivering the lecture.
- A web article that discusses the particular topic.
- Any other document related to the topic or event.
Other useful sources of text include:
- Transcripts of call conversations.
- Literature that describes a product or company.
- Company websites that contain natural language descriptions (rather than images or advertisements).
The amount of text required to build a custom language model can range from a few thousand words to several hundred thousand words, depending on the topic. In general, training with more text increases accuracy. However, the gains diminish once you exceed a certain number of words (approximately 100,000 words for a typical topic such as technical support).
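When you gather text from several sources, it can help to track how many words you have collected against the approximate 100,000-word guideline. The following is a minimal sketch, not part of any service API. It assumes your training text is stored as UTF-8 text files and that "word" means a run of letters, digits, or apostrophes, which is only a rough approximation of how a speech service tokenizes text. The names `count_words`, `corpus_word_count`, and `assess_corpus` are hypothetical.

```python
import re
from pathlib import Path

# Approximate point of diminishing returns for a typical topic, taken from
# the guideline above. The real value depends on the topic.
DIMINISHING_RETURNS_WORDS = 100_000

# Assumption: a "word" is a run of ASCII letters, digits, or apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def count_words(text: str) -> int:
    """Count word-like tokens in a string."""
    return len(WORD_RE.findall(text))


def corpus_word_count(paths) -> int:
    """Sum the word counts of several UTF-8 text files."""
    return sum(count_words(Path(p).read_text(encoding="utf-8")) for p in paths)


def assess_corpus(total_words: int, threshold: int = DIMINISHING_RETURNS_WORDS) -> str:
    """Give a rough indication of whether more text is likely to help."""
    if total_words < threshold:
        return f"{total_words} words: more in-domain text is likely to improve accuracy"
    return f"{total_words} words: additional text may give only small gains"
```

For example, `assess_corpus(corpus_word_count(["lectures.txt", "slides.txt"]))` reports whether a hypothetical pair of files is still below the guideline. Treat the threshold as a starting point rather than a hard limit, because the useful amount of text varies by topic.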
The training text does not need to cover all the vocabulary in the audio. The custom language model is combined with a standard language model, so the combined coverage includes the words known to either model. Therefore, even a custom language model built from a small amount of text still provides benefits.
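To see why a small amount of training text still helps, consider a deliberately simplified model in which each language model is just a set of known words. This sketch ignores word probabilities and context, which are what a real language model contributes, and it assumes you have some list of words the standard model already knows (`base_vocab` here is a hypothetical, hand-made set). It measures the fraction of distinct words in a target transcript that either the standard vocabulary or the training text covers.

```python
import re

# Assumption: lower-cased runs of letters, digits, or apostrophes are "words".
WORD_RE = re.compile(r"[a-z0-9']+")


def vocabulary(text: str) -> set:
    """Return the set of distinct lower-cased words in a text."""
    return set(WORD_RE.findall(text.lower()))


def combined_coverage(base_vocab: set, training_text: str, target_text: str) -> float:
    """Fraction of distinct target words known to the base vocabulary or the training text."""
    target = vocabulary(target_text)
    if not target:
        return 1.0
    # The combined models know the union of both vocabularies.
    known = set(base_vocab) | vocabulary(training_text)
    return len(target & known) / len(target)


# Hypothetical example: a tiny standard vocabulary and one line of medical text.
base_vocab = {"the", "patient", "has", "a", "fever"}
target = "The patient has tachycardia"
print(combined_coverage(base_vocab, "", target))
print(combined_coverage(base_vocab, "Tachycardia and bradycardia", target))
```

The first call prints `0.75` because "tachycardia" is unknown to the standard vocabulary alone. The second prints `1.0` because a single short sentence of domain text fills the gap.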