Pre-Load Language Resources

Media Server automatically loads any resources that it needs to run speech-to-text, but processing cannot start until the resources have finished loading. Because of the amount of data, this might be 15 to 20 seconds after Media Server receives the process action.

You can load language resources before you send a process action, so that the resources required by the session configuration are ready and processing can begin immediately. This is particularly beneficial when you process live streams, to avoid missing the start of a broadcast.

Any language resources that you load remain in memory until you unload them, so pre-loading resources can sometimes help to increase throughput when you process many audio or video files with the same configuration.

NOTE: A language resource is a combination of a speech-to-text model and a language pack. For example, the following would all be loaded as separate language resources:

  • The ENUK language pack with the small speech-to-text model.
  • The ENUK language pack with the large speech-to-text model.
  • The ENUS language pack with the small speech-to-text model.

A resource that loads the legacy speech-to-text model also supports zero or more custom language models with specific interpolation weights, and an optional custom word database. Using different language models, interpolation weights, or custom word databases results in a separate resource being loaded. For example, the following would all be loaded as separate language resources:

  • The ENUK language pack (British English, for audio with a 16 kHz sample rate) without a custom language model.
  • The ENUK language pack (British English, for audio with a 16 kHz sample rate) combined with a custom language model named MedicalTerms with an interpolation weight of 0.1.
  • The ENUK language pack (British English, for audio with a 16 kHz sample rate) combined with a custom language model named MedicalTerms with an interpolation weight of 0.2.
  • The ENUK language pack (British English, for audio with a 16 kHz sample rate) with a custom word database.
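One way to reason about which configurations share a resource is to treat each resource as a key made up of the language pack, the model, the custom language models with their weights, and the custom word database. The following Python sketch is purely illustrative: the LanguageResource class, the "legacy" model label, and the MyWords database name are hypothetical, and this is not how Media Server represents resources internally.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LanguageResource:
    """Illustrative key for one loaded resource (hypothetical, not a Media Server type)."""
    language_pack: str
    model: str
    # (custom language model name, interpolation weight) pairs.
    custom_language_models: Tuple[Tuple[str, float], ...] = ()
    # "MyWords" below is a made-up database name used only for illustration.
    custom_word_database: Optional[str] = None


requested = [
    LanguageResource("ENUK", "legacy"),
    LanguageResource("ENUK", "legacy", (("MedicalTerms", 0.1),)),
    LanguageResource("ENUK", "legacy", (("MedicalTerms", 0.2),)),
    LanguageResource("ENUK", "legacy", custom_word_database="MyWords"),
    # Same configuration as the first entry, so no additional resource is needed.
    LanguageResource("ENUK", "legacy"),
]

# Configurations that compare equal collapse into a single set member.
distinct = set(requested)
print(len(distinct))  # 4
```

Five sessions are requested here, but only four distinct resources would need to be loaded, because the first and last configurations are identical.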

OpenText recommends that you pre-load the language resources that you expect to use frequently, and allow Media Server to load other language resources on demand.

To load language resources, use the action LoadSpeechLanguageResource. Any language resources you load with this action remain in memory until you unload them or until Media Server is stopped.

To load language resources automatically when Media Server starts, set the SpeechLanguageResources parameter in the [PersistentData] section of the Media Server configuration file.

You can list the language resources that you have loaded by running the action ListSpeechLanguageResources, and unload them by running the action UnloadSpeechLanguageResource.
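As a rough illustration, the three actions named above could be sent from a script as HTTP requests. The following Python sketch only builds the request URLs. The host, the port (14000), and the LanguagePack parameter name are assumptions made for this example; check the action reference for your Media Server version for the parameters that each action actually accepts.

```python
from urllib.parse import urlencode

# Assumed values: adjust to match your deployment.
HOST = "localhost"
ACI_PORT = 14000


def action_url(action: str, **params: str) -> str:
    """Build a request URL of the form http://host:port/action=Name&key=value."""
    query = urlencode({"action": action, **params})
    return f"http://{HOST}:{ACI_PORT}/{query}"


# "LanguagePack" is a placeholder parameter name used for illustration only.
load_url = action_url("LoadSpeechLanguageResource", LanguagePack="ENUK")
list_url = action_url("ListSpeechLanguageResources")
unload_url = action_url("UnloadSpeechLanguageResource", LanguagePack="ENUK")

for url in (load_url, list_url, unload_url):
    print(url)
```

To send a request, you could pass one of these URLs to urllib.request.urlopen or to any other HTTP client. A typical sequence is to load the resource before a broadcast begins, confirm that it appears in the list, and unload it when it is no longer needed.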