Control Speech-to-Text Speed

The speed of speech-to-text is affected by:

  • The model that you use to process the speech, which you set with the configuration parameter ModelVersion.
  • Whether you process the speech on a GPU or a CPU.

In some cases, you might want to prioritize accuracy over processing speed (for example, if you are batch-processing a large library of audio files and optimizing accuracy is your only goal). In other cases, you might want to prioritize processing speed. When you process a live stream, you must configure speech-to-text so that it keeps up with the stream.

The following sections describe how to configure the speed of speech-to-text.

Process Live Streams

To process a live stream such as a television broadcast, you must set the parameter IngestRate to 1 (in the [Session] section of your session configuration file) and the parameter SpeedBias to Live (in the speech-to-text task). In this mode, Media Server maximizes accuracy while ensuring that speech-to-text processing keeps up with the live stream. Some audio data is always waiting to be processed, but never so much that processing falls behind.

If you are processing on a CPU, set ModelVersion=micro to ensure that the speech-to-text engine can keep up with the live stream. Some hardware might be able to keep up with the stream using ModelVersion=small, but you must test this on your system. If you have a compatible GPU and have configured Media Server to use it, processing is much faster and you might be able to use any model.

Example configuration

[Session]
IngestRate=1
Engine0=Ingest
Engine1=SpeechToText
...

[Ingest]
Type=Video

[SpeechToText]
Type=SpeechToText
LanguagePack=ENUK
ModelVersion=micro
SpeedBias=Live
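
If testing shows that your CPU can keep up with the live stream using the small model, you might change only the ModelVersion line of the task. The following fragment is a sketch rather than a recommendation: it assumes your hardware passed such a test, so confirm on your own system that processing does not fall behind the stream before using it.

[SpeechToText]
Type=SpeechToText
LanguagePack=ENUK
ModelVersion=small
SpeedBias=Live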

Process Files

When you process a file, OpenText recommends that you set IngestRate=0 (in the [Session] section of your session configuration file). This allows Media Server to ingest the file as fast or as slowly as your analysis tasks require.

In your speech-to-text task, use the ModelVersion parameter to prioritize either accuracy or processing speed. OpenText recommends setting SpeedBias=6 (other values of this parameter were intended for use with the legacy speech-to-text models).

Example configuration

[Session]
IngestRate=0
Engine0=Ingest
Engine1=SpeechToText
...

[Ingest]
Type=Video

[SpeechToText]
Type=SpeechToText
LanguagePack=ENUK
ModelVersion=medium
SpeedBias=6
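
If processing speed matters more than accuracy for a particular batch of files, you might choose a smaller model instead. The following variant is a sketch that assumes the micro model (the one recommended above for keeping up with live streams on a CPU) is faster but less accurate than medium; process a few representative files with both settings to confirm the trade-off on your system.

[SpeechToText]
Type=SpeechToText
LanguagePack=ENUK
ModelVersion=micro
SpeedBias=6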