Transcribe Speech

To run speech-to-text

  1. Create a new configuration to send to Media Server with the process action, or open an existing configuration that you want to modify.

  2. In the [Session] section, add a new analysis task by setting the EngineN parameter. You can give the task any name, for example:

    [Session]
    Engine0=Ingest
    Engine1=TranscribeSpeech

  3. Create a new section to contain the settings for the task and set the following parameters:

    Type
        The analysis engine to use. Set this parameter to SpeechToText.

    Input
        (Optional) The audio track to analyze. If you do not specify an input track, Media Server processes the first track of the correct type produced by the ingest engine.

    LanguagePack
        The language pack to use. For a list of available language packs, see Speech Analysis Supported Languages.

        If your session configuration includes Language ID, you can instruct Media Server to use the detected language:

        • Set Input to the ResultWithSource track from your Language ID task.
        • Set LanguagePack=Input.

        NOTE: LanguagePack=Input is not available when ModelVersion=Legacy.

    ModelVersion
        The model to use to convert speech into text.

    SpeedBias
        To process a live stream, set this parameter to Live. Otherwise, set this parameter to 6.

    For example:

    [TranscribeSpeech]
    Type=SpeechToText
    LanguagePack=ENUK
    ModelVersion=small
    SpeedBias=6
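
    If the session also includes Language ID, the two settings described under LanguagePack could be combined roughly as in the following sketch. The task name LanguageID, the Type=LanguageID setting, and the TaskName.TrackName form of the Input value are assumptions made for illustration; confirm them against the Language ID reference before use. The [Ingest] section is omitted here.

    [Session]
    Engine0=Ingest
    Engine1=LanguageID
    Engine2=TranscribeSpeech

    [LanguageID]
    // Assumption: this Type value selects the Language ID engine.
    Type=LanguageID

    [TranscribeSpeech]
    Type=SpeechToText
    // Assumption: tracks are referenced as TaskName.TrackName.
    Input=LanguageID.ResultWithSource
    LanguagePack=Input
    // LanguagePack=Input is not available with ModelVersion=Legacy.
    ModelVersion=small
    SpeedBias=6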
    

    For more information about the parameters that you can use to configure speech-to-text, see Speech-To-Text.

  4. Save and close the configuration file. OpenText recommends that you save your configuration files in the location specified by the ConfigDirectory parameter.
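
    Putting steps 2 and 3 together, a saved configuration for a live stream might look like the following sketch. The [Ingest] section and its Type=Video setting are assumptions that do not come from this procedure; use the ingest settings that match your source. The only change from the file-based example in step 3 is SpeedBias=Live.

    [Session]
    Engine0=Ingest
    Engine1=TranscribeSpeech

    [Ingest]
    // Assumption: ingest engine settings depend on your source.
    Type=Video

    [TranscribeSpeech]
    Type=SpeechToText
    LanguagePack=ENUK
    ModelVersion=small
    // Live is the value to use when processing a live stream.
    SpeedBias=Live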