Transcribe Speech

To run speech-to-text

  1. Create a new configuration to send to Media Server with the process action, or open an existing configuration that you want to modify.

  2. In the [Session] section, add a new analysis task by setting the EngineN parameter. You can give the task any name, for example:

    [Session]
    Engine0=Ingest
    Engine1=TranscribeSpeech

  3. Create a new section to contain the settings for the task and set the following parameters:

    Type                 The analysis engine to use. Set this parameter to SpeechToText.
    Input                (Optional) The audio track to analyze. If you do not specify an input track, Media Server processes the first track of the correct type produced by the ingest engine.
    LanguagePack         The language pack to use. For a list of available language packs, see Speech Analysis Supported Languages.
    CustomLanguageModel  (Optional) A comma-separated list of custom language models to use. For each custom language model, specify the identifier and interpolation weight, separated by a colon.
    CustomWordDatabase   (Optional) The name of a custom word database to use. For more information about custom word databases, see Custom Word Databases.
    SpeedBias            Specifies whether to prioritize accuracy or speed. If you are processing a live stream, you must set this parameter to Live. Otherwise, set this parameter to an integer from 1 to 6, where 1 prioritizes accuracy and 6 prioritizes speed.
    FilterMusic          (Optional, default false) Specifies whether to ignore speech-to-text results for audio segments that are identified as music or noise. To filter these results from the output, set this parameter to true.
    SampleFrequency      (Optional, default 16000) The sample frequency of the audio to send to the audio service for analysis, in samples per second (Hz). Language packs are dependent on the audio sample rate, and accept audio at either 8000 Hz or 16000 Hz.

    For example:

    [TranscribeSpeech]
    Type=SpeechToText
    LanguagePack=ENUK
    CustomLanguageModel=MedicalTerms:0.1
    SpeedBias=2
    FilterMusic=true

    For more information about the parameters that you can use to configure speech-to-text, refer to the Media Server Reference.

  4. Save and close the configuration file. Micro Focus recommends that you save your configuration files in the location specified by the ConfigDirectory parameter.
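
As a further illustration of the steps above, the following sketch shows how the [Session] entry and the task section might look when the source is a live stream, where SpeedBias must be set to Live. It reuses the ENUK language pack and the MedicalTerms custom language model from the earlier example. ProductNames is a hypothetical identifier, included only to show the comma-separated identifier:weight syntax, and the lines beginning with // assume the usual configuration-file comment convention. The section that configures the ingest engine is not shown.

    [Session]
    Engine0=Ingest
    Engine1=TranscribeSpeech

    [TranscribeSpeech]
    Type=SpeechToText
    LanguagePack=ENUK
    // Two custom language models, each written as identifier:weight.
    // ProductNames is a hypothetical identifier used for illustration.
    CustomLanguageModel=MedicalTerms:0.1,ProductNames:0.2
    // A live stream requires Live instead of an integer from 1 to 6.
    SpeedBias=Live
    // The default value, stated explicitly. The chosen language pack
    // must accept audio at this sample rate.
    SampleFrequency=16000

Input is left unset here, so Media Server processes the first audio track produced by the ingest engine.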