Identify the Language of Speech

To identify the language of speech

  1. Create a new configuration to send to Media Server with the process action, or open an existing configuration that you want to modify.

  2. In the [Session] section, add a new analysis task by setting the EngineN parameter. You can give the task any name, for example:

    [Session]
    Engine0=Ingest
    Engine1=SpeechLanguageId

  3. Create a new section to contain the settings for the task, and set the following parameters:

    Type

    The analysis engine to use. Set this parameter to LanguageID.

    Input

    (Optional) The audio track to process. If you do not specify an input track, Media Server processes the first audio track produced by the ingest engine.

    Languages

    (Optional) The list of languages to consider when running language identification. If you know which languages are likely to be present in the media, OpenText recommends setting this parameter, because restricting the possible languages can increase accuracy and improve performance. For a list of supported languages with language codes, see Speech Analysis Supported Languages.

    ClosedSet

    When you set this parameter to FALSE, Media Server can decide that the language is unknown. For more information, see the Introduction.
    Mode

    The type of language identification task to run. Set this parameter to one of the following options:

    • Boundary - Language identification seeks to determine boundaries in the audio where the language changes, and returns results for the time between boundaries.
    • Segmented - The audio is divided into fixed-size segments and Media Server does not consider previous segments when running analysis. You can use this mode to determine the language if there are multiple languages present in the audio, but this mode does not identify the exact boundary points at which the language changes. Media Server outputs a record in the SegmentedResult track for each analyzed segment, and one or more records in the Result track (Media Server starts a new record if the detected language changes).
    • Cumulative - The audio is divided into fixed-size segments and every result is based on analysis of the current segment and all of the previous segments. You might use this mode if you are processing a video file and expect the audio to contain only one language or you want to identify the primary language that is spoken. Media Server outputs a record in the SegmentedResult track for each analyzed segment, and one record in the Result track when analysis has finished.

      NOTE: Cumulative mode is not suitable for analyzing continuous streams.

    SegmentDuration

    (Optional, default 30s) The amount of audio to analyze as a single segment. This parameter is ignored in Boundary mode.

    For example:

    [SpeechLanguageId]
    Type=LanguageID
    Languages=ENUK,DEDE
    Mode=Cumulative
    SegmentDuration=30s
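
    For comparison, a sketch of the same task configured for segmented analysis, which you might use when the audio can contain more than one language. The parameter values here are illustrative, not recommendations:

    [SpeechLanguageId]
    Type=LanguageID
    Mode=Segmented
    SegmentDuration=15s
    ClosedSet=FALSE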

  4. Save and close the configuration file. OpenText recommends that you save your configuration files in the location specified by the ConfigDirectory parameter.
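
After saving the configuration, you send it to Media Server with the process action (see step 1). As a minimal sketch of building the request URL, assuming the default ACI port (14000) and the Source and ConfigName action parameters (verify both against your Media Server reference before use):

```python
from urllib.parse import urlencode

# Assumptions: Media Server listens on the default ACI port (14000), and the
# process action accepts Source (the media to analyze) and ConfigName (a
# configuration file saved in the directory given by ConfigDirectory).
MEDIA_SERVER = "http://localhost:14000"

def build_process_url(source: str, config_name: str) -> str:
    """Build a Media Server process action URL (illustrative sketch)."""
    params = urlencode({"Source": source, "ConfigName": config_name})
    return f"{MEDIA_SERVER}/action=Process&{params}"

print(build_process_url("speech.wav", "SpeechLanguageId.cfg"))
# -> http://localhost:14000/action=Process&Source=speech.wav&ConfigName=SpeechLanguageId.cfg
```

You would then issue this URL with any HTTP client; Media Server returns the action response, including a token you can use to check the status of the task.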