Train a Custom Language Model

To create a new custom language model, use the action TrainCustomSpeechLanguageModel.

To train a custom language model

  • Train a custom language model using the TrainCustomSpeechLanguageModel action. Set the following parameters:

    identifier (Optional) A unique identifier for the custom language model (maximum 254 bytes). If you do not set this parameter, Media Server generates an identifier automatically.
    languagepack The language to base the custom language model on. You cannot train a custom language model with one language and use it with another, so this parameter and the LanguagePack parameter in your speech-to-text task must have the same value.
    textdata (Set this or textpath, but not both.) The text to use for training. Text files must be uploaded as multipart/form-data. For more information about sending data to Media Server, see Send Data by Using a POST Method.
    textpath (Set this or textdata, but not both.) A comma-separated list of paths to the files that contain the text to use for training. The paths must be absolute, or relative to the Media Server executable file.

    For example:

    curl http://localhost:14000 -F action=TrainCustomSpeechLanguageModel \
                                -F languagepack=ENUK \
                                -F textdata=@sample1.txt,sample2.txt

    Alternatively, the following example provides the paths of the text files, rather than sending the data:

    curl http://localhost:14000 -F action=TrainCustomSpeechLanguageModel \
                                -F languagepack=ENUK \
                                -F textpath=data/sample1.txt,data/sample2.txt
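
Because textdata and textpath are mutually exclusive, it can help to wrap the request in a small shell function that accepts only one of them. The following is a minimal sketch, not part of Media Server: the function name train_custom_lm, its "data" and "path" modes, and the CURL and MEDIASERVER_URL environment variables are all hypothetical names chosen for illustration. The action and parameter names are the ones described above.

```shell
# Hypothetical helper (not a Media Server tool).
# Usage: train_custom_lm LANGUAGEPACK data|path VALUE
#   data  sends VALUE as an uploaded file through textdata
#   path  sends VALUE as server-side path(s) through textpath
# Set CURL=echo to print the request arguments instead of sending them.
train_custom_lm() {
  tclm_languagepack=$1
  tclm_mode=$2
  tclm_value=$3

  # Exactly one of textdata or textpath is ever added to the request.
  case "$tclm_mode" in
    data) tclm_source="textdata=@$tclm_value" ;;
    path) tclm_source="textpath=$tclm_value" ;;
    *)
      echo "train_custom_lm: mode must be 'data' or 'path'" >&2
      return 2
      ;;
  esac

  ${CURL:-curl} "${MEDIASERVER_URL:-http://localhost:14000}" \
    -F action=TrainCustomSpeechLanguageModel \
    -F "languagepack=$tclm_languagepack" \
    -F "$tclm_source"
}
```

For example, train_custom_lm ENUK path data/sample1.txt,data/sample2.txt sends the same request as the textpath example above. The optional identifier parameter is left out of this sketch, so Media Server would generate one automatically.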

To list the custom language models that you have trained, use the action ListCustomSpeechLanguageModels. For more information about the actions that you can use for training custom language models, refer to the Media Server Reference.
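
As a rough sketch, the list action can be sent in the same style as the training examples. The function name list_custom_lms and the CURL and MEDIASERVER_URL environment variables are hypothetical, and this sketch assumes that the action needs no further parameters; check the Media Server Reference for the parameters it accepts and the format of the response.

```shell
# Hypothetical helper (not a Media Server tool).
# Sends ListCustomSpeechLanguageModels using the same form-style request
# as the training examples. Set CURL=echo to print the arguments instead.
list_custom_lms() {
  ${CURL:-curl} "${MEDIASERVER_URL:-http://localhost:14000}" \
    -F action=ListCustomSpeechLanguageModels
}
```

Running list_custom_lms after a training request is one way to confirm that the new model, and its identifier, appears in the list.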