
IDOL Speech-to-Text Language Modelling Tutorial

This section takes you through the process of using IDOL Speech Server to perform language modelling to optimize speech-to-text performance. The aim of the tutorial is to provide you with knowledge of the IDOL Speech Server custom language modelling capabilities, so that you know what to expect in your IDOL Speech Server installation.

TIP:

You might want to complete the IDOL Speech-to-Text Tutorial before this one.

For more details, refer to the IDOL Speech Server Administration Guide and the IDOL Speech Server Reference.

Supporting Files

To obtain the files that you need for this tutorial, go to the SoftSound Server area on the HPE Big Data Customer Support Site at https://customers.autonomy.com, click ZIP in the Format section, and then download IDOLSpeechLanguageModellingTutorial.zip.

The .zip file contains the following files:

Unzip the contents of the .zip file to the Speech Server data directory.

Requirements

To use this tutorial, you need the following components:

NOTE:

You can perform the tutorial with a later language pack, although you may find slight differences between your output and the output files provided. In addition, there might be slight differences depending on the operating system that you are using.

For more details on how to install and run IDOL Speech Server, refer to the IDOL Speech Server Administration Guide.

You must configure administrative access for your local machine by specifying the Access-Control-Allow-Origin parameter in the IDOL configuration file. For example: Access-Control-Allow-Origin=http://localhost:15000. For more information, refer to the IDOL Server Reference.

After you install IDOL Speech Server, type http://localhost:15000/action=Admin in a Web browser to start IDOL Admin and check that IDOL Speech Server is running. You can also use the IDOL Admin interface to check that the language pack is installed and configured correctly, and to load the language pack.
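The same status check can be issued programmatically. The sketch below only builds the request URL; the host, port, and action name follow the example above, and GetStatus is the standard ACI status action:

```python
from urllib.parse import urlencode

def build_action_url(host, port, action, **params):
    """Build an IDOL ACI action URL, e.g. http://localhost:15000/action=GetStatus."""
    url = "http://%s:%d/action=%s" % (host, port, action)
    if params:
        url += "&" + urlencode(params)
    return url

# GetStatus reports whether the server is running and which
# language packs are loaded.
print(build_action_url("localhost", 15000, "GetStatus"))
```

Opening the printed URL in a browser returns the server status as XML.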

NOTE:

This tutorial assumes that you have completed the IDOL Speech-to-Text Tutorial or are familiar with running speech-to-text. It does not describe how to perform speech-to-text processing with IDOL Speech Server.

Language Model Customization

IDOL Speech Server requires language packs to perform speech processing tasks. Several language packs are available.

The language model covers a broad vocabulary, reflecting the general spoken language. However, you might want to process speech data that covers specialized topics, such as financial or medical topics. The standard language model might not cover such specialized vocabulary or sentence structures. In such cases, you can build custom language models with specialized vocabulary for IDOL Speech Server to use when processing this audio.

Building a new language model requires a lot of text – on the order of millions or billions of words. The standard language packs are usually built with many billions of words of text. Therefore, the best way to customize a language model is to build a small custom language model from the specialized text, and then combine it with the standard language model when you perform speech-to-text.

The supporting files for this tutorial include two example video files, moonshot101.mp4 and moonshotoverview.mp4. These are promotional videos for the HP Moonshot product. This tutorial takes you through the process of creating a small custom language model about HP Moonshot to use in conjunction with the standard U.S. English language model, improving the accuracy of the speech-to-text output for the example videos.

Prepare Training Text

To produce an effective custom language model, you must build it using text that resembles the data that you want to process. For example, if you intend to apply the speech-to-text task to news monitoring, you would train the language model using recent news articles gathered from a wide range of sources.

Preparing the training text is a two-stage process:

  1. First, take the text from the articles and combine it into a single .txt file.
  2. Second, normalize the text so that word representations are standardized. For example, ‘1’ and ‘one’ are treated as two different representations of the same number, so you must prepare the text so that numbers are written as words, and so on.
NOTE:

The training text file must be in UTF-8 format.

The moonshot_training.txt file in IDOLSpeechLanguageModellingTutorial.zip contains text from a few Web articles on HP Moonshot that you can use as training text. Before you can use the text, you must normalize it.

TIP:

You can use other IDOL products to retrieve and extract text from data repositories. For more information, refer to the documentation for IDOL Connectors and IDOL KeyView.

To normalize the training text
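The normalization steps are described in the Administration Guide. As an illustration only, a normalization run is typically submitted as an AddTask ACI action naming a text-normalization task; the task and parameter names below (TextNorm, File, Out) are assumptions for illustration, not confirmed API, so check the IDOL Speech Server Reference for the real names:

```python
from urllib.parse import urlencode

# Hypothetical sketch: "TextNorm" and its parameters are assumed names
# for illustration; refer to the IDOL Speech Server Reference for the
# actual normalization task and parameters.
params = {
    "Type": "TextNorm",                    # assumed normalization task name
    "File": "moonshot_training.txt",       # raw UTF-8 training text
    "Out": "moonshot_training_norm.txt",   # normalized output file
}
url = "http://localhost:15000/action=AddTask&" + urlencode(params)
print(url)
```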

Build the Custom Language Model

You can now use the normalized training text to build a custom language model. You can use the IDOL Speech Server list manager to set up a list of training files.

To create a new list

To add the normalized training data to the list

To build a custom language model
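As with normalization, the build step is an AddTask action. The sketch below is a hypothetical illustration of the request shape; the task name and parameters are assumptions, and the list and model names come from this tutorial's example:

```python
from urllib.parse import urlencode

# Hypothetical sketch only: the task name and parameter names below are
# assumptions for illustration; refer to the IDOL Speech Server
# Reference for the actual language-model-build task.
params = {
    "Type": "LangModelBuild",    # assumed build task name
    "DataList": "moonshotList",  # assumed name of the training-file list
    "NewLangModel": "moonshot",  # name for the new custom language model
}
url = "http://localhost:15000/action=AddTask&" + urlencode(params)
print(url)
```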

This suggests that you should use the moonshot custom language model at an interpolation rate of 0.45 with the standard ENUS language model.
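The interpolation rate is a linear mixture weight: for a given word, the combined probability is 0.45 times the custom-model probability plus 0.55 times the standard-model probability. A small sketch of the arithmetic (the probability values are made-up illustration values, not tutorial output):

```python
def interpolate(p_custom, p_standard, weight=0.45):
    """Linear interpolation of two language-model probabilities."""
    return weight * p_custom + (1.0 - weight) * p_standard

# Illustration: a domain term such as "Moonshot" that the custom model
# rates far more likely than the general model.
p = interpolate(p_custom=0.020, p_standard=0.002)
print(round(p, 4))  # → 0.0101
```

The interpolated model therefore boosts domain vocabulary while still relying on the broad coverage of the standard model.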

After you have created a custom language model, you can delete the list that you created for the training files by using the DelList action.

Perform Speech-to-Text with the Custom Language Model

You can now use your custom language model to perform speech-to-text processing.

To use a custom language model

You can specify the custom language model by adding the customLM parameter to your ACI action. In this case, the appended :0.45 tells IDOL Speech Server to use an interpolation rate of 0.45 with the ENUS language model.
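The sketch below shows the shape of such a request. The CustomLM value follows the tutorial text (model name, a colon, then the interpolation rate); the other parameter names are assumptions for illustration, so check the IDOL Speech Server Reference for the exact spelling:

```python
from urllib.parse import urlencode

# Sketch of a WavToText request with a custom language model.
# Parameter names other than CustomLM are assumptions for illustration.
params = {
    "Type": "WavToText",
    "File": "moonshotoverview.mp4",
    "Lang": "ENUS",
    "CustomLM": "moonshot:0.45",  # custom model name : interpolation rate
    "Out": "moonshotoverview_moonshotLM.ctm",
}
url = "http://localhost:15000/action=AddTask&" + urlencode(params)
print(url)
```

Note that urlencode escapes the colon in the CustomLM value as %3A; the server decodes it back.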

You can check the status of the WavToText task by using a GetStatus command with the token returned in the ACI action above.
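A status check is a GetStatus action carrying that token. The token below is a placeholder for illustration; use the token returned by your own AddTask action:

```python
from urllib.parse import urlencode

# The token value here is a hypothetical placeholder; substitute the
# token returned by your own AddTask action.
token = "MTAuMi4xMDQuODI6MTUwMDA6QUREVEFTSzox"
url = "http://localhost:15000/action=GetStatus&" + urlencode({"Token": token})
print(url)
```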

On completion of the task, you should find that your output file, moonshotoverview_moonshotLM.ctm, matches the file moonshotoverview_moonshotLM_out.ctm in the supporting files for this tutorial. You can also find the file moonshot101_moonshotLM_out.ctm, which corresponds to running the above command with the moonshot101.mp4 example video file.

Score the Output

After you create a custom language model and use it to perform speech-to-text processing, you should check that it has improved the accuracy.

If you have a verbatim transcription of the audio, you can use IDOL Speech Server to calculate the accuracy. To make sure that you are matching the same words, you must first normalize the verbatim transcript so that '1' is listed as 'one', and so on. The tutorial files include truth transcripts for both the example Moonshot files.

To normalize the transcript

where path is the location of the truth file and the output file.

To score the speech-to-text output against the normalized truth transcript

This action runs a Scorer task that writes the score results to the moonshotOverview_score.txt file. This file contains information on the extent to which the speech-to-text output matches the specified truth file. The summary at the end details the accuracy as Recall, Precision, and F-measure scores. Your output should be identical to moonshotOverview_score_out.txt.
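F-measure is the harmonic mean of precision and recall, so it summarizes both scores in one number. A small sketch of the calculation (the precision and recall values are illustrative, not taken from the tutorial output):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

# Illustrative values only:
print(round(f_measure(0.60, 0.68), 4))  # → 0.6375
```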

If you compare this result to the results obtained by using just the standard language pack, you should see a significant difference, with an increase in F-measure score from 54.75 to 63.7.

Conclusion

This tutorial shows how the creation of a custom language model can greatly improve speech-to-text accuracy. You can use text from public articles related to the content you want to process to build a custom language model. You can then run speech-to-text using this custom language model interpolated with the standard language pack to improve recognition results.

