SpeechToText
Runs speech-to-text on the file(s) associated with an IDOL document FlowFile, and adds the text to the IDOL document.
For information about the audio and video file formats that are supported, refer to the Media Server Administration Guide.
Properties
Name | Default Value | Description |
---|---|---|
IDOL License Service | | An IdolLicenseServiceImpl that provides a way to communicate with an IDOL License Server. |
Media Service | | A MediaServiceImpl that manages media analysis resources. |
Language Pack | English (UK) | The language pack to use for speech-to-text. NOTE: Language packs can contain hundreds of megabytes of data, so they are not included in the installation and must be downloaded separately. Extract language packs to the "Speech Language Pack Directory" specified in your Media Service. |
Telephony | False | Specifies whether the audio is telephony with an 8 kHz sample rate. |
Shared Custom Language Model | | The identifier of an optional custom language model to use. Set this property to use a custom language model stored in the external database specified by the Media Service (see the "Media Service" property). |
Shared Custom Word Database | | The name of an optional custom word database to use. Set this property to use a custom word database stored in the external database specified by the Media Service (see the "Media Service" property). |
Custom Language Model File | | An optional custom language model to use. Specify the path of a file generated by the Media Server action ExportCustomSpeechLanguageModel. |
Custom Word Database File | | An optional custom word database to use. Specify the path of a file generated by the Media Server action ExportCustomSpeechWordDatabase. |
Custom Language Model Weighting | | The interpolation weight to use for the custom language model. You only need to set this property if you want to override the recommended weight, as returned by the Media Server action ListCustomSpeechLanguageModels. |
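The Telephony property depends on the sample rate of the source audio. As a rough aid for choosing its value, the following Python sketch reads the sample rate from an uncompressed WAV file using the standard wave module. It is not part of the processor; the function name is_telephony_rate is hypothetical, and the sketch assumes that your audio is available as WAV, which might not be the case for other supported formats.

```python
import wave

# 8 kHz, the sample rate named in the Telephony property description.
TELEPHONY_RATE_HZ = 8000


def is_telephony_rate(wav_source):
    """Return True if the WAV audio is sampled at 8 kHz.

    wav_source can be a file path or a binary file object
    (hypothetical helper, for illustration only).
    """
    with wave.open(wav_source, "rb") as wav_file:
        return wav_file.getframerate() == TELEPHONY_RATE_HZ
```

If the function returns True for a representative file, setting Telephony to True is probably appropriate. Compressed or container formats need a different tool to inspect the sample rate.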
Relationships
Name | Description |
---|---|
success | Processing was successful. |
failure | Processing failed. |
Example Output
The processor adds the transcribed speech to the document content. It also adds metadata to the document, as shown in the following example.
    <idol_media>
      <speechtotext>
        <word duration="0.039" start="0">
          <text><SIL></text>
        </word>
        <word duration="0.22" start="0.039">
          <text>now</text>
        </word>
        <word duration="0.14" start="0.259">
          <text>the</text>
        </word>
        <word duration="0.38" start="0.399">
          <text>latest</text>
        </word>
        <word duration="0.32" start="0.779">
          <text>news</text>
        </word>
        ...
      </speechtotext>
    </idol_media>
The metadata contains a word element for each word. The start and duration attributes provide timestamps, in seconds.
The text element provides the word that was recognized. This element can also have a value of <SIL> or <s>, which indicates a period of audio without speech, such as silence or background noise. <SIL> indicates silence that probably has no linguistic role. <s> is more likely to end a chain of words, for example when a speaker begins a new sentence.
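As an illustration of how a downstream consumer might read this metadata, the following Python sketch extracts the recognized words with their start and end times and skips the <SIL> and <s> markers. It is a minimal sketch, not part of the processor. The sample string is modelled on the example output, and it assumes that the markers are XML-escaped (&lt;SIL&gt;) in the serialized metadata. The function name spoken_words is hypothetical.

```python
import xml.etree.ElementTree as ET

# Values of the text element that indicate audio without speech.
NON_SPEECH_MARKERS = {"<SIL>", "<s>"}

# Sample metadata modelled on the example output.
# Assumption: the silence marker is XML-escaped in the serialized document.
SAMPLE_METADATA = """
<idol_media>
  <speechtotext>
    <word duration="0.039" start="0"><text>&lt;SIL&gt;</text></word>
    <word duration="0.22" start="0.039"><text>now</text></word>
    <word duration="0.14" start="0.259"><text>the</text></word>
    <word duration="0.38" start="0.399"><text>latest</text></word>
    <word duration="0.32" start="0.779"><text>news</text></word>
  </speechtotext>
</idol_media>
"""


def spoken_words(metadata_xml):
    """Return (text, start, end) tuples for recognized words.

    Times are in seconds; end is computed as start + duration.
    Non-speech markers are skipped.
    """
    root = ET.fromstring(metadata_xml.strip())
    words = []
    for word in root.iter("word"):
        text = (word.findtext("text") or "").strip()
        if not text or text in NON_SPEECH_MARKERS:
            continue
        start = float(word.get("start"))
        duration = float(word.get("duration"))
        words.append((text, start, start + duration))
    return words


if __name__ == "__main__":
    for text, start, end in spoken_words(SAMPLE_METADATA):
        print(f"{start:.3f}-{end:.3f}  {text}")
```

A consumer that needs sentence boundaries could treat <s> as a separator instead of discarding it, because that marker is more likely to end a chain of words.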