AlignAudioTranscript
Transcript alignment takes a transcript of the speech in a media file and, by processing the speech, assigns timestamps to all the words in the transcript. This is useful because it allows an application to provide search results from the transcript and open the media file at the correct position. You can also synchronize manually created subtitle text with the speech in a video file.
Type: asynchronous
Parameter | Description | Required |
---|---|---|
AudioData | The media file to align the transcript to. Files must be uploaded as multipart/form-data. For more information about sending data to Media Server, refer to the Media Server Administration Guide. | Set this or audiopath |
AudioPath | The path of the media file to align the transcript to. The path must be absolute, or relative to the Media Server executable file. | Set this or audiodata |
LanguagePack | The speech-to-text language pack to use to process the audio. To obtain a list of language packs that have been installed with your Media Server, use the action ListSpeechLanguagePacks. | Yes |
Normalize | A Boolean value (default true) that specifies whether to normalize the text. This is not supported for all languages. If normalization is not supported, normalize the text manually and set this parameter to false. | No |
SampleFrequency | The sample frequency at which to process the audio (default 16000). | No |
TextData | The text file that contains the transcript to align. Text files must be uploaded as multipart/form-data. For more information about sending data to Media Server, refer to the Media Server Administration Guide. | Set this or textpath |
TextPath | The path of the text file that contains the transcript to align. The path must be absolute, or relative to the Media Server executable file. | Set this or textdata |
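The parameter rules in the table can be checked before a request is sent. The following Python sketch is illustrative only: the function name build_align_params and its return shape are hypothetical, and it reads "Set this or ..." as meaning that exactly one parameter of each pair is supplied, which the table does not state explicitly. It separates ordinary form fields from files that would be uploaded as multipart/form-data, and it does not contact Media Server.

```python
def build_align_params(language_pack, audio_path=None, audio_data=None,
                       text_path=None, text_data=None,
                       normalize=True, sample_frequency=16000):
    """Validate AlignAudioTranscript parameters (hypothetical helper).

    Returns (fields, uploads): fields are plain form values, uploads maps
    a parameter name to a local file to send as multipart/form-data.
    """
    if not language_pack:
        raise ValueError("LanguagePack is required")
    # Assumption: "Set this or ..." means exactly one of each pair.
    if (audio_path is None) == (audio_data is None):
        raise ValueError("Set exactly one of AudioPath or AudioData")
    if (text_path is None) == (text_data is None):
        raise ValueError("Set exactly one of TextPath or TextData")

    fields = {
        "action": "AlignAudioTranscript",
        "languagepack": language_pack,
        "normalize": "true" if normalize else "false",
        "samplefrequency": str(sample_frequency),
    }
    uploads = {}

    # Paths are read by Media Server itself; data files are uploaded.
    if audio_path is not None:
        fields["audiopath"] = audio_path
    else:
        uploads["audiodata"] = audio_data

    if text_path is not None:
        fields["textpath"] = text_path
    else:
        uploads["textdata"] = text_data

    return fields, uploads
```

For instance, build_align_params("ENUS", audio_data="audio.wav", text_data="transcript.txt") yields the fields and the two uploads that a multipart request with both files attached would carry, with the default values for Normalize and SampleFrequency filled in.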
Example
curl http://localhost:14000 -F action=AlignAudioTranscript -F audiodata=@audio.wav -F textdata=@transcript.txt -F languagepack=ENUS -F samplefrequency=16000
Response
The AlignAudioTranscript action is asynchronous, so Media Server returns a token. You can use the token with the QueueInfo action to obtain the results. A sample response from the QueueInfo action appears below.
For each word in the transcript, the response includes the start time, end time, and duration, all in seconds.
<autnresponse>
  <action>QUEUEINFO</action>
  <response>SUCCESS</response>
  <responsedata>
    <actions>
      <action>
        <status>Finished</status>
        <queued_time>2018-Aug-22 08:27:49</queued_time>
        <time_in_queue>0</time_in_queue>
        <process_start_time>2018-Aug-22 08:27:49</process_start_time>
        <time_processing>51</time_processing>
        <process_end_time>2018-Aug-22 08:28:40</process_end_time>
        <output>
          <TranscriptAlignResult>
            <duration>0.59</duration>
            <endTime>0.59</endTime>
            <startTime>0.00</startTime>
            <text>The</text>
          </TranscriptAlignResult>
          <TranscriptAlignResult>
            <duration>0.91</duration>
            <endTime>1.50</endTime>
            <startTime>0.59</startTime>
            <text>news</text>
          </TranscriptAlignResult>
          ...
        </output>
      </action>
    </actions>
  </responsedata>
</autnresponse>
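An application that opens the media file at the position of a search hit needs the word timings from this response. The following Python sketch shows one way to extract them with the standard library. It assumes the response has the element names shown in the sample and no XML namespace prefixes; a real response may differ. The function names parse_alignment and find_word, and the trimmed SAMPLE string, are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample response, containing only the two words shown.
SAMPLE = """<autnresponse><action>QUEUEINFO</action><response>SUCCESS</response>
<responsedata><actions><action><status>Finished</status><output>
<TranscriptAlignResult><duration>0.59</duration><endTime>0.59</endTime>
<startTime>0.00</startTime><text>The</text></TranscriptAlignResult>
<TranscriptAlignResult><duration>0.91</duration><endTime>1.50</endTime>
<startTime>0.59</startTime><text>news</text></TranscriptAlignResult>
</output></action></actions></responsedata></autnresponse>"""


def parse_alignment(xml_text):
    """Return one dict per aligned word, with times in seconds."""
    root = ET.fromstring(xml_text)
    words = []
    # Assumption: element names match the sample and carry no namespace.
    for result in root.iter("TranscriptAlignResult"):
        words.append({
            "text": result.findtext("text"),
            "start": float(result.findtext("startTime")),
            "end": float(result.findtext("endTime")),
            "duration": float(result.findtext("duration")),
        })
    return words


def find_word(words, query):
    """Return the start time of every word that matches query, ignoring case."""
    wanted = query.lower()
    return [w["start"] for w in words if w["text"].lower() == wanted]


words = parse_alignment(SAMPLE)
positions = find_word(words, "news")
```

Here positions holds the start times, in seconds, at which a player could begin playback for the word "news". Matching whole words case-insensitively is a simplification; a real search feature would probably also handle punctuation and multi-word phrases.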