AlignAudioTranscript
Transcript alignment takes a transcript of the speech in a media file and, by processing the speech, assigns timestamps to all the words in the transcript. This is useful because it allows an application to provide search results from the transcript and open the media file at the correct position. You can also synchronize manually created subtitle text with the speech in a video file.
Type: asynchronous
Parameter | Description | Required |
---|---|---|
AudioData | The media file to align the transcript to. Send files to Media Server using a multipart/form-data HTTP POST request. | Set this or audiopath |
AudioPath | The path of the media file to align the transcript to. The path must be absolute, or relative to the Media Server executable file. | Set this or audiodata |
IngestDateTime | You can use this parameter to configure the start time for the timestamps in the output. For example, set this parameter when you want the timestamps to match the time when the video was broadcast. Specify the date and time in one of the following ways: | No |
LanguagePack | The speech-to-text language pack to use to process the audio. To obtain a list of language packs that have been installed with your Media Server, use the action ListSpeechLanguagePacks. | Yes |
Normalize | A Boolean value (default true) that specifies whether to normalize the text. This is not supported for all languages. If normalization is not supported, normalize the text manually and set this parameter to false. | No |
SampleFrequency | (Deprecated) The sample frequency at which to process the audio. This parameter is deprecated. Media Server can determine the correct sample frequency from the language pack. The parameter might be removed in future. | No |
TextData | The text file that contains the transcript to align. Send files to Media Server using a multipart/form-data HTTP POST request. | Set this or textpath |
TextPath | The path of the text file that contains the transcript to align. The path must be absolute, or relative to the Media Server executable file. | Set this or textdata |
Example
curl http://localhost:14000/action=AlignAudioTranscript -F audiodata=@audio.wav -F textdata=@transcript.txt -F languagepack=ENUS
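For applications that cannot shell out to curl, the same multipart/form-data POST can be built in code. The following is a minimal sketch using only the Python standard library. It assumes the host, port, and file names from the curl example; the helper names build_multipart and build_align_request are hypothetical and are not part of Media Server.

```python
import urllib.request
import uuid

# Assumption: Media Server listens on localhost:14000, as in the curl example.
ACTION_URL = "http://localhost:14000/action=AlignAudioTranscript"


def build_multipart(fields, files):
    """Encode text fields and files as a multipart/form-data body.

    fields: dict of name -> string value
    files:  dict of name -> (filename, bytes content)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    for name, (filename, content) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_align_request(audio_bytes, transcript_bytes, language_pack):
    """Build (but do not send) the AlignAudioTranscript POST request."""
    body, content_type = build_multipart(
        fields={"languagepack": language_pack},
        files={
            "audiodata": ("audio.wav", audio_bytes),
            "textdata": ("transcript.txt", transcript_bytes),
        },
    )
    return urllib.request.Request(
        ACTION_URL,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen. Because the action is asynchronous, the response body contains a token rather than the alignment results.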
Response
The AlignAudioTranscript action is asynchronous, so Media Server returns a token. You can use the token with the QueueInfo action to obtain the results. A sample response from the QueueInfo action appears below.
The response includes the start time, end time, and duration for each word. OpenText recommends using the timestamp field, which includes times in both ISO 8601 format and epoch microseconds. The other startTime, duration, and endTime fields provide timestamps only in milliseconds from the beginning of the file, and are deprecated. In the example below, the first timestamp begins at 1970-01-01 (the beginning of the UNIX epoch). If you want the timestamps to match the broadcast time, set the action parameter IngestDateTime in the AlignAudioTranscript request.
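The relationship between the epoch-microsecond values and the ISO 8601 strings in the timestamp field can be illustrated with a short Python sketch. The function names are hypothetical. The second function assumes that IngestDateTime shifts every timestamp by the same fixed amount, so that subtracting the ingest start time recovers the position in the media file; the source describes this behavior only in general terms.

```python
from datetime import datetime, timedelta, timezone

# Beginning of the UNIX epoch, the default origin for the timestamps.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_iso8601(epoch_micros):
    """Format epoch microseconds in the style of the iso8601 attribute."""
    moment = EPOCH + timedelta(microseconds=epoch_micros)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def offset_in_file_seconds(epoch_micros, ingest_start=EPOCH):
    """Return the position in the media file, in seconds.

    Assumption: timestamps equal the ingest start time plus the offset
    into the file, so subtracting ingest_start gives the offset.
    """
    moment = EPOCH + timedelta(microseconds=epoch_micros)
    return (moment - ingest_start).total_seconds()
```

When IngestDateTime is not set, ingest_start stays at the epoch and the offset is simply the microsecond value divided by one million.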
The text field identifies the word corresponding to the timestamp.
<autnresponse>
  <action>QUEUEINFO</action>
  <response>SUCCESS</response>
  <responsedata>
    <actions>
      <action>
        <status>Finished</status>
        <queued_time>2023-Feb-13 11:16:24</queued_time>
        <time_in_queue>3</time_in_queue>
        <process_start_time>2023-Feb-13 11:16:27</process_start_time>
        <time_processing>21</time_processing>
        <process_end_time>2023-Feb-13 11:16:48</process_end_time>
        <output>
          <TranscriptAlignResult>
            <timestamp>
              <startTime iso8601="1970-01-01T00:00:00.000000Z">0</startTime>
              <duration iso8601="PT00H00M00.100000S">100000</duration>
              <peakTime iso8601="1970-01-01T00:00:00.000000Z">0</peakTime>
              <endTime iso8601="1970-01-01T00:00:00.100000Z">100000</endTime>
            </timestamp>
            <text>The</text>
            <startTime>0</startTime>
            <duration>100</duration>
            <endTime>100</endTime>
          </TranscriptAlignResult>
          <TranscriptAlignResult>
            <timestamp>
              <startTime iso8601="1970-01-01T00:00:00.100000Z">100000</startTime>
              <duration iso8601="PT00H00M00.460000S">460000</duration>
              <peakTime iso8601="1970-01-01T00:00:00.100000Z">100000</peakTime>
              <endTime iso8601="1970-01-01T00:00:00.560000Z">560000</endTime>
            </timestamp>
            <text>news</text>
            <startTime>100</startTime>
            <duration>460</duration>
            <endTime>560</endTime>
          </TranscriptAlignResult>
          ...
        </output>
      </action>
    </actions>
  </responsedata>
</autnresponse>
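A response of this shape can be turned into a searchable word list. The following Python sketch reads the recommended timestamp field rather than the deprecated millisecond fields. The names parse_alignment and find_word are hypothetical, and SAMPLE is a trimmed copy of the response above with the elided results left out so that it parses as well-formed XML.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample QueueInfo response (illustrative only).
SAMPLE = """
<autnresponse>
  <action>QUEUEINFO</action>
  <response>SUCCESS</response>
  <responsedata><actions><action>
    <status>Finished</status>
    <output>
      <TranscriptAlignResult>
        <timestamp>
          <startTime iso8601="1970-01-01T00:00:00.000000Z">0</startTime>
          <endTime iso8601="1970-01-01T00:00:00.100000Z">100000</endTime>
        </timestamp>
        <text>The</text>
      </TranscriptAlignResult>
      <TranscriptAlignResult>
        <timestamp>
          <startTime iso8601="1970-01-01T00:00:00.100000Z">100000</startTime>
          <endTime iso8601="1970-01-01T00:00:00.560000Z">560000</endTime>
        </timestamp>
        <text>news</text>
      </TranscriptAlignResult>
    </output>
  </action></actions></responsedata>
</autnresponse>
"""


def parse_alignment(xml_text):
    """Return one dict per aligned word, with times in epoch microseconds."""
    root = ET.fromstring(xml_text)
    words = []
    for result in root.iter("TranscriptAlignResult"):
        stamp = result.find("timestamp")
        words.append(
            {
                "text": result.findtext("text"),
                "start_us": int(stamp.findtext("startTime")),
                "end_us": int(stamp.findtext("endTime")),
            }
        )
    return words


def find_word(words, query):
    """Return the start time in seconds of each case-insensitive match.

    The values are seconds since the timestamp origin. They equal the
    position in the file only when IngestDateTime was not set.
    """
    wanted = query.lower()
    return [w["start_us"] / 1_000_000 for w in words if w["text"].lower() == wanted]
```

An application could pass the value returned by find_word to its media player to open the file at the position where the word is spoken.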