The following XML shows a single record produced by speech-to-text.
<output>
  <record>
    <timestamp>
      ...
    </timestamp>
    <trackname>SpeechToText.Result</trackname>
    <SpeechToTextResult>
      <id>5c6a6fe9-04aa-4ec2-9f06-9c28827a1cb6</id>
      <text>all</text>
      <confidence>80</confidence>
      <alternative>
        <id>b05a75af-8515-4ed5-845e-caf86e2b25b9</id>
        <text>fall</text>
        <score>97</score>
        <startOffset>-60</startOffset>
        <endOffset>170</endOffset>
      </alternative>
      <alternative>
        <id>98cfe8e2-a377-4719-a12c-441266cfe657</id>
        <text>call</text>
        <score>91</score>
        <startOffset>-60</startOffset>
        <endOffset>170</endOffset>
      </alternative>
      ...
      <matched>false</matched>
    </SpeechToTextResult>
  </record>
</output>
The record contains the following information:
The id element provides a unique identifier for the result.
The text element provides the recognized word (the "primary" word).
The confidence element provides the confidence score for the recognized word.
One or more alternative elements might be present, but only if you set the parameter AlternativeWordsThreshold. The following elements are present for each alternative word:
    The text element provides the alternative word.
    The score element provides the score for the alternative word. The scores for alternative words are relative to the primary word.
    The startOffset and endOffset elements provide offsets for the start and end times. For example, the alternative choice "fall" begins 60 milliseconds before the record start time and ends 170 milliseconds after the record start time.
An alternative word is included in the result if it overlaps chronologically with the primary word, and has a score that exceeds the threshold specified by the AlternativeWordsThreshold parameter. This means that you might see the same alternative word repeated in several records.
The matched element indicates whether the primary word is in the list of words specified by the MatchWords configuration parameter (or an overlapping alternative word is in the list and has a score greater than the value of the MatchWordsThreshold parameter). You might use this information to perform audio redaction on specific words.
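
As an illustration of how a client might read these elements, the following Python sketch parses a record with the standard library and converts an alternative word's offsets into absolute times. It is a minimal sketch and not part of the product: the function names parse_result and alternative_span and the record_start_ms argument are hypothetical, and the sample leaves out the timestamp element because its content is elided in the example above. It also assumes that scores and offsets are integers, as they are in the sample record.

```python
import xml.etree.ElementTree as ET

# Sample record copied from the example above. The timestamp and any
# further alternatives are left out because they are elided in the source.
SAMPLE = """<output>
  <record>
    <trackname>SpeechToText.Result</trackname>
    <SpeechToTextResult>
      <id>5c6a6fe9-04aa-4ec2-9f06-9c28827a1cb6</id>
      <text>all</text>
      <confidence>80</confidence>
      <alternative>
        <id>b05a75af-8515-4ed5-845e-caf86e2b25b9</id>
        <text>fall</text>
        <score>97</score>
        <startOffset>-60</startOffset>
        <endOffset>170</endOffset>
      </alternative>
      <alternative>
        <id>98cfe8e2-a377-4719-a12c-441266cfe657</id>
        <text>call</text>
        <score>91</score>
        <startOffset>-60</startOffset>
        <endOffset>170</endOffset>
      </alternative>
      <matched>false</matched>
    </SpeechToTextResult>
  </record>
</output>"""


def parse_result(result):
    """Turn one SpeechToTextResult element into a plain dictionary."""
    alternatives = [
        {
            "id": alt.findtext("id"),
            "text": alt.findtext("text"),
            # Assumption: scores and offsets are integers, as in the sample.
            "score": int(alt.findtext("score")),
            "start_offset_ms": int(alt.findtext("startOffset")),
            "end_offset_ms": int(alt.findtext("endOffset")),
        }
        for alt in result.findall("alternative")
    ]
    matched_text = (result.findtext("matched") or "").strip().lower()
    return {
        "id": result.findtext("id"),
        "text": result.findtext("text"),
        "confidence": int(result.findtext("confidence")),
        "matched": matched_text == "true",
        "alternatives": alternatives,
    }


def alternative_span(alternative, record_start_ms):
    """Return (start, end) in milliseconds for an alternative word.

    record_start_ms is a hypothetical value supplied by the caller; the
    offsets in the record are relative to the record start time.
    """
    return (
        record_start_ms + alternative["start_offset_ms"],
        record_start_ms + alternative["end_offset_ms"],
    )


root = ET.fromstring(SAMPLE)
results = [parse_result(r) for r in root.iter("SpeechToTextResult")]
```

With a record start time of 1000 milliseconds, alternative_span(results[0]["alternatives"][0], 1000) returns (940, 1170) for "fall", which matches the offsets of -60 and 170 described above. A redaction workflow could filter the parsed results on the matched value, but how the matched words are then removed from the audio is outside the scope of this sketch.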