Speech-to-Text Results

The following XML shows a single record produced by speech-to-text.

<output>
  <record>
    <timestamp>
      ...
    </timestamp>
    <trackname>SpeechToText.Result</trackname>
    <SpeechToTextResult>
      <id>5c6a6fe9-04aa-4ec2-9f06-9c28827a1cb6</id>
      <text>all</text>
      <confidence>80</confidence>
      <alternative>
        <id>b05a75af-8515-4ed5-845e-caf86e2b25b9</id>
        <text>fall</text>
        <score>97</score>
        <startOffset iso8601="-PT00H00M00.060000S">-60</startOffset>
        <endOffset iso8601="PT00H00M00.170000S">170</endOffset>
      </alternative>
      <alternative>
        <id>98cfe8e2-a377-4719-a12c-441266cfe657</id>
        <text>call</text>
        <score>91</score>
        <startOffset iso8601="-PT00H00M00.060000S">-60</startOffset>
        <endOffset iso8601="PT00H00M00.170000S">170</endOffset>
      </alternative>
      ...
      <language>ENUK</language>
      <matched>false</matched>
    </SpeechToTextResult>
  </record>
</output>

The record contains the following information:

  • The id element provides a unique identifier for the result.
  • The text element provides the recognized word (the "primary" word).

    The value can also be <SIL> or <s>. Both indicate a period of audio without speech, such as silence or background noise. <SIL> indicates silence that probably has no linguistic role, whereas <s> is more likely to mark the end of a chain of words, for example when a speaker begins a new sentence.

  • The confidence element provides the confidence score for the recognized word.
  • One or more alternative elements might be present, but only if you set the parameter AlternativeWordsThreshold. The following elements are present for each alternative word:

    • The text element provides the alternative word.
    • The score element provides the score for the alternative word. The scores for alternative words are relative to the primary word.
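    • The startOffset and endOffset elements provide the start and end times of the alternative word as offsets, in milliseconds, from the record start time. A negative value means the time is before the record start time, and the iso8601 attribute gives the same offset as an ISO 8601 duration. For example, the alternative choice "fall" begins 60 milliseconds before the record start time and ends 170 milliseconds after the record start time.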

    An alternative word is included in the result if it overlaps chronologically with the primary word and has a score that exceeds the threshold specified by the AlternativeWordsThreshold parameter. Because one alternative word can overlap more than one primary word, you might see the same alternative word repeated in several records.

  • The language element indicates which language was used when running speech-to-text. Normally this value matches the value of the LanguagePack parameter. If you set LanguagePack=input, the value is the language identified by Language ID.
  • The matched element indicates whether the record matches the list of words specified by the MatchWords configuration parameter. The value is true if the primary word is in the list, or if an overlapping alternative word is in the list and has a score greater than the value of the MatchWordsThreshold parameter. You might use this information to perform audio redaction on specific words.
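
As an illustration of how a client might read these fields, the following Python sketch parses a record with the standard library xml.etree.ElementTree module. It is a minimal example based only on the sample above, not an official client. The names parse_results, SAMPLE, and NON_SPEECH are hypothetical. The sample is trimmed: the elided timestamp and "..." lines are left out. The sketch assumes that numeric values parse as numbers and that <SIL> and <s> arrive as escaped text in the text element.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample record (elided parts omitted, one alternative kept).
SAMPLE = """<output>
  <record>
    <trackname>SpeechToText.Result</trackname>
    <SpeechToTextResult>
      <id>5c6a6fe9-04aa-4ec2-9f06-9c28827a1cb6</id>
      <text>all</text>
      <confidence>80</confidence>
      <alternative>
        <id>b05a75af-8515-4ed5-845e-caf86e2b25b9</id>
        <text>fall</text>
        <score>97</score>
        <startOffset iso8601="-PT00H00M00.060000S">-60</startOffset>
        <endOffset iso8601="PT00H00M00.170000S">170</endOffset>
      </alternative>
      <language>ENUK</language>
      <matched>false</matched>
    </SpeechToTextResult>
  </record>
</output>"""

# Values of the text element that mark audio without speech.
NON_SPEECH = {"<SIL>", "<s>"}


def parse_results(xml_text):
    """Return one dictionary per SpeechToTextResult element in xml_text."""
    root = ET.fromstring(xml_text)
    results = []
    for node in root.iter("SpeechToTextResult"):
        # Alternatives are optional, so this list can be empty.
        alternatives = [
            {
                "text": alt.findtext("text"),
                "score": float(alt.findtext("score")),
                # Offsets are milliseconds relative to the record start time.
                "start_ms": int(alt.findtext("startOffset")),
                "end_ms": int(alt.findtext("endOffset")),
            }
            for alt in node.findall("alternative")
        ]
        text = node.findtext("text")
        results.append(
            {
                "id": node.findtext("id"),
                "text": text,
                "is_speech": text not in NON_SPEECH,
                "confidence": float(node.findtext("confidence")),
                "alternatives": alternatives,
                "language": node.findtext("language"),
                "matched": node.findtext("matched") == "true",
            }
        )
    return results


for result in parse_results(SAMPLE):
    print(result["text"], result["confidence"], result["matched"])
    for alt in result["alternatives"]:
        print("  alternative:", alt["text"], alt["score"], alt["start_ms"], alt["end_ms"])
```

To turn an offset into an absolute time, a caller would add it to the record start time taken from the timestamp element, which is elided in the sample and therefore not handled here.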