The mixer module combines word outputs from multiple modules into a single timeline that has at most one word at each time position. For example, you can use this module to combine the results for each channel in a stereo speech-to-text task, if you do not require separate results for each channel.
You can combine the output from up to five channels (wa, wb, wc, wd, and we). If more than one input contains a word other than silence at a particular time position, the word in wa takes precedence over the word in wb, wb takes precedence over wc, and so on. Each input word appears at most once in the output.
The labels <s> and <SIL> represent silence, and therefore have no precedence in any channel.
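For example, if the input for wb is:
0.000 1.000 Hello 0.000
and the input for wa is:
0.000 0.500 <s> 0.000
0.500 0.100 This 0.000
0.600 0.400 <SIL> 0.000
The combined output is:
0.000 0.500 Hello 0.000
0.500 0.100 This 0.000
0.600 0.400 <SIL> 0.000
Here, Hello from wb fills the leading silence in wa and is cut short where This begins. It does not reappear in the trailing silence, because each input word appears at most once in the output.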
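The precedence rule can be illustrated with a short Python sketch. This is not the module's implementation, only one reading of the behavior described above. It assumes that the four fields of each record are start time, duration, label, and score, and that any position with no winning word is emitted as <SIL>. The names Word and mix are hypothetical and exist only for this illustration.

```python
from typing import List, NamedTuple

SILENCE_LABELS = {"<s>", "<SIL>"}  # labels the text above treats as silence


class Word(NamedTuple):
    start: float     # assumed: start time in seconds
    duration: float  # assumed: duration in seconds
    label: str
    score: float     # assumed: meaning of the fourth field


def _ms(seconds: float) -> int:
    """Use integer milliseconds so boundary comparisons are exact."""
    return round(seconds * 1000)


def mix(channels: List[List[Word]]) -> List[Word]:
    """Merge channels listed in precedence order (channels[0] acts as wa)."""
    # Every word boundary in any channel splits the timeline.
    bounds = sorted({t for ch in channels for w in ch
                     for t in (_ms(w.start), _ms(w.start + w.duration))})
    finished = set()  # words already emitted once; never emitted again
    segments = []     # [start_ms, end_ms, key]; key None means silence

    def pick(lo: int, hi: int):
        # First non-silence, not-yet-finished word covering [lo, hi] wins.
        for c, ch in enumerate(channels):
            for i, w in enumerate(ch):
                if w.label in SILENCE_LABELS or (c, i) in finished:
                    continue
                if _ms(w.start) <= lo and hi <= _ms(w.start + w.duration):
                    return (c, i)
        return None

    for lo, hi in zip(bounds, bounds[1:]):
        key = pick(lo, hi)
        if segments and segments[-1][2] == key:
            segments[-1][1] = hi  # same word (or silence) continues
            continue
        if segments and segments[-1][2] is not None:
            finished.add(segments[-1][2])  # previous word has ended
        segments.append([lo, hi, key])

    out = []
    for lo, hi, key in segments:
        if key is None:
            out.append(Word(lo / 1000, (hi - lo) / 1000, "<SIL>", 0.0))
        else:
            w = channels[key[0]][key[1]]
            out.append(Word(lo / 1000, (hi - lo) / 1000, w.label, w.score))
    return out


def fmt(w: Word) -> str:
    return f"{w.start:.3f} {w.duration:.3f} {w.label} {w.score:.3f}"


wa = [Word(0.0, 0.5, "<s>", 0.0), Word(0.5, 0.1, "This", 0.0),
      Word(0.6, 0.4, "<SIL>", 0.0)]
wb = [Word(0.0, 1.0, "Hello", 0.0)]
for word in mix([wa, wb]):
    print(fmt(word))
```

With the wa and wb inputs from the example, the sketch prints the same three records as the combined output shown above. Splitting the timeline at every word boundary keeps the precedence check simple, at the cost of rescanning each channel for every interval.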
| Mode | Input | Output | Description |
|---|---|---|---|
| | wa, wb, wc, wd | | Combines the word label streams wa, wb, wc, and wd into a single word label stream. |
Examples:
w5 <- mixer (_, wa:w1, wb:w2)
w5 <- mixer (_, wa:w1, wb:w2, wc:w3, wd:w4)
MixTypeA
MixTypeB
MixTypeC
MixTypeD