[mixer] Module Configuration

The mixer module combines word outputs from multiple modules into a single timeline that has a maximum of one word for each time position. For example, you can use this module to combine the results for each channel in a stereo speech-to-text task, if you do not require separate results for each channel.

You can combine the output from up to four channels (wa, wb, wc, and wd). If more than one input contains a word other than silence at a particular time position, the word in wa takes precedence over the word in wb, wb takes precedence over wc, and so on. Each input word appears at most once in the output.

The labels <s> and <SIL> represent silence, and therefore never take precedence over a word in any channel.

For example, if the input for wb is:

0.000 1.000 Hello 0.000

and the input for wa is:

0.000 0.500 <s> 0.000 
0.500 0.100 This 0.000 
0.600 0.400 <SIL> 0.000

then the combined output is:

0.000 0.500 Hello 0.000 
0.500 0.100 This 0.000 
0.600 0.400 <SIL> 0.000
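
The precedence rules above can be illustrated with a short Python sketch. This is not the module's actual implementation. It assumes that the first three columns of each line are the start time, the duration, and the label (an interpretation inferred from the example, with the fourth column dropped), and it stores times as integer milliseconds to avoid floating-point drift. The function name mix and the rule that an interrupted word does not resume are assumptions chosen to reproduce the example output.

```python
from typing import List, Tuple

# Labels that represent silence, as listed in this section.
SILENCE = {"<s>", "<SIL>"}

# (start_ms, duration_ms, label); the column meaning and the millisecond
# unit are assumptions made for this sketch.
Word = Tuple[int, int, str]


def mix(channels: List[List[Word]]) -> List[Word]:
    """Merge channels, given in precedence order (wa first), into one timeline."""
    # Every word boundary in any channel splits the timeline into slices.
    cuts = sorted({t for ch in channels for (s, d, _) in ch for t in (s, s + d)})
    used = set()  # (channel, word) keys that were emitted and then ended
    out = []      # entries of the form [start, end, label, key]
    for lo, hi in zip(cuts, cuts[1:]):
        pick = None
        silence = None
        for ci, ch in enumerate(channels):
            for wi, (s, d, label) in enumerate(ch):
                if not (s <= lo and hi <= s + d):
                    continue  # this word does not cover the slice
                if label in SILENCE:
                    if silence is None:
                        silence = (label, (ci, wi))
                elif pick is None and (ci, wi) not in used:
                    pick = (label, (ci, wi))
        chosen = pick or silence
        if chosen is None:
            continue  # no input covers this slice
        label, key = chosen
        if out and out[-1][3] == key and out[-1][1] == lo:
            out[-1][1] = hi  # the same word continues; extend it
        else:
            if out:
                used.add(out[-1][3])  # the previous word ends here for good
            out.append([lo, hi, label, key])
    return [(s, e - s, label) for s, e, label, _ in out]


wa = [(0, 500, "<s>"), (500, 100, "This"), (600, 400, "<SIL>")]
wb = [(0, 1000, "Hello")]
for start, dur, label in mix([wa, wb]):
    print(f"{start / 1000:.3f} {dur / 1000:.3f} {label}")
```

In this sketch, Hello is cut short when This begins in wa. Because each input word can appear only once, Hello does not resume afterwards, so the final slice falls back to the silence label from wa.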

Input and Output

Mode   Input            Output   Description
       wa, wb, wc, wd            Combines the word label streams wa, wb, wc, and wd into a single word label stream.

Examples:

w5 <- mixer (_, wa:w1, wb:w2)
w5 <- mixer (_, wa:w1, wb:w2, wc:w3, wd:w4)

Parameters

MixTypeA

MixTypeB

MixTypeC

MixTypeD

