The postproc module filters and modifies results produced by audio processing tasks such as speech-to-text and language identification.
The postproc module has two modes of operation, which can also be combined to perform both processes in a single operation.
Mode | Input | Output | Description |
---|---|---|---|
B | w | w | Accepts a word data stream, and replaces all barred words either with a fixed term (such as <BLEEP>), or with a term specific to an individual word. The barred word list is provided as a text file containing one word or a pair of words (the barred word, and the replacement term for that word) on each line. |
R | w | w | Accepts a data stream, and recombines any word fragments to form complete words. Writes the resulting word sequence to an output word stream. |
P | w1 | w2 | Includes simple sentence-forming punctuation (for example, full stops and initial capital letters) in the output. HPE recommends that you do not use the output as the input for other modules; this option is designed purely for display purposes, and to produce more human-readable output. This mode uses periods of silence in a sequence of words to break it up into sentences with added punctuation, but you must remove any periods of silence manually from the punctuated string. NOTE: You can use this mode in conjunction with word barring, but for use in conjunction with any other mode, you must call [...] CAUTION: This mode is designed for use with Latin languages, and therefore is not recommended for use with non-Latin languages. |
_ | | | Carries out postprocessing of the output of a speech-to-text module. By default, no operation is performed for most languages. This mode receives words from the speech-to-text process and feeds them without delay to the next process (usually the word output process). The module carries out word recompounding automatically if the language requires it (currently the only languages that require recompounding are [...]). The following example shows a sample configuration file entry for this mode: `[postproc]` `doPunctuation = $params.punctuation` `doWordBarring = $params.wordBar` `barredList = $params.wordBarList` `lang = $stt.lang` `forceRecompoundOn = $params.forceRecompoundOn` You can perform the recombination, punctuation, and word barring modes simultaneously. You must request other postprocessing modes with a separate instance of [...] NOTE: Cascading of [...] |
Examples:
w2 ← postproc (B,w1)
w2 ← postproc (R,w1)
w2 ← postproc (BR,w1)
w2 ← postproc (P,w1)
w2 ← postproc (_,...)
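
The sentence-breaking behavior described for P mode can be illustrated with a minimal Python sketch. This is not the module's implementation: the `<SIL>` silence marker and the `punctuate` function are hypothetical names chosen for illustration, and the sketch assumes the word stream is available as a plain list of tokens. It also drops the silence markers itself, a step that the description above says you must otherwise perform manually.

```python
from typing import Iterable, List

SILENCE = "<SIL>"  # hypothetical marker for a period of silence


def punctuate(tokens: Iterable[str], silence: str = SILENCE) -> str:
    """Split a token sequence at silence markers and punctuate each piece."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        if tok == silence:
            # A period of silence closes the sentence collected so far.
            if current:
                sentences.append(current)
                current = []
        else:
            current.append(tok)
    if current:
        sentences.append(current)

    finished = []
    for words in sentences:
        text = " ".join(words)
        # Initial capital letter plus a closing full stop.
        finished.append(text[:1].upper() + text[1:] + ".")
    return " ".join(finished)


print(punctuate(["hello", "there", "<SIL>", "how", "are", "you", "<SIL>"]))
# prints: Hello there. How are you.
```

Because the result is a display string rather than a word stream, it is not suited as input to further processing, which matches the recommendation given for this mode.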
In B mode, you must set at least one of the BarredList and BarredTerm parameters.
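
As a rough illustration of how a barred word list and a fixed replacement term might interact, the following Python sketch parses a list in the format described for B mode (one word, or a barred word and its replacement, on each line) and applies it to a word sequence. The function names are hypothetical. Whitespace-separated fields, case-insensitive matching, and a per-word replacement taking precedence over the fixed term are all assumptions of this sketch, not documented behavior of the module.

```python
from typing import Dict, Iterable, List, Optional


def parse_barred_list(lines: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each barred word to its own replacement, or None if the line has only the word."""
    barred: Dict[str, Optional[str]] = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        barred[parts[0].lower()] = parts[1] if len(parts) > 1 else None
    return barred


def bar_words(words: Iterable[str],
              barred: Dict[str, Optional[str]],
              fixed_term: str = "<BLEEP>") -> List[str]:
    """Replace barred words with their specific term, falling back to the fixed term."""
    out: List[str] = []
    for word in words:
        key = word.lower()
        if key in barred:
            out.append(barred[key] or fixed_term)
        else:
            out.append(word)
    return out


# In practice the lines would come from the barred word text file.
barred = parse_barred_list(["darn", "heck gosh"])
print(bar_words(["well", "heck", "that", "darn", "thing"], barred))
# prints: ['well', 'gosh', 'that', '<BLEEP>', 'thing']
```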
BarredList | Punctuation |
BarredTerm | RcmpValidList |
ForceRecompoundOff | RcmpAllowSuffix |
ForceRecompoundOn | WordBar |
NonSentFinalWords | WordBarList |