Phonetic Phrase Search

The following diagram shows the modules in HPE IDOL Speech Server that enable phonetic search in a single action.

The wav module reads the audio file and prepares windowed data.

a is the resulting audio window series.
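
To make "windowed data" concrete, the following Python sketch splits a sequence of samples into fixed-length, overlapping windows. It is an illustration only and not the wav module's implementation: the function name make_windows and the size and hop parameters are hypothetical, and the source does not state the window length or overlap that Speech Server uses.

```python
def make_windows(samples, size, hop):
    """Split `samples` into windows of `size` samples, advancing `hop` samples each time.

    Hypothetical helper for illustration; `size` and `hop` are assumed parameters,
    not documented wav module settings.
    """
    if size <= 0 or hop <= 0:
        raise ValueError("size and hop must be positive")
    # Only full windows are kept; a trailing partial window is dropped.
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, hop)]


if __name__ == "__main__":
    # Ten dummy samples, windows of 4 samples, advancing 2 samples per window.
    for window in make_windows(list(range(10)), size=4, hop=2):
        print(window)
```

Each window in the returned list corresponds to one element of the series a, which the next module then converts to a feature vector.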

The frontend module takes each window of samples and converts it to a feature vector.

f is the feature vector series.

The normalizer module adjusts the feature vectors to produce normalized feature vectors.

nf is the normalized feature vector series.

The phraseprematch module searches the normalized feature vector series for phrases and produces the time position of each identified phrase, together with a confidence score.

w is the output time-marked word series.

The wout module prepares the output phrase labels and time positions for storage and result reporting.

The schema that implements this feature is:

[MyPhraseSearch]
a ← wav (MONO, input)
f ← frontend (_, a)
nf ← normalizer (_, f)
w ← phraseprematch (_, nf)
output ← wout (_, w) 
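
As a reading aid, the following Python sketch parses the schema text above and checks that each module reads a series produced by an earlier line. It is not part of Speech Server: parse_schema and check_chain are hypothetical helpers, and the sketch assumes that the last argument of each module is its input series and that input names the external audio source.

```python
import re

SCHEMA = """\
[MyPhraseSearch]
a ← wav (MONO, input)
f ← frontend (_, a)
nf ← normalizer (_, f)
w ← phraseprematch (_, nf)
output ← wout (_, w)
"""

# One schema line: <output series> ← <module> (<comma-separated arguments>)
LINE = re.compile(r"^(\w+)\s*←\s*(\w+)\s*\(([^)]*)\)\s*$")


def parse_schema(text):
    """Return (schema_name, [(output, module, [args]), ...]) for the schema text."""
    name = None
    steps = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            continue
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized schema line: {line!r}")
        out, module, args = match.groups()
        steps.append((out, module, [arg.strip() for arg in args.split(",")]))
    return name, steps


def check_chain(steps, external=("input",)):
    """True if every module reads a series that is external or already produced.

    Assumption: the last argument of each module is its input series.
    """
    available = set(external)
    for out, _module, args in steps:
        if args[-1] not in available:
            return False
        available.add(out)
    return True


if __name__ == "__main__":
    name, steps = parse_schema(SCHEMA)
    print(name)
    for out, module, args in steps:
        print(f"{args[-1]} -> {module} -> {out}")
    print("chain is consistent:", check_chain(steps))
```

Read top to bottom, the parsed steps follow the same order as the module descriptions: wav, frontend, normalizer, phraseprematch, then wout.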
