Standard Tasks

The following tasks are available out of the box. A sketch of how a task is submitted to the server follows the table.

Task Description
AfpAddTrackStream Adds a new audio track to an Audio Fingerprint (AFP) database, receiving the audio data as a stream and converting it into AFP features before indexing.
AfpAddTrackWav Adds a new audio track to an Audio Fingerprint database, reading the data from an audio file and converting it into AFP features before indexing.
AfpDatabaseInfo Returns a list of all tracks that are currently stored within the specified Audio Fingerprint database.
AfpDatabaseOptimize Optimizes the internal indexing of the specified Audio Fingerprint database. This task permanently removes files that have been tagged for deletion using the AfpRemoveTrack task, and optimizes lookup functions for newly added tracks.
AfpMatchStream Receives audio data as a binary stream and searches it for any sections that match audio indexed in an AFP database.
AfpMatchWav Reads in data from an audio file, and searches it for any sections that match audio indexed in an AFP database.
AfpRemoveTrack Removes specified audio tracks from an AFP database.

AfptAddTrackStream Performs the same task as AfpAddTrackStream, but uses a template database (fptdb), which improves robustness to audio mismatches at the cost of scalability.
AfptAddTrackWav Performs the same task as AfpAddTrackWav, but uses a template database (fptdb), which improves robustness to audio mismatches at the cost of scalability.
AfptDatabaseInfo Performs the same task as AfpDatabaseInfo, but uses a template database (fptdb), which improves robustness to audio mismatches at the cost of scalability.
AfptMatchStream Performs the same task as AfpMatchStream, but uses template-based matching as opposed to landmarks, which improves robustness to audio mismatches at the cost of scalability.
AfptMatchWav Performs the same task as AfpMatchWav, but uses template-based matching as opposed to landmarks, which improves robustness to audio mismatches at the cost of scalability.
AfptRemoveTrack Performs the same task as AfpRemoveTrack, but uses a template database (fptdb), which improves robustness to audio mismatches at the cost of scalability.
AmTrain Presents training audio and transcription data to the acoustic model training process, creating accumulator files that are used by the AmTrainFinal task to produce a final adapted acoustic model.
AmTrainFinal Produces the adapted acoustic model, given a set of accumulator files created by the AmTrain task.
AudioAnalysis Runs all the audio preprocessing tasks that are supported by the audiopreproc module in a single task.
AudioSecurity Detects and labels segments of audio containing alarms, screams, breaking glass, or gunshots.
ClippingDetection Detects clipping in audio data.
ClusterSpeech Clusters wide-band speech into speaker segments.
ClusterSpeechTel Clusters telephony speech into speaker segments.
ClusterSpeechToTextTel Clusters two speakers in a phone call, and uses the resulting speaker clusters to improve speech-to-text performance slightly by using speaker-sided acoustic normalization. Any telephony artifacts such as dial tones or DTMF tones are included, interspersed with the recognized words.
CombineFMD Combines several phoneme time track files into a single file, which can then be used for phonetic phrase match.
DataObfuscation Prepares training data with any sensitive or classified information concealed.
DialToneIdentification Detects and identifies DTMF dial tones in audio data.
IvSpkIdDevel Processes one or more speaker ID feature files to generate scores for tuning iVector-based score thresholds.
IvSpkIdDevelFinal Calculates the iVector score threshold based on one or more development files, and generates a new set of iVectors with the thresholds.
IvSpkIdDevelStream Takes a single audio stream, along with the name of the speaker the stream is associated with, and generates scores for tuning iVector thresholds.
IvSpkIdDevelWav Processes a single audio file to generate scores for tuning iVector thresholds.
IvSpkIdEvalStream Runs iVector-based speaker identification on an audio stream to find any sections where the trained speakers are present.

IvSpkIdEvalWav Performs iVector-based speaker identification on a single audio file.
IvSpkIdFeature Uses an audio file that contains sample speech from one person to create speaker ID feature files for use in iVector-based template training and template score threshold development.

IvSpkIdSetAdd Adds one or more iVector speaker templates to a single speaker set file.

IvSpkIdSetDelete Removes a speaker template from an iVector template set file.

IvSpkIdSetEditThresh Modifies the threshold of a single template stored in an iVector template set file.
IvSpkIdSetInfo Produces a log file that lists the contents of the specified iVector template set file.
IvSpkIdTmpEditThresh Modifies the threshold of a single iVector template file.

IvSpkIdTmpInfo Produces a log file that lists the contents of the specified iVector template file.
IvSpkIdTrain Trains a new iVector speaker template.
IvSpkIdTrainStream Takes a single audio stream containing speech data from the speaker to be trained, and creates a new iVector speaker template file.

IvSpkIdTrainWav Takes a single audio file containing speech data from the speaker to be trained, and creates a new iVector speaker template file.

LangIdBndLif Reads in language identification features from file and determines boundaries in the feature sequence where the language changes. Returns the language identification results between boundaries.
LangIdBndStream Receives audio data as a binary stream, converts it into language identification features, and determines boundaries where the language changes. Returns the language identification results between boundaries.
LangIdBndWav Reads in data from an audio file, converts it into language identification features, and determines boundaries where the language changes. Returns the language identification results between boundaries.
LangIdCumLif Reads in language identification features from file. Returns the running language identification score at periodic intervals. This is the score for all the input data from the start to the current point.
LangIdCumStream Receives audio data as a binary stream and converts it into language identification features. Returns the running language identification score at periodic intervals. This is the score for all the input data from the start to the current point.
LangIdCumWav Reads in data from an audio file and converts it into language identification features. Returns the running language identification score at periodic intervals. This is the score for all the input data from the start to the current point.
LangIdFeature Converts audio files containing the relevant language into language identification feature (.lif) files, which are required for training language classifiers.
LangIdOptimize Optimizes the balance between language classifiers in a classifier set.
LangIdSegLif Reads in language identification features from file, processes the data in fixed-sized chunks, and returns the language identification results for each chunk.
LangIdSegStream Receives audio data as a binary stream and converts it into language identification features. HPE IDOL Speech Server processes the data in fixed-sized chunks, and returns the language identification results for each chunk.
LangIdSegWav Reads in data from an audio file and converts it into language identification features. HPE IDOL Speech Server processes the data in fixed-sized chunks, and returns the language identification results for each chunk.
LangIdTrain Reads in a set of language identification feature files created from audio representing a single language (using the LangIdFeature task), and uses this data to train a new language classifier.
LanguageModelBuild Builds a new language model from a set of text files.
LmListVocab Lists the most common words in the specified language model.
LmLookUp Verifies whether a specified word is present in the vocabulary of a particular language model and, if so, how frequently the word occurs.
LmPerplexity Analyzes the perplexity of a sample text file when given a specific language model.
PunctuateCtm Adds punctuation to any .ctm file.
Scorer Scores a speech recognition transcript (such as one generated by a speech-to-text task) when given a reference transcript file.
SearchFMD Searches for specified phrases in a phoneme time track file.
SegmentText Inserts whitespace between words in a text file (for languages that do not separate words with whitespace).
SegmentWav Attempts to segment audio into sections by speaker even if no trained speakers exist in the system.
SidPackage - Deprecated Packages a set of trained speaker models into a single speaker classification file.
SidTrain - Deprecated Uses an audio file to create or update a speaker training file.
SidTrainFinal - Deprecated Uses a base model and one or more speaker training files to generate a speaker template file.
SNRCalculation Analyzes the signal-to-noise levels across an audio file.
SpeechSilClassification Segments audio by content: speech, non-speech, or music.
SpkIdDevel Processes speaker ID feature files to generate scores for tuning model thresholds.
SpkIdDevelFinal Estimates the thresholds for a set of speaker templates.

SpkIdDevelStream Creates or updates a development (.atd) file for an audio stream.
SpkIdDevelWav Creates or updates a development (.atd) file for an audio file.
SpkIdEvalStream Analyzes an audio stream to identify any sections where the trained speakers are present.
SpkIdEvalWav Analyzes an audio file to identify any sections where the trained speakers are present.
SpkIdFeature Creates a speaker ID feature file.
SpkIdSetAdd Takes one or more audio template files, and adds them to an audio template set file.
SpkIdSetDelete Removes a template from an audio template set file.
SpkIdSetEditThresh Modifies the threshold of a single template in an audio template set file.
SpkIdSetInfo Retrieves information on an audio template set file.
SpkIdTmpEditThresh Modifies the threshold of a single template.
SpkIdTmpInfo Retrieves information on an audio template file.
SpkIdTrain Uses one or more feature files to train a speaker template.
SpkIdTrainStream Takes an audio stream containing speech data from the speaker to be trained, and creates a new speaker template file.
SpkIdTrainWav Takes a single audio file containing speech data from the speaker to be trained, and creates a new speaker template file.
StreamSidOptimize Receives sample audio data for a trained (or untrained) speaker from a binary stream, and updates statistics used in calculating speaker thresholds across the whole speaker classifier set.
StreamSidTrain Receives sample audio data for a specific speaker from a binary stream, and creates a speaker model to represent the speaker.
StreamSpeakerId - Deprecated Segments an audio stream by speaker and identifies known speakers, unknown speakers, and periods of non-speech within the audio. To run the StreamSpeakerId task, speakers must already be trained in HPE IDOL Speech Server.
StreamToText Converts live audio into a text transcript.
StreamToTextMusicFilter Converts live audio into a text transcript and categorizes the audio so that you can remove any sections consisting of music or noise.
StreamToTextMusicFilterPunct Converts live audio into a text transcript that includes simple punctuation, and categorizes the audio so that you can remove any sections consisting of music or noise.
StreamToTextPunct Converts live audio into a text transcript that includes simple punctuation.
TelWavToText Transcribes a telephony audio file, including any dial tones and DTMF tones.
TelWavToTextPunct Transcribes a telephony audio file, including any dial tones and DTMF tones, and adds simple punctuation to the transcript.
TextNorm Takes a raw text transcription file and produces a normalized form (by removing punctuation, rewriting numbers as words, altering word cases, and so on).
TranscriptAlign Aligns an existing transcript with its audio recording, placing a time location on each word in the transcript. This task is suitable for aligning subtitles with audio or video files.
TranscriptCheck Checks how well a text transcript matches the audio data, identifying large missing or erroneous sections.
WavPhraseSearch Searches for a specified phrase or phrases in an audio file.
WavSidOptimize - Deprecated Reads in sample audio for a trained (or untrained) speaker from an audio file, and updates statistics used in calculating speaker thresholds across the whole speaker classifier set.
WavSidTrain - Deprecated Reads in sample audio for a specific speaker from an audio file and creates a speaker model to represent the speaker.
WavSpeakerId - Deprecated Segments an audio file by speaker and identifies known speakers, unknown speakers, and periods of non-speech within the audio file. To run the WavSpeakerId task, speakers must already be trained in HPE IDOL Speech Server.
WavToFMD Creates a phoneme time track file from a single audio file.
WavToPlh Reads data from an audio file and produces an audio feature file, which is used in tasks such as AmTrain (acoustic model adaptation).
WavToText Converts an audio file into a text transcript.
WavToTextPunct Converts an audio file into a text transcript with simple sentence-forming punctuation (for example, full stops and initial capital letters).
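
The tasks in this table are normally started by sending an ACI action to HPE IDOL Speech Server over HTTP. The Python sketch below shows roughly what submitting a WavToText task and polling for completion might look like. It is a minimal illustration only: the host and port, the AddTask and CheckStatus action names, the Type, File, Out, Lang, and Token parameters, and the status strings tested for are assumptions here, and the accepted parameters differ from task to task, so check them against the reference documentation for your installation before use.

# Minimal sketch of submitting a task to HPE IDOL Speech Server over HTTP.
# Assumptions to verify against your installation: the server listens on
# localhost:13000, tasks are started with action=AddTask and Type=<task name>,
# AddTask returns a token in its XML response, and progress is polled with
# action=CheckStatus&Token=<token>.
import re
import time

import requests

SPEECH_SERVER = "http://localhost:13000/"   # assumed host and ACI port


def add_task(task_type, **params):
    """Submit a task and return the token that identifies it."""
    response = requests.get(
        SPEECH_SERVER,
        params={"action": "AddTask", "Type": task_type, **params},
        timeout=30,
    )
    response.raise_for_status()
    # The ACI response is XML; pull the task token out of it. The element name
    # and namespace prefix may vary between versions, so match loosely.
    match = re.search(r"<(?:\w+:)?token>(.*?)</(?:\w+:)?token>",
                      response.text, re.IGNORECASE | re.DOTALL)
    if match is None:
        raise RuntimeError("No task token found in response:\n" + response.text)
    return match.group(1)


def wait_for_task(token, poll_seconds=5.0):
    """Poll CheckStatus until the task reports completion or an error."""
    while True:
        response = requests.get(
            SPEECH_SERVER,
            params={"action": "CheckStatus", "Token": token},
            timeout=30,
        )
        response.raise_for_status()
        # The status strings below are assumptions; adjust them to the values
        # your server actually returns.
        status = response.text.lower()
        if "finished" in status or "error" in status or "aborted" in status:
            return response.text
        time.sleep(poll_seconds)


if __name__ == "__main__":
    # Hypothetical WavToText parameters: an input audio file visible to the
    # server, an output CTM transcript, and a language pack name.
    token = add_task("WavToText", File="speech.wav", Out="speech.ctm", Lang="ENUK")
    print(wait_for_task(token))

The stream-based tasks (for example StreamToText or AfpMatchStream) follow the same submission pattern, but the audio is not named in a File parameter; it is sent to the server separately, typically over a dedicated binary port, and the exact mechanism is version-specific.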
