HPE IDOL Speech Server can search an audio file or stream for clips that are present in an AFP database. Use the AfpMatchWav task to search an audio file, or the AfpMatchStream task to search an audio stream. For more details about these standard tasks, see the HPE IDOL Speech Server Reference.
To search audio for known clips
Send an AddTask action to HPE IDOL Speech Server, and set the following parameters:
Type | The task type. Set to AfpMatchWav. |
File | The audio file to search. To restrict processing to a section of the audio file, set the start and end times in the |
Out | The file to write the search results to. |
To use a database that is defined in the HPE IDOL Speech Server tasks configuration file, you must set the AfpDb parameter. If the database is not defined, set both the Pack and PackDir parameters instead.
AfpDb | The name of a database that is defined in the tasks configuration file. |
Pack | The name of a database that is not defined in the tasks configuration file. |
PackDir | The path to the directory that contains the database files. |
For example:
http://localhost:13000/action=AddTask&Type=AfpMatchWav&File=C:\Data\Sample.wav&Pack=Adverts&PackDir=C:\resources&Out=SearchResults.ctm
This action uses port 13000 to instruct HPE IDOL Speech Server, which is located on the local machine, to search the Sample.wav file for sections that match audio clips in the Adverts database, and to write the results to the SearchResults.ctm file.
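The parameter rules above can be sketched in a few lines of Python. This is a minimal illustration, not part of the product: the helper name build_afp_match_url is hypothetical, and it assumes that the server accepts percent-encoded parameter values (which urlencode produces for characters such as ":" and "\"). It only builds the action URL; it does not send it.

```python
from urllib.parse import urlencode


def build_afp_match_url(host, port, audio_file, out_file,
                        afp_db=None, pack=None, pack_dir=None):
    """Build an AddTask URL for an AfpMatchWav task (hypothetical helper)."""
    params = {"Type": "AfpMatchWav", "File": audio_file, "Out": out_file}
    if afp_db is not None:
        # Database that is defined in the tasks configuration file.
        params["AfpDb"] = afp_db
    elif pack is not None and pack_dir is not None:
        # Database that is not defined there: both parameters are required.
        params["Pack"] = pack
        params["PackDir"] = pack_dir
    else:
        raise ValueError("Set AfpDb, or set both Pack and PackDir")
    # Assumption: percent-encoded values are accepted by the server.
    return f"http://{host}:{port}/action=AddTask&{urlencode(params)}"


url = build_afp_match_url("localhost", 13000, r"C:\Data\Sample.wav",
                          "SearchResults.ctm",
                          pack="Adverts", pack_dir=r"C:\resources")
print(url)
```

Passing afp_db="Adverts" instead of pack and pack_dir would produce the form of the action that relies on a database defined in the tasks configuration file.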
This action returns a token. You can use the token in later actions that refer to this task, for example to retrieve the results.
Because the AFP task produces results while it processes the audio data, you can retrieve results before the task is complete.
The results are stored in CTM format, but you can send the GetResults action to view them in either CTM or XML format. An example of an AFP result in CTM format is:
1 | 1 | 2994.75 | 0.84 | ADVERT1 | 5.59 | 2991.32 | 2995.08 | 2990.96 | 14 | 21 |
1 | 1 | 2995.59 | 0.55 | ADVERT1 | 4.91 | 2991.32 | 2996.00 | 2990.96 | 15 | 23 |
1 | 1 | 2996.14 | 0.73 | ADVERT1 | 5.30 | 2991.32 | 2996.60 | 2990.96 | 18 | 28 |
1 | 1 | 2996.87 | 0.52 | ADVERT1 | 5.54 | 2991.32 | 2996.92 | 2990.96 | 20 | 31 |
1 | 1 | 2997.39 | 1.11 | ADVERT1 | 4.82 | 2991.32 | 2997.96 | 2990.96 | 21 | |
1 | 1 | 2998.50 | 1.22 | ADVERT1 | 4.68 | 2991.32 | 2998.80 | 2990.96 | 23 | 35 |
1 | 1 | 2999.72 | 1.14 | ADVERT1 | 4.45 | 2991.32 | 3000.08 | 2990.96 | 25 | 39 |
1 | 1 | 3000.86 | 2.35 | ADVERT1 | 3.88 | 2991.32 | 3002.92 | 2990.96 | 29 | 45 |
From left to right, the columns in the output data file include:
The rank of the result (1 is the top ranked result for a section).
The start time of the audio section that produced this result update. The start time of the entire matched section is given in a different column.
You should view each result line as an update to an ongoing match, rather than a complete result. While HPE IDOL Speech Server processes the audio, it might return multiple results for the same track, starting at the same point in the audio. In this case, the number of hits increases between successive results, and the current end point of the match increases.
The final result in such a sequence is the complete section match result for the specified hypothesis. In the previous example, the match of the reference track ADVERT1 completes with the result:
1 | 1 | 3000.86 | 2.35 | ADVERT1 | 3.88 | 2991.32 | 3002.92 | 2990.96 | 29 | 45 |
This result shows that the processed audio matched the audio for the track ADVERT1 stored in the database between 2991.32s and 3002.92s. The estimated start point of the ADVERT1 data is actually 2990.96s (which suggests that the first 0.36s of the file was not matched). The match scored 29 hits, which is 45% of the total audio fingerprint features in the database clip. The last section of audio analyzed that contributed to this match was from 3000.86s to 3003.21s.
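The reading of a result line described above can be sketched in Python. This is an illustration under stated assumptions: it assumes that each line of the CTM file holds eleven whitespace-separated fields in the order shown in the example, the field names are invented for this sketch, and the first and sixth fields are kept as opaque values because their meaning is not described here. Treating the second field as the rank is an inference from the column description. The sketch keeps only the last update for each track and match start time, which corresponds to the complete match.

```python
from collections import namedtuple

# Illustrative field names; "field1" and "field6" are not interpreted here.
AfpUpdate = namedtuple("AfpUpdate", [
    "field1", "rank", "update_start", "update_duration", "label",
    "field6", "match_start", "match_end", "est_start", "hits", "score",
])


def parse_ctm_line(line):
    """Parse one whitespace-separated AFP result line (assumed layout)."""
    p = line.split()
    if len(p) != 11:
        raise ValueError(f"expected 11 fields, got {len(p)}")
    return AfpUpdate(p[0], int(p[1]), float(p[2]), float(p[3]), p[4],
                     p[5], float(p[6]), float(p[7]), float(p[8]),
                     int(p[9]), int(p[10]))


def final_matches(updates):
    """Keep the last update seen for each (label, match_start) pair."""
    latest = {}
    for update in updates:
        latest[(update.label, update.match_start)] = update
    return list(latest.values())


sample = """\
1 1 2994.75 0.84 ADVERT1 5.59 2991.32 2995.08 2990.96 14 21
1 1 2999.72 1.14 ADVERT1 4.45 2991.32 3000.08 2990.96 25 39
1 1 3000.86 2.35 ADVERT1 3.88 2991.32 3002.92 2990.96 29 45
"""

updates = [parse_ctm_line(line) for line in sample.splitlines() if line.strip()]
final = final_matches(updates)

for match in final:
    # End of the last analyzed audio section: update start plus its duration.
    update_end = match.update_start + match.update_duration
    # Gap between the estimated clip start and the first matched point.
    unmatched_lead = match.match_start - match.est_start
    print(match.label, match.match_start, match.match_end,
          match.hits, match.score,
          round(update_end, 2), round(unmatched_lead, 2))
```

For the sample lines, the single remaining record is the 29-hit update, and the two derived values reproduce the 3003.21s endpoint and the 0.36s unmatched lead mentioned in the explanation.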
The following example shows the final output from the previous ADVERT1 match in XML format:
<afp_record>
  <rank>1</rank>
  <label>ADVERT1</label>
  <start>2991.32</start>
  <end>3002.92</end>
  <output>3003.210</output>
  <eststart>2990.96</eststart>
  <hits>29</hits>
  <score>45</score>
  <hitrate>2.636</hitrate>
  <scorerate>0.000</scorerate>
  <scoreavg>0.000</scoreavg>
</afp_record>
The <output> tags record the time that the final result was produced (the endpoint for processing the target audio, not just the matching section).
The <scorerate> and <scoreavg> tags are reserved for future functionality and always contain the value 0.000.
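As a minimal sketch of reading such a record, the following Python uses the standard xml.etree.ElementTree module on the example record. It assumes that the record is available as a standalone string; how the record is wrapped in the full GetResults response is not covered here, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

record_xml = """
<afp_record>
  <rank>1</rank>
  <label>ADVERT1</label>
  <start>2991.32</start>
  <end>3002.92</end>
  <output>3003.210</output>
  <eststart>2990.96</eststart>
  <hits>29</hits>
  <score>45</score>
  <hitrate>2.636</hitrate>
  <scorerate>0.000</scorerate>
  <scoreavg>0.000</scoreavg>
</afp_record>
"""

record = ET.fromstring(record_xml.strip())
# Map each child tag name to its text value.
fields = {child.tag: child.text for child in record}

# Length of the matched section, from <start> to <end>.
match_length = float(fields["end"]) - float(fields["start"])
# How far processing had advanced past the end of the match.
processing_lead = float(fields["output"]) - float(fields["end"])

print(fields["label"], fields["hits"], fields["score"],
      round(match_length, 2), round(processing_lead, 2))
```

The <scorerate> and <scoreavg> values are read like any other field, but because they are reserved they carry no information in this example.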