Diagnostics

You can create a diagnostic file for the template creation, score calculation, threshold calculation, and speaker identification tasks.

Use the DiagLevel parameter to control the content that is written to the diagnostics file. For example, if you set DiagLevel to 0, Speech Server does not record diagnostics information, whereas if you set DiagLevel to 3, it records the maximum amount of information.

Use the DiagFile parameter to specify the file name of the diagnostics file.
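
As a sketch, the two parameters might appear together in a task configuration section such as the following (the section name and file path are illustrative; DiagLevel and DiagFile are the parameters described above):

```
[speakerIdDevel]
DiagLevel = 3
DiagFile = C:\IDOLSpeech\speakerId\diag.log
```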

This section shows example diagnostics information from the different tasks.

Training

The following example shows some of the key information in a diagnostics file produced during template training.

<TEMPLATE_TRAIN_NEWMODEL> C:\IDOLSpeech\speakerId\Brown.atf
<TEMPLATE_TRAIN_ITERATIONS> 5
<TEMPLATE_TRAIN_SFREQ> 16000
<TEMPLATE_TRAIN_TOTALFRAMES> 35399
<TEMPLATE_TRAIN_USEDFRAMES> 30003
<TEMPLATE_TRAIN_PERCENTUSED> 84.7566
<GMMTRAIN_RELEVANCE> 200
<GMMTRAIN_EM1> 4.372
<GMMTRAIN_EM2> 3.668
<GMMTRAIN_EM3> 2.481
<GMMTRAIN_EM4> 1.340
<GMMTRAIN_EM5> 1.023

The diagnostics file contains some general information, such as the name of the output file to generate, the number of training iterations that will be performed, and the sample frequency of the data being used to train the model. It also has some information about the duration of the audio data supplied to train the template (the total number of frames, where one frame is 1/100th of a second) as well as the actual number of frames used (discarding silence, and so on).
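
As a worked check of the frame statistics above (one frame is 1/100th of a second), the duration and percentage-used figures can be reproduced from the frame counts; the values below are taken from the example diagnostics file, not from an API call:

```python
# Frame statistics from the example diagnostics file (1 frame = 1/100 s).
total_frames = 35399          # <TEMPLATE_TRAIN_TOTALFRAMES>
used_frames = 30003           # <TEMPLATE_TRAIN_USEDFRAMES>

total_seconds = total_frames / 100.0                 # duration of audio supplied
percent_used = 100.0 * used_frames / total_frames    # matches <TEMPLATE_TRAIN_PERCENTUSED>

print(f"{total_seconds:.2f}s supplied, {percent_used:.4f}% used")
# 353.99s supplied, 84.7566% used
```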

The <GMMTRAIN_EM*> values are useful for determining how many training iterations to set. Each value shows how much the model was updated at that iteration; the values eventually tail off, showing that the model is relatively stable. At that point, running further iterations is likely to bring very little benefit. In the example above, Speech Server ran 5 iterations of training, but the default is 20. In general you should not need to change the default value, but if the model still shows signs of significant updates after 20 iterations, you might want to increase it by adding the nEmIter parameter to the relevant [audioTemplateTrain] module. For example:

[audiotemplateTrain15]
SpkIdBasePack = SIDBASE
SampleFrequency = $stream15.sampleFrequency
nEmIter = 50
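
The tail-off described above can be checked mechanically. The following is a minimal sketch (not part of Speech Server; the function names and tolerance are illustrative) that reads the <GMMTRAIN_EM*> updates from a diagnostics file and reports the first iteration at which the per-iteration update falls below a chosen tolerance:

```python
import re

def em_updates(diag_text):
    """Return the <GMMTRAIN_EMn> update values in iteration order."""
    pairs = re.findall(r"<GMMTRAIN_EM(\d+)>\s+([\d.]+)", diag_text)
    return [float(v) for _, v in sorted(pairs, key=lambda p: int(p[0]))]

def converged_at(updates, tol=1.1):
    """1-based index of the first iteration whose update falls below tol."""
    for i, update in enumerate(updates, start=1):
        if update < tol:
            return i
    return None  # still updating significantly; consider more iterations

diag = """\
<GMMTRAIN_EM1> 4.372
<GMMTRAIN_EM2> 3.668
<GMMTRAIN_EM3> 2.481
<GMMTRAIN_EM4> 1.340
<GMMTRAIN_EM5> 1.023
"""
print(em_updates(diag))                 # [4.372, 3.668, 2.481, 1.34, 1.023]
print(converged_at(em_updates(diag)))   # 5
```

If converged_at returns None, the model is still being updated significantly at the last recorded iteration, which is the situation in which increasing nEmIter may help.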

Development: Score Statistics

The following example shows some of the key information in a diagnostics file produced during the score statistics stage of development.

<MODE> ADD_STREAM
<SFREQ> 16000
<DATA_LABEL> Brown
<SEGMENT_SCORES> 0.00s-30.00s (30.00s)
<SCORE> (3000 frames, 91.0667 speech) model:Brown file:Brown true 1.96331
<SCORE> (3000 frames, 91.0667 speech) model:Cameron file:Brown false -0.569314
<SCORE> (3000 frames, 91.0667 speech) model:Clegg file:Brown false -0.121911
<SEGMENT_SCORES> 30.00s-60.00s (30.00s)
<SCORE> (3000 frames, 90.6667 speech) model:Brown file:Brown true 1.99159
<SCORE> (3000 frames, 90.6667 speech) model:Cameron file:Brown false -0.648818
<SCORE> (3000 frames, 90.6667 speech) model:Clegg file:Brown false -0.0711688

As well as general information about the input audio sample frequency and the input mode, the diagnostics file gives the scores for each segment of the input data (in this case, 30-second segments) against each of the speaker templates. The label of the input data is also given; in this case, Brown.

For each <SCORE> line, you can see the number of frames evaluated, the model that was used for evaluation, the label of the data at this segment, whether the segment is considered a true or false hit for the specified speaker template, and finally the score. For example:

<SCORE> (3000 frames, 91.0667 speech) model:Brown file:Brown true 1.96331

This shows that for the template Brown, the current segment is treated as a true positive example (because the data label is also Brown). This particular segment gets a score of 1.96331. You should find that the positive examples have higher scores than the negative examples; looking through the diagnostics file produced at this stage can identify sections where this is not the case and which might need further investigation.
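
The check described above can be automated. The following is a hypothetical helper (not a Speech Server API; the names are illustrative) that parses <SCORE> lines from the score-statistics diagnostics and verifies that every positive score exceeds every negative score:

```python
import re

# Matches lines such as:
# <SCORE> (3000 frames, 91.0667 speech) model:Brown file:Brown true 1.96331
SCORE_RE = re.compile(
    r"<SCORE> \((\d+) frames, [\d.]+ speech\) "
    r"model:(\S+) file:(\S+) (true|false) (-?[\d.]+)")

def parse_scores(diag_text):
    """Return (model, is_true_hit, score) tuples from <SCORE> lines."""
    return [(m.group(2), m.group(4) == "true", float(m.group(5)))
            for m in SCORE_RE.finditer(diag_text)]

def separation_ok(scores):
    """True if every positive score exceeds every negative score."""
    pos = [s for _, is_true, s in scores if is_true]
    neg = [s for _, is_true, s in scores if not is_true]
    return min(pos) > max(neg)

diag = """\
<SCORE> (3000 frames, 91.0667 speech) model:Brown file:Brown true 1.96331
<SCORE> (3000 frames, 91.0667 speech) model:Cameron file:Brown false -0.569314
<SCORE> (3000 frames, 91.0667 speech) model:Clegg file:Brown false -0.121911
"""
print(separation_ok(parse_scores(diag)))  # True
```

A result of False would point to segments where a negative example scores at least as high as a positive one, which are the sections worth investigating.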

Development: Threshold Calculation

The following example shows some of the key information in a diagnostics file produced during the threshold calculation stage of development.

<SFREQ> 16000
<BALANCE_BIAS> 0.2
<SPKDEVEL> Brown
    <THRESHOLD> 0.268926
    <MINPOS>    1.8513
    <LOWPOS>    1.99159
    <MEDPOS>    2.342
    <HIGPOS>    2.37679
    <MAXPOS>    2.588
    <AVGPOS>    2.28259
    <MINNEG>    -1.80892
    <LOWNEG>    -1.28672
    <MEDNEG>    -0.916315
    <HIGNEG>    -0.633139
    <MAXNEG>    -0.126669
    <AVGNEG>    -0.948387
    <POSEGS>    11
    <NEGEGS>    91
    <TRUEHITS>  11
    <FALSEHITS> 0
    <MISSES>    0
    <COST>      0
    <PRECISION> 100
    <RECALL>    100
    <FMEASURE>  100
</SPKDEVEL>

The information generated during threshold calculation is very useful in terms of evaluating the system performance, and identifying any major issues. For each speaker template, it shows the range of positive scores observed (that is, the scores for that speaker template against audio data from the same speaker) as well as the range of negative scores (that is, the scores for the speaker template against audio data from a different speaker). Ideally these ranges would not overlap, and would have a significant separation, although this is not always the case.

The number of positive examples (<POSEGS>) and negative examples (<NEGEGS>) used in generating the threshold is shown, along with the computed threshold (<THRESHOLD>). The numbers of true hits (<TRUEHITS>), false positives (<FALSEHITS>), and misses (<MISSES>) that would have resulted from the selected threshold are also shown, as are statistics on recall, precision, and the combined f-measure. An f-measure of less than 100 shows that there is overlap between the positive and negative hit scores, and that no threshold results in a perfect score.
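
As a worked check of these statistics (not Speech Server code; standard precision/recall definitions), the reported percentages follow from the hit counts:

```python
def f_stats(true_hits, false_hits, misses):
    """Return (precision, recall, f-measure) as percentages."""
    precision = 100.0 * true_hits / (true_hits + false_hits)
    recall = 100.0 * true_hits / (true_hits + misses)
    fmeasure = 2 * precision * recall / (precision + recall)
    return precision, recall, fmeasure

# Values from the <SPKDEVEL> block for Brown: 11 true hits, 0 false hits,
# 0 misses, which gives a perfect score of 100 for all three statistics.
print(f_stats(11, 0, 0))  # (100.0, 100.0, 100.0)
```

Any false hits or misses pull the f-measure below 100, which is the signal described above that the positive and negative score ranges overlap.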

Evaluation

The following example shows some of the key information in a diagnostics file produced during the evaluation stage.

<TEMPLATE_LST> /home/devel2/speakerIdV2/serverTesting/dbl/politicians-models-V2.dbl
<SFREQ> 16000
<NMODELS> 3
<SEGMENT> 0.050 5.000 MALE
<RESULT_N0> Cameron score:0.0698535 (win%:48.6239) var:68.6247 pass:0 valid:1
<RESULT_N1> Clegg score:-0.412641 (win%:29.8165) var:68.6247 pass:0 valid:1
<RESULT_N2> Brown score:-1.09827 (win%:21.5596) var:68.6247 pass:0 valid:1
</SEGMENT>
<SEGMENT> 5.000 9.950 MALE
<RESULT_N0> Clegg score:0.0850429 (win%:40.0901) var:81.7719 pass:0 valid:1
<RESULT_N1> Brown score:-0.00135769 (win%:35.5856) var:81.7719 pass:0 valid:1
<RESULT_N2> Cameron score:-0.541556 (win%:24.3243) var:81.7719 pass:0 valid:1
</SEGMENT>

The diagnostics file gives the segment-by-segment scores for each speaker template. Each <RESULT_N*> line gives the template name, the score, the winning percentage (win%), the score variance (var), and the pass and valid flags.

In summary, the diagnostics file contains more detailed information than the results file. You can use it to analyze segments for which an unexpected result was given.
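
For example, the following is a minimal sketch (not part of Speech Server; the function names are illustrative) that walks the evaluation diagnostics and reports the winning template and its win percentage for each segment, so that segments with a weak winner stand out:

```python
import re

SEG_RE = re.compile(r"<SEGMENT> ([\d.]+) ([\d.]+)")
RES_RE = re.compile(r"<RESULT_N0> (\S+) score:(-?[\d.]+) \(win%:([\d.]+)\)")

def segment_winners(diag_text):
    """Return (start, end, template, win%) for each segment's top result."""
    winners = []
    seg = None
    for line in diag_text.splitlines():
        m = SEG_RE.match(line)
        if m:
            seg = (float(m.group(1)), float(m.group(2)))
            continue
        m = RES_RE.match(line)
        if m and seg:
            winners.append((seg[0], seg[1], m.group(1), float(m.group(3))))
    return winners

diag = """\
<SEGMENT> 0.050 5.000 MALE
<RESULT_N0> Cameron score:0.0698535 (win%:48.6239) var:68.6247 pass:0 valid:1
</SEGMENT>
<SEGMENT> 5.000 9.950 MALE
<RESULT_N0> Clegg score:0.0850429 (win%:40.0901) var:81.7719 pass:0 valid:1
</SEGMENT>
"""
print(segment_winners(diag))
# [(0.05, 5.0, 'Cameron', 48.6239), (5.0, 9.95, 'Clegg', 40.0901)]
```

Sorting this list by win percentage brings the least confident segments to the top for manual review.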

