Iterative training is the approach of training an initial set of speaker templates, running the identification process to find further examples of each speaker, and then feeding the new (validated) examples back into training to build better models. You can repeat the whole process to gather still more examples. At each stage the models are trained on more data, and should therefore provide more reliable classification.
Assuming that you have a set of original templates, you can find new examples of each trained speaker by running speaker identification on new data. You can then generate audio feature files for these new audio examples, add these to the original training audio set, and retrain the template.
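The iterative loop described above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch: the functions train_template, identify_speakers, and validate_match are placeholder stand-ins, not real speech-server tasks, and in practice identification and validation would involve the feature, training, and scoring steps described in this section.

```python
# Hypothetical sketch of the iterative training loop. All function
# names here are placeholders, not real speech-server calls.

def train_template(files):
    # Stand-in for the real training task: here a "template" is just
    # the set of audio files it was trained on.
    return frozenset(files)

def identify_speakers(audio, templates):
    # Stand-in for the identification task: pretend every trained
    # speaker is detected in the file.
    return list(templates)

def validate_match(speaker, audio):
    # Stand-in for the (typically manual) validation step.
    return True

def iterative_training(initial_sets, new_audio, iterations=2):
    """Each pass identifies trained speakers in new audio, adds the
    validated examples to each speaker's training set, and retrains."""
    training_sets = {spk: list(files) for spk, files in initial_sets.items()}
    templates = {spk: train_template(f) for spk, f in training_sets.items()}
    for _ in range(iterations):
        for audio in new_audio:
            for spk in identify_speakers(audio, templates):
                if validate_match(spk, audio) and audio not in training_sets[spk]:
                    training_sets[spk].append(audio)
        # Retrain every template on the enlarged data set.
        templates = {spk: train_template(f) for spk, f in training_sets.items()}
    return templates, training_sets
```

With the trivial stand-ins above, two new files identified for a speaker simply grow that speaker's training set from one file to three before the final retraining pass.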
The following workflow diagram shows the iterative system, including feature file generation and audio segment labelling.
The steps shown in the workflow are as follows:
1. Run the SpkIdFeature task to create a set of audio feature files (ATV) from your initial training data set.
2. For each speaker, run the SpkIdTrain task to create an initial template (ATF) from the feature files generated in Step 1 for that speaker.
3. Run the SpkIdDevelWav task to generate score statistics against all of the templates created in Step 2, generating one or more development score (ATD) files.
4. Run the SpkIdDevelFinal task to generate thresholds for all the speaker templates created in Step 2, using the development files created in Step 3.
5. Run the SpkIdEvalWav task to identify instances of the trained speakers in unseen audio files.
6. Check the results and create validated label files (CTM, and so on) as necessary for each test data file. Add the files that contain examples of the speakers to the training data set (and add the validated label files to the training labels set).

You can retrain models in this manner whenever more training data becomes available. When you perform iterative retraining, you do not have to add all of the extra data to the template training sets. You can instead add some of it to the held-out data set used for speaker template score analysis, to produce more reliable score threshold values.
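Because new validated data can go either to the template training sets or to the held-out set used for score analysis, a simple deterministic split is one way to route it. The following is a hypothetical sketch (the function name and the 20% default are illustrative assumptions, not part of the product):

```python
# Hypothetical helper: route a fraction of newly validated files to
# the held-out (score-analysis) set and the rest to template training.

def split_new_data(new_files, holdout_fraction=0.2):
    """Return (training_files, holdout_files).

    Takes the last holdout_fraction of the list for hold-out; a real
    system might instead sample randomly or stratify by recording
    conditions.
    """
    n_holdout = int(len(new_files) * holdout_fraction)
    cut = len(new_files) - n_holdout
    return new_files[:cut], new_files[cut:]
```

For example, with five new files and a 40% hold-out fraction, the first three files go to training and the last two to the held-out set.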
Note: You should rerun the score analysis process from scratch whenever you update the speaker models. That is, reuse all of the data that you used for optimization in the first iteration, along with any extra audio added to the held-out set as a result of the analysis after iteration 1.