I just received a report from Matt Atcheson, one of our HDR testers at UWA, summarising his evaluation of the HTK integration with the HCS vLab. Matt used my template Python interface to download audio files from the vLab and feed them to the HTK training tools to train a digit-string recogniser. He was then able to test the recogniser on unseen data, also downloaded from the vLab.
The results were interesting:
Using the full set of digit recordings that I could find (about 940 of them), setting aside 10% for testing, and with a grammar that constrains transcripts to exactly four digits, I get about 99% word accuracy, and about 95% sentence accuracy.
====================== HTK Results Analysis =======================
  Date: Tue Jan 28 21:08:50 2014
  Ref : >ntu/hcsvlab_api_testing_matt/digitrec/data/testing_files/testref1.mlf
  Rec : >buntu/hcsvlab_api_testing_matt/digitrec/data/testing_files/recout.mlf
------------------------ Overall Results --------------------------
SENT: %Correct=94.74 [H=90, S=5, N=95]
WORD: %Corr=98.95, Acc=98.42 [H=564, D=3, S=3, I=3, N=570]
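For readers unfamiliar with HResults output, the percentages above follow directly from the bracketed counts: H hits, D deletions, S substitutions and I insertions against N reference tokens, with %Corr = H/N and Acc = (H - I)/N (accuracy additionally penalises insertions). A quick sketch reproducing the figures from those counts:

```python
# Sketch: deriving HResults-style %Corr / Acc figures from the
# hit (H), deletion (D), substitution (S), insertion (I) counts.
# N is the number of reference tokens, so N = H + D + S.

def percent_correct(h, n):
    """%Corr / %Correct: hits as a percentage of reference tokens."""
    return 100.0 * h / n

def accuracy(h, i, n):
    """Acc: like %Corr, but each insertion also counts against you."""
    return 100.0 * (h - i) / n

# Word-level counts from the report above: H=564, D=3, S=3, I=3, N=570.
word_corr = percent_correct(564, 570)
word_acc = accuracy(564, 3, 570)

# Sentence-level: a sentence is a hit only if every word in it matches.
sent_corr = percent_correct(90, 95)

print(f"WORD: %Corr={word_corr:.2f}, Acc={word_acc:.2f}")  # 98.95, 98.42
print(f"SENT: %Correct={sent_corr:.2f}")                   # 94.74
```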