Accessing AusTalk in Alveo

AusTalk is a large collection of spoken Australian English recorded over the last few years at sites around Australia. When the collection is complete it will have close to 1000 speakers, each with a range of recordings from isolated words to interviews and map tasks. Alveo contains most of the data and will hold the complete corpus once collection and data processing are finished.

Alveo Hackfest

Ahead of the official launch of Alveo on July 1 we will be holding a Hackfest: a hands-on day with Alveo. We hope the day will produce some exciting ideas and maybe even the start of some interesting research using data from the Alveo repository.


LREC2014: The Alveo Virtual Laboratory: A Web Based Repository API

The Alveo Virtual Laboratory is an eResearch project funded under the Australian Government NeCTAR program to build a platform for collaborative eResearch around data representing human communication and the tools that researchers use in their analysis. The human communication science field is broadly defined to encompass the study of language from various perspectives, but also includes research on music and various other forms of human expression. This paper outlines the core architecture of Alveo and, in particular, highlights the web-based API that provides access to data and tools to authenticated users.

Creative Commons License
This work by Steve Cassidy, Dominique Estival, Tim Jones, Peter Sefton, Denis Burnham and Jared Berghold is licensed under a Creative Commons Attribution 4.0 International License.

Training a Speech Recogniser with HCS vLab

I just received a report from Matt Atcheson, one of our HDR testers at UWA, with the results of his work evaluating the HTK integration with the HCS vLab. Matt used my template Python interface to download audio files from the vLab and feed them to the HTK training algorithms to train a digit string recogniser. He was then able to test the recogniser on unseen data also downloaded from the vLab.

The results were interesting:

Using the full set of digit recordings that I could find (about 940 of them), setting aside 10% for testing, and with a grammar that constrains transcripts to exactly four digits, I get about 99% word accuracy, and about 95% sentence accuracy.

====================== HTK Results Analysis =======================
  Date: Tue Jan 28 21:08:50 2014
  Ref : >ntu/hcsvlab_api_testing_matt/digitrec/data/testing_files/testref1.mlf
  Rec : >buntu/hcsvlab_api_testing_matt/digitrec/data/testing_files/recout.mlf
------------------------ Overall Results --------------------------
SENT: %Correct=94.74 [H=90, S=5, N=95]
WORD: %Corr=98.95, Acc=98.42 [H=564, D=3, S=3, I=3, N=570]
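
The percentages in this report can be reproduced from the bracketed counts. The sketch below is illustrative rather than part of Matt's code: it assumes the standard HTK definitions, where %Correct is H/N and Acc is (H - I)/N, with H the number of correct labels, I the insertions and N the number of labels in the reference transcripts. The function name `htk_scores` is made up for this example.

```python
def htk_scores(hits, insertions, total):
    """Return (%Correct, Accuracy) as percentages rounded to two decimals.

    Assumes the standard HTK definitions:
      %Correct = H / N * 100
      Accuracy = (H - I) / N * 100
    """
    percent_correct = 100.0 * hits / total
    accuracy = 100.0 * (hits - insertions) / total
    return round(percent_correct, 2), round(accuracy, 2)


# Word-level counts from the WORD line: H=564, I=3, N=570
word_corr, word_acc = htk_scores(hits=564, insertions=3, total=570)

# Sentence-level counts from the SENT line: H=90, N=95 (no insertions apply)
sent_corr, _ = htk_scores(hits=90, insertions=0, total=95)

print(f"WORD: %Corr={word_corr}, Acc={word_acc}")
print(f"SENT: %Correct={sent_corr}")
```

Accuracy is lower than %Corr because it also penalises the three inserted words, which %Corr ignores.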

Matt also gave us some good feedback based on his experiments. If other testers are interested in repeating this experiment or exploring a bit on their own, Matt’s code is available on BitBucket.

To run his experiments, Matt made use of a virtual machine on the Nectar Research Cloud. Any Australian researcher can log in to the cloud and get a free allocation of virtual machines. We’ve made a VM image (called ‘HCSvLab Tools’, listed in the Public list of snapshots on your dashboard) that has HTK, DeMoLib and INDRI pre-installed; as a user, you can create your own instance of this image and start working with these tools.