Astute users may have noticed a change to the AusTalk collection in Alveo over the last couple of days. We are re-ingesting AusTalk into Alveo to correct some errors in the previous version of the data. This means that we removed the old version and are now re-loading all of the data into Alveo. As I write this, 400,000 of the 800,000 items are available; the remainder should load over the next day.
This new ingest will allow us to attach the annotation files to those items in AusTalk that have either been transcribed or annotated phonetically. Once these are in place we’ll provide some pointers to finding and working with annotated data.
One of the errors we found was that we had included some speakers who did not belong in the core AusTalk collection. In some cases these were test speakers who should not have been published, but most were from a later accented-English collection by Michael Wagner, which used the AusTalk protocol to collect data from a different group of target speakers.
We will make the accented AusTalk data available as soon as we have it all in one place. We also have the AusTalk Emotional speech collection from Julien Epps at UNSW in preparation. Finally, the video data associated with the main AusTalk collection will be made available as a separate collection on Alveo.