Astute users may have noticed a change to the AusTalk collection in Alveo in the last couple of days. We are re-ingesting AusTalk into Alveo to correct some errors in the previous version of the data. This means that we removed the old version and are re-loading all of the data into Alveo. As I write this, 400,000 of the 800,000 items are available; the remainder should load over the next day.
This new ingest will allow us to attach the annotation files to those items in AusTalk that have been either transcribed or annotated phonetically. Once these are in place we’ll provide some pointers on finding and working with annotated data.
One of the errors we found in the data was that we had included some speakers who did not belong in the core AusTalk collection. In some cases these were test speakers who should not have been published, but most of them were from a later accented English collection by Michael Wagner, which used the AusTalk protocol to collect data from a different group of target speakers.
We will make the accented AusTalk data available as soon as we have it all in one place. We also have the AusTalk Emotional speech collection from Julien Epps at UNSW in preparation. Finally, the video data associated with the main AusTalk collection will be made available as a separate collection on Alveo.
I’m pleased to report that the Alveo server is now fully restored and all services should be working again as normal. AAF login is working again and password reset emails are now being delivered.
Some work is still in progress. In particular, the Galaxy server will be updated soon with more tools for manipulating speech data. We have been building tools to support workflows involving forced alignment with MAUS and formant tracking with the Emu wrassp toolkit. These are now mostly working and we will deploy them as soon as we can. The use of Galaxy for speech and language analysis is a new development and we are still working out the best way to build tools and chain them together. When we have some tools available we’ll invite you to experiment and provide feedback so that we can build something that is generally useful to the community.
As of this morning (1st November) the Alveo server is offline. We are currently moving the server from its previous home at Intersect in Sydney to the facilities of NCI in Canberra. We had hoped to have a seamless transition between the two services, but unfortunately the new server is not quite ready.
We will bring Alveo back online as soon as possible. All user accounts and collections should be maintained.
One major addition is that, for the first time, we will have the full AusTalk collection on Alveo. We’ve been working on finalising this collection for some time and this is the first opportunity we’ve had to get the entire collection ingested. When the server returns you should see over 850,000 items in the AusTalk collection.