Author Archives: Steve Cassidy

Alveo Transcription Tool

We’re pleased to announce the latest tool in the Alveo stable for transcription of audio recordings. The Alveo Transcription Tool is a browser-based system that works with audio stored on the Alveo platform to facilitate time-aligned transcription. The eventual goal is to support the entire workflow, from uploading recordings through transcription to analysis of the transcripts. In this first release, though, we are looking for feedback on the user interface and the automated segmentation.

As shown in the video, when you connect to the tool you will be asked to log in with Alveo. This works best if you’re already logged in to the Alveo application in your browser; if you are, you’ll just see a few screen refreshes and then be logged in. You can then download data from Alveo; this step only fetches the list of item lists you have access to. The idea is that you create an item list of the items you want to transcribe and use it to drive your workflow in this tool.

Once you select an item list, you can choose the item to transcribe and then the audio file you want to work on. You are then shown the transcription interface, with a display of the audio waveform and a few controls.

A key feature of the tool is that it automatically segments the audio for you to make transcription easier. Click the Autosegment button and the audio will be sent to the backend server for segmentation. In this first version, segmentation uses speech activity detection only; future versions will include a smarter speaker diarization system.
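The post does not describe how the backend’s speech activity detection works. As a rough illustration of the general idea only, here is a minimal energy-threshold detector in Python. This is a common baseline technique and not Alveo’s implementation: it marks fixed-length frames whose RMS level exceeds a threshold as speech, then merges runs separated by short pauses. The function name, the 30 ms frame length, the -35 dBFS threshold and the 0.3 s minimum gap are all assumed values chosen for the example.

```python
import math
from typing import List, Tuple


def detect_speech(samples: List[float], rate: int, frame_ms: float = 30.0,
                  threshold_db: float = -35.0,
                  min_gap_s: float = 0.3) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of regions judged to be speech.

    Hypothetical sketch: assumes samples are floats in [-1, 1], so levels
    are in dB relative to full scale.
    """
    frame_len = max(1, int(rate * frame_ms / 1000))

    # 1. Label each frame as speech if its RMS level exceeds the threshold.
    active = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        active.append(level_db > threshold_db)

    # 2. Turn runs of consecutive speech frames into time intervals.
    segments = []
    start = None
    for idx, is_speech in enumerate(active):
        if is_speech and start is None:
            start = idx
        elif not is_speech and start is not None:
            segments.append([start * frame_len / rate, idx * frame_len / rate])
            start = None
    if start is not None:
        segments.append([start * frame_len / rate, len(samples) / rate])

    # 3. Merge intervals separated by pauses shorter than min_gap_s.
    merged = []
    for seg in segments:
        if merged and seg[0] - merged[-1][1] < min_gap_s:
            merged[-1][1] = seg[1]
        else:
            merged.append(seg)
    return [(s, e) for s, e in merged]


# Synthetic 1 kHz signal: 0.5 s silence, 0.5 s constant-level "speech", 0.5 s silence.
rate = 1000
signal = [0.0] * 500 + [0.5] * 500 + [0.0] * 500
print(detect_speech(signal, rate))
```

Segment boundaries snap to frame edges, so shorter frames give finer boundaries at the cost of noisier decisions; practical systems usually add smoothing or a trained classifier on top of a simple energy rule like this.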

Segmentation gives you a set of speaker turns to transcribe. Click on one and, if Autoplay is selected, you’ll hear it play. Transcribe what you hear, press Tab and keep going.

The transcription you create is saved in your browser’s local storage. This means that if you log in again later, it will still be there for you to continue with, even if you shut down your browser and restart it, as long as you use the same browser on the same computer.

The current version of the tool allows you to download the transcript as a CSV file or as a JSON structure.  The next stage of development will be to add a backup server that will store your transcriptions in the cloud.  Beyond that we will develop a workflow to push transcriptions back to Alveo as an Annotation Contribution so that they are associated with the original recordings and available to other Alveo users.
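If you want to work with the CSV export in your own scripts, a sketch like the following loads it and totals the transcribed time per speaker. The column names start, end, speaker and text are assumptions for illustration only; check the header row of your actual export and adjust the keys to match.

```python
import csv
import io
import json
from typing import Dict, List


def load_segments(csv_text: str) -> List[Dict]:
    """Parse a transcript CSV with hypothetical columns start,end,speaker,text."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        {"start": float(r["start"]), "end": float(r["end"]),
         "speaker": r["speaker"], "text": r["text"]}
        for r in rows
    ]


def seconds_per_speaker(segments: List[Dict]) -> Dict[str, float]:
    """Sum the duration of each speaker's segments."""
    totals: Dict[str, float] = {}
    for seg in segments:
        totals[seg["speaker"]] = totals.get(seg["speaker"], 0.0) + seg["end"] - seg["start"]
    return totals


# Hypothetical export contents; quoted fields may contain commas.
example = """start,end,speaker,text
0.48,2.10,A,hello there
2.50,4.00,B,"hi, how are you"
"""
segments = load_segments(example)
print(json.dumps(segments, indent=2))  # the same data as a JSON list
print(seconds_per_speaker(segments))
```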

If you want to test the transcription tool, you’ll need an Alveo account. I’ve prepared a shared item list called transcription-sample containing some sample recordings that we’ve tested. If you select this, you can try transcribing these recordings to get a feel for the user interface. Note that this item list contains data from the AusTalk, Mitchell & Delbridge and AVCOM collections; you’ll need to agree to the licence terms for each of these to be able to use the data.

Your feedback on any aspect of this is most welcome – please email me at

Allocating a DOI for a Collection

Recent work on Alveo now allows us to allocate a DOI for any collection held on the system.  A DOI (Digital Object Identifier) is intended to be a persistent digital identifier for an electronic resource; one that can be cited in your publications.

To be able to allocate a DOI, we needed to have public pages corresponding to each collection in Alveo. In the past, collection pages have only been visible to logged-in users. In addition to making these pages public, we have provided the option of adding a rich text description of the collection and adding attachments to the collection page, such as images or PDF files. The result is that collection pages can now act as the main ‘landing’ page for a collection and provide full documentation for future users. Since these pages are visible without login, they will be indexed by search engines, which should help users find your collections.

Once public pages were available, we established a procedure through Macquarie University Library to allocate a DOI for a collection. The ultimate provider of the DOI is ANDS, through its Cite My Data service. Macquarie acts as an overseer to help ensure that the DOI is long-lasting. Should the hosting of Alveo move to another institution in future, the management of the DOIs can also be transferred.

It is appropriate to issue a DOI for a collection on Alveo if the following conditions are met:

  • the collection is complete and you do not envisage it changing in future
  • Alveo is the main and definitive source for the data

The process to issue a DOI is manual at the moment: a collection owner can contact me (Steve Cassidy) to request one. I will then liaise with them to confirm the conditions above and check that the appropriate metadata is present in the collection before issuing a DOI.

Our first DOI, 10.4227/139/59a4c21a896a3, has been created for the MAVA collection by Vincent Aubanel, who has now cited it in this paper: Contribution of visual rhythmic information to speech perception in noise.
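A DOI has two parts separated by the first slash: a prefix (here 10.4227) assigned to the registrant, and a suffix (here 139/59a4c21a896a3) that may itself contain further slashes. Any DOI can be turned into a citable link through the doi.org resolver, which redirects to the registered landing page. The helper names below are hypothetical; this is a small sketch, not part of any Alveo tooling.

```python
from typing import Tuple
from urllib.parse import quote


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash."""
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    prefix, sep, suffix = doi.partition("/")
    if not sep or not suffix:
        raise ValueError(f"DOI has no suffix: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build a link through the doi.org proxy, keeping '/' unescaped."""
    split_doi(doi)  # validate before building the URL
    return "https://doi.org/" + quote(doi, safe="/")


mava = "10.4227/139/59a4c21a896a3"
print(split_doi(mava))
print(resolver_url(mava))
```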

If you are the owner of a collection on Alveo and would like to take advantage of this facility please get in touch.

If your collection is not already on Alveo you could also get in touch, but watch this space for news about easier ways to get your data uploaded to Alveo.

AusTalk Updates

Astute users may have noticed a change to the AusTalk collection in Alveo in the last couple of days. We are re-ingesting AusTalk into Alveo to correct some errors in the previous version of the data. This means that we removed the old version and are reloading all of the data into Alveo. As I write this, 400,000 of the 800,000 items are available; the remainder should load over the next day.

This new ingest will allow us to attach the annotation files to those items in AusTalk that have either been transcribed or annotated phonetically.  Once these are in place we’ll provide some pointers to finding and working with annotated data.

One of the errors we found with the data was that we had included some speakers that did not belong in the core AusTalk collection.  In some cases these were test speakers who should not have been published, but most of them were from a later accented English collection by Michael Wagner which used the AusTalk protocol to collect data from a different group of target speakers.

We will make the accented AusTalk data available as soon as we have it all in one place. We also have the AusTalk Emotional speech collection from Julien Epps at UNSW in preparation. Finally, the video data associated with the main AusTalk collection will be made available as a separate collection on Alveo.


Alveo Services Restored

I’m pleased to report that the Alveo server is now fully restored and all services should be working again as normal.  AAF login is working again and password reset emails are now being delivered.

Some work is still in progress. In particular, the Galaxy server will be updated soon with more tools for manipulating speech data. We have been building tools to support workflows involving forced alignment with MAUS and formant tracking with the Emu wrassp toolkit. These are now mostly working and we will deploy them as soon as we can. The use of Galaxy for speech and language analysis is a new development, and we are still working out the best way to build tools and chain them together. When some tools are available we’ll invite you to experiment and provide feedback, so that we can build something that is generally useful to the community.


Server Status Update

An update on the new server deployment: the Alveo repository has now been re-installed on new infrastructure at NCI Canberra. All collections have been re-ingested and should be available as before, but there are a couple of unresolved issues that we are still working on.

  • AAF logins are not yet working, so you will need to log in with a username and password if you have one
  • we’re not able to send mail from the server so you will not be able to get password reminders or create new accounts

Unfortunately, in combination these problems might block many users who previously used AAF login to access Alveo. We are working on both issues and hope to have them resolved next week.

The ingest of the full AusTalk collection was interrupted at some point, so not all of the collection is present. We will be re-ingesting this collection this weekend (19-20 Nov), so hopefully it will be fully available next week.

One new collection is now available: MAVA, a collection of audio-visual read speech from a single speaker, collected by Vincent Aubanel from Western Sydney University.

I will post further updates as things change.

Alveo Server Outage

As of this morning (1st November) the Alveo server is offline. We are currently moving the server from its previous home at Intersect in Sydney to the facilities of NCI in Canberra.   We had hoped to have a seamless transition between the two services but unfortunately the new server is not quite ready.

We will bring Alveo back online as soon as possible.  All user accounts and collections should be maintained.

One major addition is that, for the first time, we will have the full AusTalk collection on Alveo. We’ve been working on finalising this collection for some time, and this is the first opportunity we’ve had to get the entire collection ingested. When the server returns you should see over 850,000 items in the AusTalk collection.

Uploading Data to Alveo

When we set out to build Alveo, the aim was always that it should be a repository for new collections contributed by researchers; however, the initial impetus was to ingest a number of older collections and build the platform’s capabilities. New collections were added to Alveo via a back-end process that only the developers could run.

We have since worked on adding hooks into the API to allow new collections, items and documents to be added to the Alveo repository. This extended API has now been deployed on the main system, and we have extended the pyalveo library so that scripts can be written to add new data. I recently used this facility to add the first contributed collection to Alveo: a collection of children’s speech data. This post describes the script I wrote to do this, as a short tutorial on the process.
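As a hedged sketch of one preparatory step (not the script described in that post): in Alveo a collection holds items and each item holds documents, so before calling the upload API you need to decide which files belong to which item. The example below assumes a hypothetical naming convention in which all files for one recording share a stem (spk01_s1.wav, spk01_s1.txt). It only builds the grouping and makes no pyalveo calls; the actual upload methods and their arguments should be taken from the pyalveo documentation.

```python
from collections import defaultdict
from pathlib import PurePath
from typing import Dict, Iterable, List


def group_into_items(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group files that share a stem into one hypothetical item.

    Assumes (hypothetically) that every file for a recording has the same
    name apart from its extension, e.g. spk01_s1.wav and spk01_s1.txt.
    """
    items: Dict[str, List[str]] = defaultdict(list)
    for p in paths:
        items[PurePath(p).stem].append(p)
    # Sort items by name and documents by path for a stable upload order.
    return {name: sorted(docs) for name, docs in sorted(items.items())}


files = ["data/spk01_s1.wav", "data/spk01_s1.txt", "data/spk02_s1.wav"]
for item, docs in group_into_items(files).items():
    # Each key would become an Alveo item; each path a document within it.
    print(item, docs)
```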

Report from SocioPhonAus 2016 Brisbane

I was invited to give a presentation on Alveo and AusTalk at the first Workshop on Sociophonetic Variability in the English Varieties of Australia, held at Griffith University in Brisbane in June. The workshop, organised by Gerry Docherty and Janet Fletcher, was supported by the Centre of Excellence for the Dynamics of Language and was attended by phoneticians from around the country, with a keynote given by Prof. Jonathan Harrington, who flew in from Munich.
