When we set out to build Alveo, the aim was always that it should be a repository for new collections contributed by researchers; however, the initial impetus was to get a number of older collections ingested and to build the platform capabilities. New collections were added to Alveo via a back-end process that only the developers could run.
We have since added hooks to the API that allow new collections, items and documents to be added to the Alveo repository. This extended API has now been deployed on the main system, and we have extended the pyalveo library so that scripts can be written to add new data. I recently used this facility to add the first contributed collection to Alveo: a collection of children’s speech data. This blog post describes the script that I wrote to do this, as a short tutorial on the process.