Part 2: Galaxy

This is Part 2 of a four-part tutorial on using Alveo. You can find the outline of the tutorial, and links to the other three parts, on the Tutorials page. This part covers the basics of Galaxy, including importing Alveo lists into Galaxy and using histories.

Introduction to Galaxy

Galaxy is an online workbench, originally designed for bioinformatics, that allows you to run data through tools stored in the cloud. Galaxy is also used in a number of other disciplines, such as astronomy and economics, because it is a very powerful workflow engine that allows researchers to define their own data manipulation and analytical tools. The version of Galaxy that we use in Alveo contains tools that have been specifically designed for language, speech, video or text analysis.

You can think of Alveo as a scientific laboratory, with data sitting in filing cabinets and drawers. Galaxy is another part of the laboratory that contains the equipment and machinery that you use to experiment on your data. In order to experiment, you need to find the right data first, and then carry it over to the workbench where all the equipment is located.

This part of the tutorial involves using your Alveo data lists in Galaxy, and using the Galaxy tools to concatenate and rename files, as well as using histories to track your steps.

Using Alveo lists in Galaxy

To use data files from Alveo in the Galaxy tools, you must first have them in a list. Once you have created lists (such as the three lists we created in the last module), select the Item Lists tab and choose a list from the sidebar on the left. You will then have an option for additional list actions. Here you can clear all data files from a list, rename a list, or use the list in Galaxy. Select Use in Galaxy.

The first time you attempt to use Galaxy, it will prompt you to create an account. This account is different from your account on Alveo. After you have given Galaxy an email address, a password and an account name, you will be logged in. You may then have to go back to Alveo and select Use in Galaxy again.

This time, you’ll be directed to the Galaxy / Alveo virtual lab with the Alveo Data Importer already open. Some details (such as the Item List URL and the API key) will be prefilled automatically. The output name will also be prefilled: it is the list name followed by the date in parentheses. You can change this name to whatever you want.
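The default naming pattern described above (list name, then the date in parentheses) can be sketched in a few lines of Python. This is only an illustration: the function name default_output_name is hypothetical, and the ISO date format is an assumption, since the importer may display the date differently.

```python
from datetime import date


def default_output_name(list_name: str, on: date) -> str:
    """Build a name of the form 'list name (date)'.

    The ISO date format used here is an assumption; the Alveo
    importer may format the date differently.
    """
    return f"{list_name} ({on.isoformat()})"


# Hypothetical list name and import date, for illustration only.
name = default_output_name("Cooee_1780s", date(2015, 3, 1))
print(name)
```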

You also have the option of only importing certain types of data. The lists we created in the first module only contain text data, so this option is not relevant right now. The other important option is whether to concatenate all text documents into one Galaxy document. Checking this will enable us to run analysis over all of the files in the lists as a single batch process. Make sure this option is selected, and click Execute.

When Galaxy performs a task, it runs in the background, allowing you to do other things. It lets you know the progress with an entry in the history viewer, located in the right-hand sidebar.

When a job has completed, its entry will turn green in the history. The job will also produce more than just a single output file. You can click on the file names to see the details and, in the case of text documents, a preview of the text. To see the full text, click the eye icon. The Alveo importer tool has a number of output files. The first is a log of the import job, listing the files that were imported, their URLs in Alveo, and a few other pieces of information. The imported files themselves are written into a new file called concatenated texts. If you had not selected the concatenate option in the last step, you would instead see each individual text file listed in the history.

Click the eye icon on the concatenated texts file. This will display the concatenated text files in the main pane of the Galaxy workbench. Note that only the text content of each file has been included, without filenames, titles or metadata. Those details are stored in the log file.
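The split between the concatenated output and the log can be pictured with a small Python sketch. This is not the Alveo importer's code: the function concatenate_texts, the newline separator, and the sample file names and contents are all assumptions made for illustration. It only shows the idea that the text bodies go into one file while the file names are recorded separately.

```python
from typing import Dict, List, Tuple


def concatenate_texts(documents: Dict[str, str]) -> Tuple[str, List[str]]:
    """Join document bodies into one text and record their names in a log."""
    bodies = []
    log = []
    for filename, text in documents.items():
        bodies.append(text.strip())  # keep only the text content
        log.append(filename)         # the name goes to the log, not the output
    # Joining with a newline is an assumption about the separator.
    return "\n".join(bodies), log


# Hypothetical file names and contents, for illustration only.
sample = {
    "letter_a.txt": "First letter text.",
    "letter_b.txt": "Second letter text.",
}
concatenated, import_log = concatenate_texts(sample)
print(concatenated)
print(import_log)
```

Because the file names never enter the concatenated text, any later word counts or frequency analysis run over that file are not skewed by titles or metadata.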

Next to the eye icon is a pencil icon, which is for editing the attributes of the dataset, including its name. The name ‘concatenated texts’ is not particularly useful, especially once more than one dataset has that name. Use this menu to change the name of the data file to something more memorable, such as Cooee_1780s.

Tutorial Task 4: Repeat these steps to import all of the lists you created in Module 1, so that you have Cooee_1780s, Cooee_1880s and Ace_1980s available in Galaxy for analysis.

You can also download files from the Galaxy workbench. This may be helpful later in the process, after you have used Alveo and Galaxy to analyse your data, if you want to download your results. To download, select a file in the history by clicking it, and notice that a few more options become available. One of these, represented by a floppy disk icon, downloads the file to your computer.

Histories

The history pane, on the right of the screen, lists all your previous steps and contains all output files from any jobs you run. You can also create more histories and rename them, and when you run workflows (which you will learn about in Part 4) you can tell Galaxy to output the data to a new history. Separate histories are a good way to manage different projects and data sources within Galaxy.

It is also possible to move or copy datasets from one history to another, including into a new history, by selecting Copy datasets from the gear icon in the history sidebar.

Tutorial Task 5: Copy the three concatenated lists that you imported from Alveo into a new history, and call it Text analysis.

Proceed to Part 3: Tools.