Galaxy-qld is linked to the Australian GenomeSpace

GenomeSpace is a data-centric environment with a user-friendly web interface. For some time Galaxy-qld was connected to the main GenomeSpace server in the US, but since yesterday it is linked to the Australian GenomeSpace server. The Australian GenomeSpace server can use Object Store containers through the NeCTAR Cloud, an attractive cloud storage option for Australian researchers. A tutorial for GenomeSpace is available on the GVL website.

The Australian GenomeSpace server does not share registration with the GenomeSpace server in the US. This means that any user of the US server who wishes to use the Australian GenomeSpace has to register on the Australian server. Existing users of GenomeSpace tools on Galaxy-qld need to change the GenomeSpace OpenID in User > Preferences > Manage OpenIDs. Please delete the existing GenomeSpace OpenID from the US server and add a new GenomeSpace OpenID from the Australian server. This has to be done after registering on the Australian GenomeSpace server.

Data backup

Galaxy-qld does not have an external backup. The server is provided for data analysis, not for file storage. We recommend that users save their results to external storage as soon as convenient and delete unneeded datasets from the server.

Galaxy provides several options for data download. Any dataset can be downloaded to a user's computer by clicking on the Save icon. The Send Data > GenomeSpace Exporter tool can be used to save results to external storage. GenomeSpace can manage online storage such as Dropbox, so files can be sent directly to Dropbox.

Users can export an entire history to a file and download the archive. This option comes with several caveats. Archiving big histories can take a long time, and some archiving jobs run for days. The resulting files can be very big, in the range of tens or hundreds of GB. Hint: delete and purge all unneeded files before archiving. Histories can be uploaded to Galaxy only via a link. Try this option on a small history first to check the usability of this approach.

Individual datasets can be downloaded to a server on the command line with the wget or curl commands (for details check GVL_FAQ). Note that BAM files require a special approach: a left mouse click on the Save icon gives access to the alignment and index files, and each link can then be copied with a right mouse click.
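For users who prefer a script to wget or curl, the same download can be sketched in Python with the standard library only. This is a minimal sketch, not part of Galaxy: the function name download_dataset and its parameters are hypothetical, and it assumes the link copied from the Save icon can be fetched without logging in. For a BAM file, run it once with the BAM link and once with the index link.

```python
import shutil
import sys
import urllib.request
from pathlib import Path


def download_dataset(url: str, destination: str, chunk_size: int = 1024 * 1024) -> int:
    """Stream the content at `url` into `destination` and return the file size in bytes.

    `url` is assumed to be a dataset link copied from the Save icon.
    The data is copied in chunks, so large files are not held in memory.
    """
    target = Path(destination)
    # Create the output directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return target.stat().st_size


if __name__ == "__main__":
    # Usage: python download_dataset.py <copied link> <output file>
    if len(sys.argv) == 3:
        size = download_dataset(sys.argv[1], sys.argv[2])
        print(f"Saved {size} bytes to {sys.argv[2]}")
```

Comparing the returned size with the size Galaxy shows for the dataset is a quick way to spot a truncated download.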

Moving data between Galaxy instances

Data can be copied directly from one Galaxy server to another. Users have several options. For individual datasets:

  1. Click on the name of the file to expand its details.
  2. Right-click on the Save icon > Copy Link. For BAM files the Save (Download) icon provides access to both the alignment (BAM) and index (BAI) files: left-click the icon to show both files, move the cursor over the BAM file and right-click to copy its link.
  3. On the other Galaxy server: open the upload menu > Paste / Fetch Data and paste the link. Select attributes, such as genome assembly, if required. Hit the Start button.

Archived histories can be copied between Galaxy servers, with some caveats. First, there is no obvious notification for users when archiving is complete. Hint for Galaxy-qld users: email Igor Makunin to get a notification if you export a big history to a file. Second, it takes a long time to archive big histories on Galaxy-qld. Generally the archiving tool adds ~2 GB per hour for histories exported as a file. Third, the size of the archive might be an issue, especially for users with a big disk allocation.
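The ~2 GB per hour figure gives a rough way to judge in advance whether exporting a history is practical. The sketch below is simple arithmetic under that assumption; the function name estimate_archive_hours is hypothetical, and the real rate will vary with server load and the datasets involved.

```python
def estimate_archive_hours(history_size_gb: float, rate_gb_per_hour: float = 2.0) -> float:
    """Estimate the hours needed to archive a history of the given size.

    The default rate of 2 GB per hour is the approximate figure quoted
    for Galaxy-qld; treat the result as a rough guide only.
    """
    if history_size_gb < 0:
        raise ValueError("history size cannot be negative")
    if rate_gb_per_hour <= 0:
        raise ValueError("archiving rate must be positive")
    return history_size_gb / rate_gb_per_hour


# Print estimates for a few example history sizes.
for size_gb in (10, 50, 200):
    hours = estimate_archive_hours(size_gb)
    print(f"{size_gb:>4} GB history: about {hours:.0f} h ({hours / 24:.1f} days)")
```

At this rate a 200 GB history would need about 100 hours, a little over four days, which is why purging unneeded datasets before exporting matters.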

The procedure for histories:

  1. Keep only the datasets you want to save. Delete and purge all unneeded files. Alternatively, copy valuable datasets into a new history.
  2. History menu > Export to File
  3. Copy the link to the file from the middle (working) window. It will be used later.
  4. Make the history Accessible by clicking on the corresponding link in the middle window.
  5. Once archiving is complete, go to the other Galaxy server and select History menu > Import from File. A new history will be created. This takes some time, especially for big histories.
  6. To access the imported history, go to History menu > Saved Histories and select the new history named imported from archive: name.

User data can be moved through external storage with GenomeSpace if both Galaxy instances are connected to the same GenomeSpace server and you have a sufficient storage allocation linked to GenomeSpace. In this situation users also get the benefit of a data backup.

Getting started with GenomeSpace

In order to use GenomeSpace tools on Galaxy-qld, users have to register on both servers (Galaxy-qld and GenomeSpace). Currently Galaxy-qld is connected to the main GenomeSpace server. Once both registrations are complete, add the GenomeSpace OpenID to your account on Galaxy-qld. Log in to GenomeSpace, then in a new tab log in to Galaxy-qld, go to User (top Galaxy menu) > Preferences > Manage OpenIDs and select GenomeSpace from the pull-down menu. Click the Login button. The GenomeSpace ID will appear among the OpenIDs linked to your account. Make sure it points to the main GenomeSpace server, gsui.genomespace.org. The GenomeSpace Importer and Exporter tools can now recognise your directory on the GenomeSpace server.

GenomeSpace at Galaxy-qld

GenomeSpace is a data-centric environment that brings together data and bioinformatics tools in a user-friendly manner. Genomics Virtual Lab provides an Australian version of the software, GVL GenomeSpace. GenomeSpace can handle different types of online storage, including Dropbox. Galaxy-qld is currently connected to the main GenomeSpace server. The server provides 30 GB of disk space per registered user. One of the advantages of GenomeSpace is the ability to use the GenomeSpace Exporter tool in Galaxy workflows and automatically export the results to external storage. We recommend the following procedure for Galaxy-qld:

  • use one or more small datasets to walk through the steps of your analysis in Galaxy
  • include the GenomeSpace Exporter tool
  • extract a Galaxy workflow from the History menu
  • edit the workflow and change the Choose Target Directory and Filename options for GenomeSpace Exporter to Set at Runtime
  • run the workflow on your samples and select the output directory and filename for the GenomeSpace Exporter step.