Galaxy seminar at UQ

Come to our talk about Galaxy-qld at the Centre for Digital Scholarship in the UQ Library. The seminar is an introduction to the analysis of high-throughput sequencing data using the Galaxy platform. Galaxy is a user-friendly, web-based platform for computationally intensive data analysis that does not require IT knowledge or programming skills. The talk may be useful for biologists who are not familiar with high-throughput sequencing methods but are interested in analysing next-generation sequencing (NGS) data.

9.30am – 10.30am, Thursday 8 February 2018

St Lucia campus
Duhig Building (2)
Centre for Digital Scholarship (Level 5)
eZone (Room 501) – i.e. Level 5 above Merlot Café

Click here to see details online.

Citing Galaxy-qld

We compiled a CiteULike library with 14 papers citing Galaxy-qld. Google Scholar was queried with “galaxy-qld” and the results were manually filtered to exclude incorrectly assigned papers. PhD theses were also excluded from the library.

Some Galaxy-qld users cited the GVL paper Afgan et al. 2015 without a link to Galaxy-qld. These papers are available in the Genomics Virtual Lab library with the tag “galaxy-qld”.

If you publish results obtained on Galaxy-qld, please cite the Afgan et al. 2015 paper and provide a link to Galaxy-qld.

Storage, again

Galaxy-qld storage is nearly full at the moment, and we are seeing elevated user activity, including uploads of new data. We cannot increase storage capacity right now, but we are trying to solve this issue. In the meantime, we are following up with inactive users and with power users whose large histories are filled with intermediate files such as SAM files. We want to make sure that the server runs smoothly over the Christmas period.

Galaxy-qld was designed for data analysis. It does not have the capacity to store user data long term.

Accounts created with invalid email addresses will be deleted.

Files stored on the server for more than six months might be deleted.

Trinotate is available on Galaxy-qld

Trinotate, a tool for transcriptome annotation, has been added to Galaxy-qld. We also created a public Blastp database for UniProtKB / Swiss-Prot release 2017_11. The server also provides Generate gene to transcript map [for Trinity assembly] and TransDecoder, whose outputs are used as inputs by Trinotate.

Trinotate uses multiple Blast hits in transcript annotations, including multiple hits to the same protein sequence. Users may want to adjust the default Blastp output options to reduce the number of hits reported from the UniProtKB / Swiss-Prot database. A single best hit may be sufficient for annotation.
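If a Blastp run has already produced many hits per transcript, the "single best hit" idea can also be applied afterwards. The sketch below is not part of Trinotate or Galaxy; it assumes the Blastp output was saved in the standard 12-column tabular format (BLAST+ `-outfmt 6`) and keeps, for each query, the row with the highest bit score. Check which format the Trinotate tool expects before feeding it filtered output.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines):
    """Return {query_id: row_dict} keeping the highest-bitscore hit per query.

    `lines` is any iterable of tab-separated strings (e.g. an open file).
    Blank lines and '#' comment lines are skipped; on ties the first row wins.
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        record = dict(zip(COLUMNS, row))
        query = record["qseqid"]
        if query not in best or float(record["bitscore"]) > float(best[query]["bitscore"]):
            best[query] = record
    return best


# Hypothetical hits: two for transcript q1 (different subjects), one for q2.
example = [
    "q1\ts1\t90\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200",
    "q1\ts2\t80\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150",
    "q2\ts3\t70\t100\t30\t0\t1\t100\t1\t100\t1e-10\t80",
]
for query, hit in best_hits(example).items():
    print(query, hit["sseqid"], hit["bitscore"])
```

Ranking by bit score rather than e-value is a choice made for this sketch; both orderings usually agree within a single database search.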

Upgrade of the GVL RStudio server

The GVL RStudio server had a major upgrade last week. Derek deployed a new GVL VM and upgraded RStudio to version 1.1.383. Security has been improved by switching to HTTPS. The access link has changed to:

All user accounts have been moved to the new server. If you have any issues with the updated server, contact Igor at


FTP failure for new users

Users who registered on Galaxy-qld during the last few days may experience problems with FTP uploads. If your FTP client cannot connect to Galaxy-qld and reports the error “530 Login incorrect. Login failed.”, change your Galaxy password. This should resolve the problem.

Users who recently changed their Galaxy password may experience the same issue. Changing the password again should help.
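To confirm that a password change fixed the problem without opening a full FTP client, a login test can be scripted with Python's standard `ftplib`. This is a minimal sketch under assumptions: the Galaxy-qld FTP host name is not given in this post (substitute the one from the upload instructions), and it assumes, as the fix above implies, that the FTP login uses your Galaxy credentials.

```python
import ftplib


def is_login_failure(reply):
    """True if an FTP server reply (string or exception) is a 530 'not logged in' error."""
    return str(reply).startswith("530")


def check_ftp_login(host, user, password, timeout=30):
    """Try to log in; return (True, server_reply) or (False, error_text) on a 530 error.

    Other errors (connection refused, timeouts, other 5xx replies) are re-raised.
    """
    ftp = ftplib.FTP(host, timeout=timeout)
    try:
        reply = ftp.login(user, password)
        return True, reply
    except ftplib.error_perm as err:
        if is_login_failure(err):
            return False, str(err)
        raise
    finally:
        ftp.close()
```

A hypothetical call would be `check_ftp_login("ftp.example.org", "you@example.org", "new-password")`; if it still returns `False` with a 530 reply, changing the Galaxy password again, as described above, is the next step.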

RNA-Seq workshop at UQ

We are running an RNA-Seq in Galaxy workshop at UQ on November 15-16, 2017. Registration is essential. Participants will use the Galaxy-tut server.

We will do the following:
data upload by FTP
GVL Basic RNA-Seq tutorial
GVL Advanced RNA-Seq tutorial
Galaxy workflows.

Slides from the workshop are available as PDFs:
Introduction to NGS
RNA-Seq in Galaxy.

There will be limited user support for Galaxy-qld during the workshop.

Microbial genomics workshop at UQCCR

On November 4, 2017 we are holding a Galaxy training session at UQCCR, Herston, for participants of the Practical Microbial Genomics Workshop. The training is based on the Public data → assembly, annotation, MLST tutorial created by Melbourne Bioinformatics.

Slides for the workshop: pdf

Because of time constraints, we will use files from a data library available on the server. Instructions for data import: pdf

RNA_STAR index for TAIR10 assembly

At the request of our users, we have added an RNA_STAR index for the ‘TAIR10’ assembly (see this post for an explanation) to Galaxy-qld.

We noticed that with the default settings RNA_STAR cannot map the majority of reads in some RNA-Seq datasets from Arabidopsis. Here is an extract from the log file:

           Uniquely mapped reads % | 3.27%
% of reads mapped to multiple loci | 10.06%
    % of reads unmapped: too short | 86.64%
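When comparing such numbers across several samples, it can help to read them programmatically. The sketch below is not an official STAR or Galaxy utility; it assumes the `label | value` layout shown in the extract, converts percentages to floats, and ignores lines without a `|` separator.

```python
def parse_star_log(text):
    """Parse 'label | value' lines from a STAR Log.final.out-style summary.

    Values ending in '%' become floats (e.g. '3.27%' -> 3.27); others stay strings.
    """
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers and blank lines
        label, value = [part.strip() for part in line.split("|", 1)]
        stats[label] = float(value[:-1]) if value.endswith("%") else value
    return stats


log_extract = """\
           Uniquely mapped reads % | 3.27%
% of reads mapped to multiple loci | 10.06%
    % of reads unmapped: too short | 86.64%
"""

stats = parse_star_log(log_extract)
mapped = stats["Uniquely mapped reads %"] + stats["% of reads mapped to multiple loci"]
too_short = stats["% of reads unmapped: too short"]
print(f"mapped: {mapped:.2f}%, unmapped as too short: {too_short:.2f}%")
# prints "mapped: 13.33%, unmapped as too short: 86.64%"
```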

Read a detailed explanation of the ‘too short’ classification from Alexander Dobin.
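The filters involved are thresholds normalized to read length, and the STAR manual defines read length for paired-end data as the sum of both mates' lengths. The arithmetic below is a simplified illustration, not STAR's actual scoring: it treats the alignment score as the number of aligned bases (real scores also include mismatch, gap and junction penalties) and uses hypothetical 2 × 100 bp paired-end reads where only one mate aligns.

```python
def min_required(threshold, mate_lengths):
    """Bases needed under a threshold normalized to read length (sum of mate lengths)."""
    return threshold * sum(mate_lengths)


def passes(aligned_bases, threshold, mate_lengths):
    """Simplified check: does an alignment of `aligned_bases` meet the normalized threshold?"""
    return aligned_bases >= min_required(threshold, mate_lengths)


mates = [100, 100]                   # hypothetical 2 x 100 bp read pair
print(min_required(0.66, mates))     # 132.0 bases needed with the 0.66 default
print(passes(100, 0.66, mates))      # False: one fully aligned mate is "too short"
print(passes(100, 0.3, mates))       # True once the threshold is lowered
```

Under this simplification, a pair in which only one mate maps can never reach the 0.66 default, which is one way a large "too short" fraction can arise.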

The proportion of mapped reads can be increased by modifying the alignment settings. Note that the procedure is described for RNA STAR Gapped-read mapper for RNA-seq data (Galaxy Version 2.4.0d-2). For additional information, check the relevant RNA_STAR threads, such as this one.

Set Would you like to set output parameters (formatting and filtering)? to Yes.

Set Would you like to set additional output parameters (formatting and filtering)? to Yes.

Reduce the default value of 0.66 for the following filter options:
Minimum alignment score, normalized to read length (--outFilterScoreMinOverLread)
Minimum number of matched bases, normalized to read length (--outFilterMatchNminOverLread) (can be 0)

Set Other parameters (seed, alignment, and chimeric alignment) to Yes.

Set Would you like to set alignment parameters? to Yes.

Reduce the value of Minimum mapped length for a read mate that is spliced, normalized to mate length (--alignSplicedMateMapLminOverLmate) from the default 0.66 to a smaller value.

Inspect the resulting alignments to make sure you are happy with the mapping.
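For anyone running STAR outside Galaxy, the three Galaxy options changed above correspond to the STAR command-line parameters named in the steps. The helper below is a hypothetical sketch that only builds those extra arguments; the 0.3 values are placeholders, not tested recommendations, and the rest of the STAR command is not shown.

```python
DEFAULT = 0.66  # STAR default for all three normalized thresholds


def relaxed_star_args(score=0.3, matched=0.3, spliced_mate=0.3):
    """Build STAR arguments that lower the three 0.66 thresholds (illustrative values)."""
    values = {
        # Minimum alignment score, normalized to read length
        "--outFilterScoreMinOverLread": score,
        # Minimum number of matched bases, normalized to read length
        "--outFilterMatchNminOverLread": matched,
        # Minimum mapped length for a spliced read mate, normalized to mate length
        "--alignSplicedMateMapLminOverLmate": spliced_mate,
    }
    args = []
    for flag, value in values.items():
        # This sketch only relaxes filters, so reject values above the default.
        if not 0 <= value <= DEFAULT:
            raise ValueError(f"{flag} should be between 0 and {DEFAULT}, got {value}")
        args += [flag, str(value)]
    return args


print(" ".join(relaxed_star_args()))
```

Whatever values are chosen, the advice above still applies: inspect the resulting alignments, since looser filters also admit more short or spurious alignments.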