Changes in user policy for assembly tools

Because of high demand for assembly jobs from some Galaxy-qld users, we now allow only one concurrent 16-CPU job per account.

Tools assigned to the 16-CPU quota:
Trinity and Trinity read normalization
blastx, tblastn, tblastx.

Users can still submit several 16-CPU jobs. These jobs will be executed sequentially, one at a time per account, depending on available resources.
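The effect of the one-job-per-account rule can be illustrated with a small Python sketch. This is only an illustration of the policy as described above, not the server's actual scheduler configuration: the function name, job names and user names are hypothetical, and the sketch ignores the "available resources" condition.

```python
from collections import defaultdict

# Policy stated above: one concurrent 16-CPU job per account.
MAX_CONCURRENT_16CPU_JOBS = 1


def split_running_and_queued(jobs):
    """Split submitted 16-CPU jobs into those that may start and those that wait.

    jobs: list of (job_id, user) tuples in submission order.
    Hypothetical helper; it does not model cluster capacity.
    """
    running_per_user = defaultdict(int)
    running, queued = [], []
    for job_id, user in jobs:
        if running_per_user[user] < MAX_CONCURRENT_16CPU_JOBS:
            running_per_user[user] += 1
            running.append(job_id)
        else:
            # Extra jobs from the same account wait for the first to finish.
            queued.append(job_id)
    return running, queued


# Hypothetical submissions: two jobs from one account, one from another.
jobs = [("trinity-1", "alice"), ("blastx-1", "alice"), ("trinity-2", "bob")]
running, queued = split_running_and_queued(jobs)
print("running:", running)
print("queued:", queued)
```

In this example the second job from the first account waits, while the job from the other account is not affected.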

Galaxy seminar at the Ecosciences Precinct on February 6

On February 6, 2018 we are running a Galaxy seminar at the Ecosciences Precinct for people from the Department of Agriculture and Fisheries. The talk provides a broad overview of the Galaxy platform and the Galaxy-qld server and should be of interest to anyone analysing high-throughput sequencing data. It will be followed by a Galaxy demo session focussed on data upload using FTP and genome assembly.

Slides for the seminar are available for download from Dropbox (pdf).

Long-term data storage on Galaxy-qld

Galaxy-qld was designed for data analysis and has only modest storage. Unfortunately, some users store their files for a long time. The graph below shows the distribution of big datasets by date of creation.


The big datasets represent about half of all data stored on the server. The graph demonstrates that Galaxy-qld users store terabytes of data for over a year. In some cases people upload data from public repositories and keep the datasets on the server for months or even years without doing any analysis. Long-term data storage has a negative impact on the server, as we have to turn down requests for extra quota from active users.

We are going to clean up accounts that have been inactive for six months or more. Data stored without analysis will be deleted without warning. Inactive users will be asked to move their results to external storage before a specified date. On that date the data in their accounts will be deleted.

Galaxy seminar at UQ

Come to our talk about Galaxy-qld at the Centre for Digital Scholarship in the UQ Library. The seminar is essentially an introduction to the analysis of high-throughput sequencing data using the Galaxy platform. Galaxy is a user-friendly web-based platform for computationally intensive data analysis. It does not require IT knowledge or programming skills. The talk might be useful for biologists who are not familiar with high-throughput sequencing methods but are interested in the analysis of next-generation sequencing data.

9.30am – 10.30am, Thursday 8 February 2018

St Lucia campus
Duhig Building (2)
Centre for Digital Scholarship (Level 5)
eZone (Room 501) – i.e. Level 5 above Merlot Café

Click here to see details online.

Citing Galaxy-qld

We compiled a CiteULike library with 14 papers citing Galaxy-qld. Google Scholar was queried with “galaxy-qld” and the results were manually filtered to exclude incorrectly assigned papers. PhD theses were also excluded from the library.

Some Galaxy-qld users cited the GVL paper Afgan et al. 2015 without a link to Galaxy-qld. These papers are available in the Genomics Virtual Lab library with the tag “galaxy-qld”.

If you publish results obtained on Galaxy-qld, please cite the Afgan et al. 2015 paper and provide a link to Galaxy-qld.

Storage, again

Galaxy-qld storage is nearly full at the moment, and we are seeing elevated user activity, including uploads of new data. We cannot increase the data storage capacity right now, but we are trying to solve this issue. In the meantime we are chasing inactive users and power users with big histories loaded with intermediate files such as SAM files. We want to make sure that the server runs smoothly over the Christmas period.

Galaxy-qld was designed for data analysis. It does not have the capacity to store user data for a long time.

Accounts created with invalid email addresses will be deleted.

Files stored on the server for more than six months might be deleted.

Trinotate is available on Galaxy-qld

Trinotate, a tool for transcriptome annotation, has been added to Galaxy-qld. We also created a public Blastp database for UniProtKB / Swiss-Prot release 2017_11. The server also provides Generate gene to transcript map [for Trinity assembly] and Transdecoder, whose outputs are used by Trinotate.

Trinotate uses multiple Blast hits in transcript annotations, including multiple hits for the same protein sequence. Users may want to tweak the default Blastp output options and reduce the number of reported hits from the UniProtKB / Swiss-Prot database. A single best hit might be sufficient for annotation.
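To make "a single best hit" concrete, here is a minimal Python sketch that keeps only the highest-scoring hit per query. It assumes the standard 12-column BLAST tabular output (query ID in the first column, subject ID in the second, bit score in the twelfth). The function name and the sequence IDs in the example are hypothetical. It only illustrates the idea; on Galaxy-qld the number of hits would normally be set in the Blastp tool's output options.

```python
def best_hits(tabular_text):
    """Return a dict mapping each query ID to its highest-bit-score hit.

    Assumes standard 12-column BLAST tabular output:
    column 1 = query ID, column 2 = subject ID, column 12 = bit score.
    Ties keep the hit that appears first in the file.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, bitscore = fields[0], float(fields[11])
        if query not in best or bitscore > float(best[query][11]):
            best[query] = fields
    return best


# Hypothetical example: two hits for tx1, one hit for tx2.
example = "\n".join([
    "\t".join(["tx1", "protA", "98.5", "200", "3", "0", "1", "200", "1", "200", "1e-100", "380"]),
    "\t".join(["tx1", "protB", "75.0", "180", "45", "2", "5", "184", "10", "189", "1e-40", "150"]),
    "\t".join(["tx2", "protC", "60.0", "120", "48", "1", "1", "120", "3", "122", "1e-20", "95"]),
])

best = best_hits(example)
for query, fields in best.items():
    print(query, fields[1], fields[11])
```

Ranking by bit score is one reasonable choice; ranking by e-value would be an equally simple alternative.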