High demand for storage on Galaxy-qld

The number of users on Galaxy-qld is growing rapidly, with over 130 new people registered in March. As a result, the server runs more jobs, and we see high demand for storage. At the start of the week storage utilisation was close to 90%, and we asked our users to delete old and unneeded datasets. We had a very good response from active users, but people with dormant or inactive accounts have been less cooperative.

Galaxy-qld does not have the capacity to store user data for a long time. The server is designed for data analysis.

Please download your results as soon as convenient and delete the files from Galaxy-qld.

Do not store temporary files or SAM files.

We see users keeping both the original FASTQ files and the reads produced by FASTQ Groomer, where the only difference is in metadata (fastq vs fastqsanger datatype), as well as files with reads trimmed separately by quality and by length, etc. Many aligners keep all reads in the alignment output, both aligned and unaligned, so users end up with multiple copies of the same data in different formats.

Introduction to Galaxy workshop at UQ

We are running an Introduction to Galaxy workshop at IMB, UQ on April 12, 2017, from 2 pm to 4 pm. Venue: Multimedia room 3.141 at QBP. The event is organised for IMB postgraduates and it is fully booked. The workshop is all about data in Galaxy: upload by FTP or with FTP client software, moving data between histories and Galaxy servers, and data import from CloudStor and data libraries.

The slides from the workshop are available for download (link to pdf).

UPDATE Several Firefox Mac users had problems with right-click during the workshop. With an external third-party mouse, put the cursor over the Save icon, press and hold the right button, and navigate to the appropriate line in the displayed menu, such as Save Link Location. Releasing the right button copies the link to the clipboard. With a touchpad, use a two-finger touch to display the menu under the Save icon, then navigate to the Save Link Location item with two fingers (do not lift your fingers from the touchpad).

Transcript quantification with Salmon

Recently a very fast tool for transcript quantification from RNA-Seq data, Salmon, was added to Galaxy-qld. The tool counts reads corresponding to transcripts and returns Transcripts per Million (TPM) and estimated number of reads. Salmon is incredibly fast, because it uses hashing / k-mers for estimation of transcript expression instead of traditional read mapping. It can estimate expression on transcript and gene level, if provided with a gene-to-transcript mapping file.
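
As an illustration of what the TPM column means, here is a minimal Python sketch of the standard TPM definition and of summing transcript values to gene level. It is not Salmon's own estimation procedure; the function names, transcript and gene identifiers, read counts and effective lengths are all made up for the example.

```python
# Illustrative only: the standard TPM formula applied to hypothetical numbers.
# Salmon estimates the read counts itself; here they are simply given.

def tpm(est_reads, eff_lengths):
    """Convert estimated read counts to Transcripts per Million."""
    # Reads per base of effective transcript length.
    rates = {t: est_reads[t] / eff_lengths[t] for t in est_reads}
    total = sum(rates.values())
    # Scale so that all values sum to one million.
    return {t: 1e6 * r / total for t, r in rates.items()}

def gene_level(transcript_tpm, tx2gene):
    """Sum transcript TPM values per gene using a transcript-to-gene map."""
    genes = {}
    for t, value in transcript_tpm.items():
        gene = tx2gene[t]
        genes[gene] = genes.get(gene, 0.0) + value
    return genes

# Hypothetical inputs.
est_reads = {"tx1": 100.0, "tx2": 300.0, "tx3": 50.0}
eff_lengths = {"tx1": 1000.0, "tx2": 1500.0, "tx3": 500.0}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

transcript_tpm = tpm(est_reads, eff_lengths)
gene_tpm = gene_level(transcript_tpm, tx2gene)
print(transcript_tpm)
print(gene_tpm)
```

Note that tx2 has three times as many reads as tx1 but only twice the TPM, because it is longer.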

An example of transcript quantification with the new tool is available in the Counting reads with Salmon history published on Galaxy-qld. The installed version seems to have a minor bug: the Gene Quantification output records transcript names instead of gene names.

FASTQ interlacer and fancy FASTQ format

This is a follow-up to the previous post on non-standard FASTQ data. It seems that FASTQ files with different names in the first and third lines are quite common. Unfortunately some tools, such as FASTQ de-interlacer or FASTQ splitter, fail on such FASTQ datasets. Even worse, FASTQ to Tabular also fails on some FASTQ files. As a result, such files cannot be fixed through FASTQ to Tabular conversion, as described in the previous post. In this situation Regex Find And Replace can be used to replace names in the third line. The simplest solution is to have only the ‘+’ character in the third line.
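
To check whether a downloaded FASTQ file is affected, something like the following Python sketch could be used outside Galaxy. It assumes plain four-line records (no wrapped sequence lines); the function name and the sample reads are hypothetical.

```python
import io

def count_named_plus_lines(handle):
    """Return (records, records whose third line is more than a bare '+').

    Assumes every record occupies exactly four lines.
    """
    records = 0
    named = 0
    for i, line in enumerate(handle):
        if i % 4 == 2:  # third line of each record
            records += 1
            if line.rstrip("\r\n") != "+":
                named += 1
    return records, named

# Hypothetical two-record sample: the first record repeats the name after '+'.
sample = io.StringIO(
    "@SRR12345.1 length=4\nACGT\n+SRR12345.1 length=4\nIIII\n"
    "@SRR12345.2 length=4\nTTGA\n+\nHHHH\n"
)
result = count_named_plus_lines(sample)
print(result)
```

For a real file, pass an open file handle instead of the in-memory sample.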

The Regex Find And Replace tool uses Python regular expressions, where ‘.’ (dot symbol) means ‘any character’ and ‘*’ is used for repetition of a pattern; see the Python documentation for additional information.

For example, if names in the third line look like +SRR12345.xxx length=75, and quality is encoded with +33 offset, then the following expression in Regex Find And Replace will remove all names:

RefExp.png
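
The same idea can be tried outside Galaxy with Python's re module. The pattern below is an assumption based on the example name above, not necessarily the exact expression used in the tool; the sample read is made up. It relies on quality lines not starting with +SRR, which is plausible with +33 offset because the characters S and R would correspond to unusually high quality scores.

```python
import re

# Assumed pattern: a whole line that starts with '+SRR'.
# '.' matches any character and '*' repeats it up to the end of the line.
THIRD_LINE = re.compile(r"^\+SRR.*$", re.MULTILINE)

def strip_third_line_names(fastq_text):
    """Replace every '+SRR...' line with a bare '+'."""
    return THIRD_LINE.sub("+", fastq_text)

# Hypothetical record with the name repeated in the third line.
record = (
    "@SRR12345.1 length=8\n"
    "ACGTACGT\n"
    "+SRR12345.1 length=8\n"
    "IIIIHHHH\n"
)
cleaned = strip_third_line_names(record)
print(cleaned)
```

The header line starts with ‘@’, so it is left untouched; only the third line is shortened to ‘+’.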

Seminar on Galaxy-qld at Griffith Uni

Come to our talk on Galaxy-qld at Griffith University. The seminar is arranged by Amanda Miotto and the Hacky Hour Crew – Griffith.

When: 6th March 2017, 2pm-3pm.
Location: Room 0.05, N13 Environment 2, Nathan Campus, Griffith University.

Title: Galaxy-qld, a local server for genomic research

Galaxy-qld is a public server for analysis of high-throughput sequencing data supported by the Research Computing Centre, UQ and the Queensland Cyber Infrastructure Foundation. Galaxy is a web-based platform for computationally intensive data analysis. It does not require IT knowledge or programming skills. With a simple registration, Galaxy-qld provides instant access to substantial computational resources and ample disk space for data manipulation. Bioinformatic tools available on the server cover popular topics such as RNA-Seq, ChIP-Seq, genome resequencing and variant identification, and de novo transcriptome and genome assembly. The talk provides an overview of the latest developments in Galaxy-qld and the related resources for genomic research maintained by RCC.

The seminar is aimed at a broad range of biologists interested in analysis of high-throughput sequencing data and does not require knowledge of genomics or programming.

The presentation for the seminar is available as a PDF file.