QFAB workshop: Variant detection using Galaxy

Another Galaxy workshop from QFAB Bioinformatics is scheduled for October 11-12, 2017.

Title: Variant detection using Galaxy

Venue: Room 3.141, Queensland Bioscience Precinct, The University of Queensland, St Lucia
Starts Wed, 11/10/2017, 09:00; ends Thu, 12/10/2017, 12:30.

Registration is essential.

Presentations from the workshop (pdf):
Introduction to Galaxy
Variant calling using Galaxy platform
Galaxy workflows

Instructions for the Galaxy workflow exercise: pdf file or Word document.


RNA_STAR index for TAIR10 assembly

At the request of our users, we added an RNA_STAR index for the ‘TAIR10’ assembly (see this post for explanation) to Galaxy-qld.

We noticed that, with the default settings, RNA_STAR cannot map the majority of reads in some RNA-Seq datasets from Arabidopsis. Here is an extract from the log file:

           Uniquely mapped reads % | 3.27%
% of reads mapped to multiple loci | 10.06%
    % of reads unmapped: too short | 86.64%

Read a detailed explanation of ‘too short’ classification from Alexander Dobin.
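The summary lines above can also be read programmatically, for example to flag datasets where most reads fall into the ‘too short’ class. The following is a minimal sketch, assuming the log uses the `name | value%` layout shown in the extract; the function name `parse_star_summary` is hypothetical and not part of RNA_STAR or Galaxy.

```python
def parse_star_summary(log_text):
    """Return {metric name: percentage} for lines shaped like 'name | 12.34%'."""
    metrics = {}
    for line in log_text.splitlines():
        if "|" not in line:
            continue
        name, _, value = line.partition("|")
        value = value.strip()
        if not value.endswith("%"):
            continue  # skip non-percentage entries such as dates or read counts
        try:
            metrics[name.strip()] = float(value[:-1])
        except ValueError:
            continue
    return metrics


# The three lines quoted in the post.
EXTRACT = """\
           Uniquely mapped reads % | 3.27%
% of reads mapped to multiple loci | 10.06%
    % of reads unmapped: too short | 86.64%
"""

summary = parse_star_summary(EXTRACT)
mapped = (summary["Uniquely mapped reads %"]
          + summary["% of reads mapped to multiple loci"])
too_short = summary["% of reads unmapped: too short"]
print(f"mapped: {mapped:.2f}%  unmapped (too short): {too_short:.2f}%")
```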

The proportion of mapped reads can be increased by modifying the alignment settings. Note that the procedure below is described for RNA STAR Gapped-read mapper for RNA-seq data (Galaxy Version 2.4.0d-2). For additional information, check the relevant RNA_STAR threads, such as this one.

Set Would you like to set output parameters (formatting and filtering)? to Yes.

Set Would you like to set additional output parameters (formatting and filtering)? to Yes.

Reduce the default 0.66 value for the following filter options:
Minimum alignment score, normalized to read length (--outFilterScoreMinOverLread)
Minimum number of matched bases, normalized to read length (--outFilterMatchNminOverLread)
(can be 0)

Set Other parameters (seed, alignment, and chimeric alignment) to Yes.

Set Would you like to set alignment parameters? to Yes.

Reduce the value of Minimum mapped length for a read mate that is spliced, normalized to mate length (--alignSplicedMateMapLminOverLmate) from the default 0.66 to something smaller.

Inspect the alignment to make sure you are happy with the mapping.
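For readers who run STAR outside Galaxy, the three settings changed in the steps above correspond to command-line options. The sketch below builds that part of a command line. The value 0.3 is only an illustrative assumption (the post does not recommend a specific number), and the function name, genome directory and read file names are hypothetical.

```python
import shlex


def star_filter_args(score_min=0.3, match_min=0.3, spliced_mate_min=0.3):
    """Build STAR options that relax the three length-normalised filters.

    STAR's default for each option is 0.66; the 0.3 used here is an
    arbitrary illustrative value, not a recommendation from the post.
    """
    for value in (score_min, match_min, spliced_mate_min):
        if not 0.0 <= value <= 1.0:
            raise ValueError("thresholds are fractions of read length (0 to 1)")
    return [
        "--outFilterScoreMinOverLread", str(score_min),
        "--outFilterMatchNminOverLread", str(match_min),
        "--alignSplicedMateMapLminOverLmate", str(spliced_mate_min),
    ]


# Hypothetical paths, shown only to place the options in context.
command = [
    "STAR",
    "--genomeDir", "TAIR10_index",
    "--readFilesIn", "reads_1.fastq", "reads_2.fastq",
] + star_filter_args()

print(shlex.join(command))
```

Lower thresholds let shorter partial alignments through, which is why the post suggests inspecting the resulting alignment afterwards.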

Disruption to Galaxy-qld service on Monday, September 11

Our IT provider requested a temporary shutdown of Galaxy-qld on Monday, September 11, 2017, around 9 am Brisbane time, to fix a hardware fault. We understand the repair may take hours, but the exact duration is unknown. Updates on the situation are available through the GVL-Qld Twitter account @GVL_QLD.

Galaxy-qld will not accept new jobs from September 9.

The event will not affect user data.

UPDATE. September 11, 3:05 pm. Galaxy-qld is back online. Initial tests indicate the server is fully functional.

Workshop: RNA-Seq analysis using Galaxy

QFAB announced a new Galaxy workshop on RNA-Seq analysis.

When: Wed, 13/09/2017 – 09:00 to Thu, 14/09/2017 – 12:30.

Where: MultiMedia Room 3.141, Queensland Bioscience Precinct (building 80), The University of Queensland, St Lucia.

Cost: $25. Registration is essential.

Participants need to bring a wi-fi-enabled laptop capable of connecting to eduroam. The room is equipped with power points for every participant.

Presentations from the workshop (pdf):
Introduction to Galaxy
RNA-Seq with Galaxy
Galaxy workflow.

Instructions for exercises:
Data upload by ftp
Galaxy workflow.

Open the file in a new tab by clicking on the link above. Switch to the new tab and download the file to your computer.

Arabidopsis thaliana resources on Galaxy-qld

A new annotation of the Arabidopsis thaliana genes, Araport11, was recently added to the Arabidopsis thaliana gene annotations data library on Galaxy-qld. The dataset was imported from ARAPORT and modified for compatibility with the existing Arabidopsis assemblies. This post provides an overview of A. thaliana resources on Galaxy-qld.

The very first version of Galaxy-qld had the TAIR9 assembly represented by five chromosomes, with the following contig names: chr1, chr2, chr3, chr4 and chr5. It does not include the mitochondrial or chloroplast genomes.

Later, at the request of our users, we added the TAIR10 gene annotation to the Arabidopsis thaliana gene annotations data library. This annotation includes genes from Mt and Pt. It uses plain numbers (1, 2, 3, 4, 5) for chromosome names. The TAIR10 genomic sequence is identical to TAIR9 (link). To give our users greater flexibility, we added TAIR10 aligner indices to Galaxy-qld. The TAIR10 assembly contains the following contigs: 1, 2, 3, 4, 5, Mt and Pt.

The Araport11 gene annotation is based on the TAIR10 genome assembly (link), which is identical to the TAIR9 assembly. The original annotation comes with the following contig names: Chr1, Chr2, Chr3, Chr4, Chr5, ChrC, ChrM. [no comments on standard nomenclature here] To make the Araport11 annotation compatible with the TAIR9 assembly available on Galaxy-qld, we replaced ‘Chr’ with ‘chr’. To make it compatible with TAIR10, we removed ‘Chr’ from the contig names, replaced ChrC and ChrM with Pt and Mt, respectively, and sorted the records in the same order as in the TAIR10 assembly: 1, 2, 3, 4, 5, Mt, Pt. The modified annotation is available in the Arabidopsis thaliana gene annotations data library under the name Araport11_GFF3_genes_transposons.201606.modified.gtf.
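The renaming described above can be sketched in a few lines of Python. This is an illustration of the mapping only, not the script actually used to prepare the file on Galaxy-qld. It assumes a tab-separated annotation with the contig name in the first column, and all function names are hypothetical.

```python
# Contig order of the TAIR10 assembly, as listed in the post.
TAIR10_ORDER = ["1", "2", "3", "4", "5", "Mt", "Pt"]


def to_tair9_name(contig):
    """Chr1 -> chr1: only the case of the 'Chr' prefix changes."""
    return "chr" + contig[3:] if contig.startswith("Chr") else contig


def to_tair10_name(contig):
    """Chr1 -> 1, ChrC -> Pt, ChrM -> Mt."""
    organelles = {"ChrC": "Pt", "ChrM": "Mt"}
    if contig in organelles:
        return organelles[contig]
    return contig[3:] if contig.startswith("Chr") else contig


def convert_for_tair10(lines):
    """Rename the contig column and order records by TAIR10 contig order.

    Comment lines (starting with '#') are kept at the top; blank lines
    are dropped. The sort is stable, so the order of records within a
    contig is preserved.
    """
    header, records = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.split("\t")
        fields[0] = to_tair10_name(fields[0])
        records.append(fields)
    rank = {name: i for i, name in enumerate(TAIR10_ORDER)}
    records.sort(key=lambda f: rank.get(f[0], len(rank)))  # unknown contigs last
    return header + ["\t".join(f) for f in records]
```

Applied to records on ChrM, Chr2 and Chr1, `convert_for_tair10` returns them relabelled and ordered as 1, 2, Mt.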

Normalisation of paired-end data

Expression of genes varies considerably, and reads corresponding to highly expressed genes are over-represented in RNA-Seq datasets. The excess reads do not improve transcript assembly, and some form of digital normalisation can reduce memory requirements and decrease the assembly time. Trinity Insilico Normalization is part of the Trinity package (link) and is available on Galaxy-qld.

Normalisation of single-end data is fairly straightforward, but processing paired-end reads with default settings produces different numbers of forward and reverse reads. To avoid this, set the “process paired reads by averaging stats between pairs and retaining linking info” option to Yes. It is good practice to run FastQC on the output files and check the number of sequences in the files produced by the Trinity Read Normalization tool: see the “normalisation of paired-end reads” history published on Galaxy-qld.
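If the normalised files are downloaded from Galaxy, the read counts can also be compared with a short script instead of reading them off the FastQC reports. This is a minimal sketch, assuming the output is in standard four-line FASTQ format (optionally gzip-compressed). The function names are hypothetical.

```python
import gzip


def count_fastq_records(path):
    """Count reads in a FASTQ file, assuming four lines per record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def pairs_are_balanced(forward_path, reverse_path):
    """True if the forward and reverse files hold the same number of reads."""
    return count_fastq_records(forward_path) == count_fastq_records(reverse_path)
```

Calling `pairs_are_balanced` with the forward and reverse output files should return True when the pairing option above was set to Yes. Equal counts are a necessary check only; they do not prove that the reads are still in matching order.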