Normalisation of paired-end data

Expression of genes varies considerably, and reads corresponding to highly expressed genes are over-represented in RNA-Seq datasets. The excessive reads do not improve transcript assembly and some sort of a digital normalisation can reduce memory requirements and decrease the assembly time. Trinity Insilico Normalization is a part of Trinity package (link) and it is available on Galaxy-qld.

Normalisation of single-end data is fairly straightforward, but processing of paired-end reads with default settings produces different number of forward and reverse reads. To avoid this, set “process paired reads by averaging stats between pairs and retaining linking info” option to Yes. It is good to run FastQC on the output files and check number of sequences in file produced by Trinity Read Normalization tool – see “normalisation of paired-end reads” history published onĀ Galaxy-qld.