A new annotation of the Arabidopsis thaliana genes, Araport11, has recently been added to the Arabidopsis thaliana gene annotations data library on Galaxy-qld. The dataset was imported from ARAPORT and modified for compatibility with the existing Arabidopsis assemblies. This post provides an overview of the A. thaliana resources on Galaxy-qld.
The very first version of Galaxy-qld included the TAIR9 assembly, represented by five chromosomes with the following contig names: chr1, chr2, chr3, chr4 and chr5. It does not include the mitochondrial or chloroplast genomes.
Later, at the request of our users, we added the TAIR10 gene annotation to the Arabidopsis thaliana gene annotations data library. This annotation includes genes from Mt and Pt, and it uses plain numbers (1, 2, 3, 4, 5) as chromosome names. The TAIR10 genomic sequence is identical to TAIR9 (link). To give our users greater flexibility, we also added TAIR10 aligner indices to Galaxy-qld. The TAIR10 assembly contains the following contigs: 1, 2, 3, 4, 5, Mt and Pt.
The Araport11 gene annotation is based on the TAIR10 genome assembly (link), which is identical to the TAIR9 assembly. The original annotation comes with the following contig names: Chr1, Chr2, Chr3, Chr4, Chr5, ChrC, ChrM. [no comments on standard nomenclature here] To make the Araport11 annotation compatible with the TAIR9 assembly available on Galaxy-qld, we replaced ‘Chr’ with ‘chr’. To make it compatible with TAIR10, we removed ‘Chr’ from the contig names, replaced ChrC and ChrM with Pt and Mt, respectively, and sorted the records in the same contig order as in the TAIR10 assembly: 1, 2, 3, 4, 5, Mt, Pt. The modified annotation is available in the Arabidopsis thaliana gene annotations data library under the name Araport11_GFF3_genes_transposons.201606.modified.gtf.
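For readers who want to apply the same kind of renaming to their own annotation files, the contig-name changes described above can be sketched in a few lines of Python. This is only an illustration, not the script used to prepare the Galaxy-qld dataset: the function names are hypothetical, and the sketch assumes a tab-separated GFF/GTF-style file with the contig name in the first column and comment lines starting with ‘#’.

```python
# Illustrative sketch only; all names below are hypothetical helpers,
# not the code actually used to build the Galaxy-qld dataset.

TAIR10_ORDER = ["1", "2", "3", "4", "5", "Mt", "Pt"]
ORGANELLES = {"ChrC": "Pt", "ChrM": "Mt"}  # chloroplast, mitochondrion


def to_tair9_name(contig):
    """Replace a leading 'Chr' with 'chr' (Chr1 -> chr1)."""
    return "chr" + contig[3:] if contig.startswith("Chr") else contig


def to_tair10_name(contig):
    """Drop the 'Chr' prefix and rename organelles (Chr1 -> 1, ChrC -> Pt)."""
    if contig in ORGANELLES:
        return ORGANELLES[contig]
    return contig[3:] if contig.startswith("Chr") else contig


def convert_for_tair10(lines):
    """Rename contigs and sort records in TAIR10 contig order.

    Comment and blank lines are collected and written first; records
    from contigs not listed in TAIR10_ORDER are placed at the end.
    """
    comments, records = [], []
    for line in lines:
        if line.startswith("#") or not line.strip():
            comments.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        fields[0] = to_tair10_name(fields[0])
        records.append(fields)
    rank = {name: i for i, name in enumerate(TAIR10_ORDER)}
    # list.sort() is stable, so records keep their original relative
    # order within each contig.
    records.sort(key=lambda f: rank.get(f[0], len(rank)))
    return comments + ["\t".join(f) + "\n" for f in records]
```

Sorting only by contig, with a stable sort, leaves the order of features within each chromosome as it was in the input file. Note that with the ‘chr’ naming, ChrC and ChrM become chrC and chrM, which have no counterpart among the five TAIR9 contigs on Galaxy-qld.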