Good user practice for Galaxy-qld

This page is aimed at Galaxy-qld users. It describes the behaviour we like to see on the server.

Register with your institutional email. We do not approve multiple registrations. If you need a bigger quota, contact the server admins. Accounts with invalid email addresses can be deleted. An address is deemed invalid if emails sent to it bounce.

Read the GVL_FAQ page. It might save you a lot of time.

Specify the correct quality score encoding for FASTQ files. By default, Galaxy assigns the fastq datatype to all FASTQ files during upload. Many tools do not recognise the fastq datatype because it carries no information about the quality score encoding. Currently, an offset of +33 is the most common standard. It corresponds to the fastqsanger datatype in Galaxy. The fastqillumina datatype is for the old Illumina standard, which the company abandoned several years ago.
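If you are unsure which encoding a file uses, you can inspect the quality lines before uploading. The following is a minimal sketch, not a Galaxy feature: the function names and thresholds are illustrative assumptions, and the check is only a heuristic. It relies on the fact that quality characters below ASCII 59 cannot occur with an offset of +64 (the old Illumina standard), and it assumes each FASTQ record spans exactly four lines.

```python
def guess_phred_offset(quality_strings):
    """Guess the quality-score offset (33 or 64) from FASTQ quality lines.

    Returns None when the characters seen are compatible with both encodings.
    """
    codes = [ord(ch) for line in quality_strings for ch in line.strip()]
    if not codes:
        return None
    if min(codes) < 59:
        # Characters below ';' (ASCII 59) cannot occur with an offset of +64.
        return 33
    if max(codes) > 74:
        # Heuristic (assumption): characters above 'J' (ASCII 74) are unusual
        # for Illumina data with an offset of +33.
        return 64
    return None


def quality_lines(path, max_reads=10000):
    """Yield the quality line (every 4th line) of the first max_reads records."""
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index >= 4 * max_reads:
                break
            if index % 4 == 3:
                yield line
```

For example, `guess_phred_offset(quality_lines("sample.fastq"))` (with a hypothetical file name) returning 33 suggests fastqsanger, and 64 suggests fastqillumina. A result of None means the sampled reads do not settle the question.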

Do not upload all your data to the server straight after registration, as you may use up your entire disk quota. Start with one or two samples, select a small fraction of your reads / data and develop your analytical workflow. Then create a Galaxy workflow and run it on the full-size datasets. You’ll save a lot of time with this approach.
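One simple way to obtain a small test dataset is to copy the first few thousand reads of a FASTQ file before uploading. The sketch below assumes an uncompressed FASTQ file with exactly four lines per record; the function name and file names are illustrative. Note that the first reads in a file are not necessarily representative of the whole run, so treat the subset as a means of testing the workflow rather than of judging the data.

```python
from itertools import islice


def take_first_reads(src_path, dst_path, n_reads):
    """Copy the first n_reads FASTQ records (4 lines each) to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(islice(src, 4 * n_reads))
```

For example, `take_first_reads("sample.fastq", "sample_small.fastq", 50000)` writes a file containing the first 50,000 reads. For paired-end data, apply the same call to both files so the read pairs stay in step.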

Save your results to external storage. Galaxy-qld is not a data storage server. It does not have an external backup at the moment.

Upload big datasets with ftp. Not familiar with ftp? Read the instructions on the GVL_FAQ page.
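Any ftp client will do. For scripted uploads, the following is a minimal sketch using Python's standard ftplib module. The function name is illustrative, and the host, user name and password are placeholders you must supply yourself; the actual connection details are in the GVL_FAQ instructions, not here. Whether the server expects plain FTP or FTP over TLS is not stated on this page; `ftplib.FTP_TLS` is the standard-library alternative if TLS is required.

```python
import ftplib
import os


def upload_via_ftp(host, user, password, local_path):
    """Upload one file in binary mode; host and credentials are placeholders."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        with open(local_path, "rb") as handle:
            # Store the file on the server under its local base name.
            ftp.storbinary("STOR " + os.path.basename(local_path), handle)
```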

Galaxy-qld restricts the number of concurrent jobs per user:

  • Registered users can run up to 12 concurrent jobs. An upload counts as a job.
    Non-registered users can run only one job at a time.
  • Limits for specific tools:
    group1: Trimmomatic – 8 concurrent jobs
    group2: old Bowtie, Cuffmerge, Cufflinks, local Blast search – 6 concurrent jobs
    group3: BWA, BWA_MEM, bowtie2, tophat2, Cuffdiff, RSEM calculate expression, gatk2_unified_genotyper, gatk2_haplotype_caller, macs2.1 – 6 concurrent jobs
    group4: RNA_STAR – 2 concurrent jobs
    group5: Trinity, Trinity read normalisation, VelvetOptimiser, SPAdes – 2 concurrent jobs

Explanation of the tool-specific limits: a user can run only the specified number of jobs for the tools in each group. For example, with three BWA and three tophat2 jobs running, group3 is at its limit of six, so any other submitted job from this group, such as bowtie2, is queued until an active job completes. Jobs from different groups can run in parallel, e.g., a user can have three BWA, three tophat2 and six Trimmomatic jobs. In the latter example the total number of active jobs is 12, which equals the overall limit on concurrent jobs, so all other submitted jobs are queued until active jobs complete.
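The interaction between the overall limit and the per-group limits listed above can be illustrated with a toy model. This is only a sketch of the rules as stated on this page, not Galaxy's actual scheduler: it processes jobs strictly in submission order, maps only a few of the listed tools, and uses "Filter" as a hypothetical example of a tool with no group limit.

```python
TOTAL_LIMIT = 12  # concurrent jobs per registered user

GROUP_LIMITS = {"group1": 8, "group2": 6, "group3": 6, "group4": 2, "group5": 2}

# Only a few tools from the list are mapped here, to keep the sketch short.
TOOL_GROUP = {
    "Trimmomatic": "group1",
    "BWA": "group3",
    "bowtie2": "group3",
    "tophat2": "group3",
    "RNA_STAR": "group4",
}


def schedule(submitted):
    """Split submitted tool names into (running, queued), in submission order."""
    running, queued, per_group = [], [], {}
    for tool in submitted:
        group = TOOL_GROUP.get(tool)  # None for tools without a specific limit
        group_full = group is not None and per_group.get(group, 0) >= GROUP_LIMITS[group]
        if len(running) >= TOTAL_LIMIT or group_full:
            queued.append(tool)
        else:
            running.append(tool)
            if group is not None:
                per_group[group] = per_group.get(group, 0) + 1
    return running, queued


# "Filter" is a hypothetical tool name standing in for any ungrouped job.
jobs = ["BWA"] * 3 + ["tophat2"] * 3 + ["bowtie2"] + ["Trimmomatic"] * 6 + ["Filter"]
running, queued = schedule(jobs)
print(len(running), queued)  # 12 ['bowtie2', 'Filter']
```

Here bowtie2 is queued because group3 already has six active jobs, and the final job is queued because the user has reached the overall limit of 12.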

Read the error report for failed jobs. In many cases the job failure is caused by user error. Note that the output datasets of some failed jobs can be labelled green but contain no data. In this situation the error report can be reached from the info icon > stderr.

If you see jobs queueing for some time, do not submit new jobs. Email the Galaxy-qld admins if the problem persists for over two hours for simple jobs such as filtering.

NCBI Blast+ search is set up to handle small sets of query sequences, up to the low thousands. Do not use it with big datasets.

Use VelvetOptimiser with the default k-mer range, 55 to 69. Do not use small k-mers, as the job can run out of memory.
