Users of Galaxy-qld have kept the server very busy over the last couple of weeks, with many queued jobs, especially during working hours, when user activity is high. The graph below shows user activity on Galaxy-qld on May 25.
Times are in GMT/UTC; add 10 hours for Australian Eastern Standard Time (AEST). User activity rises in the morning, around 10 AM AEST, and declines after midnight. During the daytime, users submit about one new job every minute.
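As a reading aid for the graph's time axis, the conversion can be sketched in a few lines of Python. The function names are made up for this illustration; the only fact used is the fixed 10-hour offset stated above.

```python
# Fixed offset taken from the text: AEST = UTC + 10 hours.
AEST_OFFSET_HOURS = 10


def utc_hour_to_aest(utc_hour):
    """Convert an hour of day on the graph's UTC axis to AEST."""
    return (utc_hour + AEST_OFFSET_HOURS) % 24


def aest_hour_to_utc(aest_hour):
    """Convert an AEST hour of day to the hour shown on the UTC axis."""
    return (aest_hour - AEST_OFFSET_HOURS) % 24


# Where the landmarks mentioned in the text fall on the UTC axis.
print(aest_hour_to_utc(10))  # morning rise, 10 AM AEST -> 0
print(aest_hour_to_utc(0))   # midnight AEST -> 14
```

So the morning rise appears near 00:00 on the graph, and the decline after midnight AEST appears after 14:00.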
The majority of submitted jobs complete in less than two hours, as shown in the next graph. However, a small number of jobs run for a long time, sometimes for several days.
Cufflinks and Cuffdiff jobs often run for a very long time. Galaxy-qld offers StringTie, a faster alternative to Cufflinks. Tophat jobs can also run for a long time, especially with big datasets. Galaxy-qld provides HiSAT2, a much faster aligner, together with precomputed indices for several genomes. Assembly jobs usually require significant time. SPAdes and VelvetOptimiser sometimes get stuck and have to be terminated manually. Aligner problems are very often caused by bad data, such as a different number of reads in paired datasets, a different order of reads in paired data, or an excess of low-quality bases. We recommend Trimmomatic for read trimming, as it preserves proper read pairing in the output files.
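The two pairing problems mentioned above (different read counts and different read order) can be checked on your own computer before uploading data. The following is a minimal sketch, not a Galaxy-qld tool: the function names are hypothetical, and it assumes plain four-line FASTQ records (optionally gzip-compressed) whose read names match once a trailing `/1` or `/2` is removed.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read name from each FASTQ record (assumes 4 lines per record)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Ignore blank lines, e.g. a trailing empty line at the end of the file.
        lines = (line for line in handle if line.strip())
        for index, line in enumerate(lines):
            if index % 4 == 0:  # header line of a record, e.g. "@name/1"
                fields = line.strip()[1:].split()
                name = fields[0] if fields else ""
                if name.endswith(("/1", "/2")):
                    name = name[:-2]  # drop the mate suffix before comparing
                yield name


def check_pairing(forward_path, reverse_path):
    """Return (ok, message) describing whether two FASTQ files are properly paired."""
    pairs = 0
    names = zip_longest(read_names(forward_path), read_names(reverse_path))
    for forward_name, reverse_name in names:
        if forward_name is None or reverse_name is None:
            return False, f"read counts differ: one file ends after {pairs} reads"
        if forward_name != reverse_name:
            return False, (
                f"read names differ at pair {pairs + 1}: "
                f"{forward_name} vs {reverse_name}"
            )
        pairs += 1
    return True, f"{pairs} read pairs, names in the same order"
```

A call such as `check_pairing("sample_R1.fastq.gz", "sample_R2.fastq.gz")` (file names are placeholders) returns `False` with a short explanation as soon as the first mismatch is found. It does not assess base quality.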
To ease the constraints, we increased the number of worker nodes on Galaxy-qld and changed the CPU allocation for some jobs, as well as the number of concurrent jobs per user. The user policy will be modified to provide a better experience for all users, but we can do better with cooperation from our users:
– do not run a new, untested analysis on many genome-scale datasets at once. Develop your analysis with a small dataset first, if possible.
– do not delete and resubmit queued jobs. Jobs are usually queued because the server has no spare capacity, or because the user has reached the limit on the number of concurrent jobs. Submitted jobs will be completed eventually.
– use faster tools, such as HiSAT2 and StringTie.
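One simple way to get a small dataset for developing an analysis is to keep only the first reads of each FASTQ file. The sketch below is an illustration under stated assumptions, not a Galaxy-qld tool: the function names are hypothetical, and it assumes uncompressed four-line FASTQ records. Taking the same number of leading reads from both files of a pair keeps the read pairing intact, provided the originals were properly paired.

```python
from itertools import islice


def head_fastq(src_path, dst_path, n_reads):
    """Copy the first n_reads records (4 lines each) and return how many were written."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        lines = list(islice(src, 4 * n_reads))
        dst.writelines(lines)
    return len(lines) // 4


def head_fastq_pair(forward_src, reverse_src, forward_dst, reverse_dst, n_reads):
    """Take the same number of leading reads from both files of a paired dataset."""
    forward_count = head_fastq(forward_src, forward_dst, n_reads)
    reverse_count = head_fastq(reverse_src, reverse_dst, n_reads)
    if forward_count != reverse_count:
        raise ValueError(
            f"paired files yielded different read counts: "
            f"{forward_count} vs {reverse_count}"
        )
    return forward_count
```

The result is not a random sample of the data, but it is usually enough to confirm that a workflow runs from start to finish before it is submitted on the full datasets.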