Ray Assembly
Ray is parallel software that computes de novo genome assemblies from next-generation sequencing data. Ray is written in C++ and can run in parallel on numerous interconnected computers using the Message Passing Interface (MPI) standard. See the Ray home page for more information.
Ray can be run with the following command, using a k-mer length of 31. Because our compute instances have multiple cores, we pass `-n 48` to `mpiexec` so that Ray runs as 48 parallel MPI processes; the matching `-pe multislot 48` option requests 48 slots for the job from the cluster's queueing system:
cd ~/workdir/assembly/
qsub -cwd -pe multislot 48 -N ray -l mtc=1 -b y \
/usr/lib64/openmpi/bin/mpiexec -n 48 /vol/cmg/bin/Ray -k 31 -p read1.fq read2.fq -o ray_31
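If you want to compare several k-mer lengths, the same submission can be wrapped in a small shell function. This is a minimal sketch: the function name submit_ray and the k values in the usage example are hypothetical, while the qsub, mpiexec and Ray options are copied unchanged from the command above. Setting QSUB=echo prints the command instead of submitting it. The largest k-mer length Ray accepts depends on how it was compiled, so check your installation before going above 31.

```shell
#!/usr/bin/env bash
# submit_ray K  (hypothetical helper, not part of Ray)
# Submits one Ray job with k-mer length K; job name and output directory are ray_K.
# All qsub/mpiexec/Ray options are copied from the command above.
# Set QSUB=echo to print the command instead of submitting it (dry run).
submit_ray() {
    local k="$1"
    "${QSUB:-qsub}" -cwd -pe multislot 48 -N "ray_${k}" -l mtc=1 -b y \
        /usr/lib64/openmpi/bin/mpiexec -n 48 /vol/cmg/bin/Ray \
        -k "$k" -p read1.fq read2.fq -o "ray_${k}"
}
```

For example, from ~/workdir/assembly/ run for k in 21 25 31; do submit_ray "$k"; done and each job writes its contigs to ray_<k>/Contigs.fasta.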
This creates the output directory ray_31; the final contigs are written to ray_31/Contigs.fasta. Again, let’s get some basic statistics on the contigs:
getN50.pl -s 500 -f ray_31/Contigs.fasta
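getN50.pl is a course-provided script, so its exact behaviour is not documented here. The sketch below only illustrates what an N50 value means: sort the contigs by length, longest first, and report the length at which the running sum first reaches half of the total assembled bases. It assumes, as seems likely but is not confirmed, that -s 500 sets a minimum contig length of 500 bp; the function name n50_stats is hypothetical.

```shell
#!/usr/bin/env bash
# n50_stats FASTA [MIN_LEN]  (hypothetical helper, not getN50.pl itself)
# Prints: contigs=<count> total=<bp> N50=<bp> for sequences of at least MIN_LEN bp.
n50_stats() {
    local fasta="$1" min_len="${2:-500}"
    # Step 1: print the length of every sequence (multi-line FASTA is handled).
    # Step 2: keep sequences of at least min_len bp, sorted longest first.
    # Step 3: walk down the sorted lengths; N50 is the length at which the
    #         running sum first reaches half of the total length.
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$fasta" |
        awk -v min="$min_len" '$1 >= min' |
        sort -nr |
        awk '{ l[NR] = $1; total += $1 }
             END {
                 if (NR == 0) { print "contigs=0 total=0 N50=0"; exit }
                 for (i = 1; i <= NR; i++) {
                     sum += l[i]
                     if (2 * sum >= total) { n50 = l[i]; break }
                 }
                 printf "contigs=%d total=%d N50=%d\n", NR, total, n50
             }'
}
```

For example, n50_stats ray_31/Contigs.fasta 500 prints a single line of the form contigs=COUNT total=BP N50=BP.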
Now that you have run assemblies using Velvet, MEGAHIT, IDBA-UD and Ray, let’s have a quick look at the assembly statistics of all of them:
cd ~/workdir/assembly/
/vol/metagencourse/bin/get_assembly_stats.sh
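get_assembly_stats.sh is also course-specific. If you want a quick side-by-side view of arbitrary contig files, a small awk-based sketch like the following prints one tab-separated row per FASTA file with the number of sequences, the total length and the longest sequence. The function name assembly_table is hypothetical, and you pass the contig files yourself, since their locations depend on how each assembler was run.

```shell
#!/usr/bin/env bash
# assembly_table FASTA...  (hypothetical helper)
# Columns: file, number of sequences, total length (bp), longest sequence (bp).
assembly_table() {
    printf 'file\tcontigs\ttotal_bp\tmax_bp\n'
    local f
    for f in "$@"; do
        awk -v name="$f" '
            # On each header, close the previous sequence and count a new one.
            /^>/ { n++; if (len > max) max = len; total += len; len = 0; next }
            # Sequence lines (possibly several per record) add to the length.
            { len += length($0) }
            # Close the last sequence, then print the row for this file.
            END {
                if (len > max) max = len
                total += len
                printf "%s\t%d\t%d\t%d\n", name, n, total, max
            }' "$f"
    done
}
```

For example, assembly_table ray_31/Contigs.fasta followed by the contig files of your other assemblies prints one row per assembly.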
Note
Most of the jobs above are started on the compute cluster using the qsub command. To manage them:
qstat: check the status and JOBNUMBER of your jobs
qdel JOBNUMBER: delete the job with job number JOBNUMBER
We usually submit the jobs to the cluster giving them a job name by using -N JOBNAME. This creates log files named:
JOBNAME.oJOBNUMBER: standard output messages of the tool
JOBNAME.eJOBNUMBER: standard error messages of the tool
You can look into these files by typing e.g. less JOBNAME.oJOBNUMBER (hit q to quit) or tail -f JOBNAME.oJOBNUMBER (hit ^C to quit).
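Because every submission creates a new pair of log files, it can be handy to open the newest one without typing the job number. A minimal sketch, assuming job numbers increase with each submission so that the highest number belongs to the most recent job; the function name latest_log is hypothetical:

```shell
#!/usr/bin/env bash
# latest_log JOBNAME [o|e]  (hypothetical helper)
# Prints the log file of the most recent job named JOBNAME in the current directory:
# o = standard output log (default), e = standard error log.
# Assumes higher job numbers mean newer jobs.
latest_log() {
    local name="$1" stream="${2:-o}" f num best="" best_num=-1
    for f in "$name.$stream"[0-9]*; do
        [ -e "$f" ] || continue                     # no match: glob stays literal
        num="${f#"$name.$stream"}"                  # strip the "JOBNAME.o" prefix
        case "$num" in *[!0-9]*) continue ;; esac   # skip names like JOBNAME.o123.bak
        if [ "$num" -gt "$best_num" ]; then
            best_num="$num"
            best="$f"
        fi
    done
    [ -n "$best" ] && printf '%s\n' "$best"
}
```

For example, tail -f "$(latest_log ray)" follows the standard output of the most recent job named ray, and less "$(latest_log ray e)" opens its error log.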