Paired end alignment using Bpipe

This is a continuation of my post on getting started with Bpipe. In this post I followed the paired end alignment example from the documentation, with a few adjustments. This pipeline generates a random reference sequence, generates paired end reads from the reference, and aligns these reads back to the reference using BWA.

Here's the pipeline file (pipeline.groovy):

SEED=31
REF_SIZE=1000000
REF="test.fa"
READ_NO=10000
READ_LEN=100
INNER_DIST=400

random_ref = {
   // Usage: generate_random_seq.pl <bp> <seed>
   exec "generate_random_seq.pl $REF_SIZE $SEED > $REF"
}

index_ref = {
   exec "bwa index $REF"
}

random_read = {
   // Usage: random_paired_end.pl <infile.fa> <read length> <number of pairs> <inner mate distance> <seed>
   exec "random_paired_end.pl $REF $READ_LEN $READ_NO $INNER_DIST $SEED"
}

bwa_align = {
   from(glob("*.fq")) {
      transform("sai","sai","bam"){
         multi "bwa aln $REF $input1.fq > $output1",
               "bwa aln $REF $input2.fq > $output2"

         exec """
            bwa sampe $REF $output1 $output2 $input1.fq $input2.fq |
            samtools view -bS - |
            samtools sort - $output.bam.prefix
         """
      }
   }
}

Bpipe.run { random_ref + index_ref + random_read + bwa_align }

Here's the generate_random_seq.pl script:

https://gist.github.com/davetang/3e9c8ae57eded71c84de

Here's the random_paired_end.pl script:

https://gist.github.com/davetang/fce12a99e514938b5725

Finally here's the Makefile:

all:
        bpipe run pipeline.groovy

clean:
        rm -rf commandlog.txt .bpipe *.fq *.fa* *.sai *.bam

To run this pipeline, download the two Perl scripts, create the Makefile, and the pipeline.groovy as above, and put them all into the same directory and type:

make
#type make clean to delete all the generated files

[snipped]
======================================== Pipeline Succeeded ========================================
11:02:01 MSG:  Finished at Thu Jun 04 11:02:01 WST 2015
11:02:01 MSG:  Outputs are: 
                l100_n10000_d400_31_1.sai
                l100_n10000_d400_31_2.sai
                l100_n10000_d400_31_1.bam

#the final alignment file is the BAM file
#and should have the same md5 output
#md5sum on Linux
md5 l100_n10000_d400_31_1.bam
MD5 (l100_n10000_d400_31_1.bam) = a77b2b726e64210a6b10baa9fe377220

Summary

The pipeline is fairly straight-forward; the purpose of this post was to create a working example of a Bpipe pipeline for paired end alignment as I could not follow the paired end example on the Bpipe documentation. There will be more posts in the near future on using Bpipe to build different types of bioinformatics pipelines.

Print Friendly, PDF & Email



Creative Commons License
This work is licensed under a Creative Commons
Attribution 4.0 International License
.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.