Getting started with Bpipe

Until now I have built my bioinformatics pipelines as simple shell scripts. I define variables that serve as parameter settings throughout the script, use basic Unix tools to construct output file names, and check whether a file exists to decide whether a step has already been run. To make that check reliable, each step writes its results to a temporary file, which is moved to its final name only once the step has completed; the existence of the final file therefore indicates that the step finished. This works, but it is very basic.
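Here is a minimal sketch of that shell-script approach. The run_step helper and the file names are made up for illustration; they are not taken from my actual pipelines.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a command only if its output file is missing.
# Usage: run_step OUTPUT_FILE COMMAND [ARGS...]
run_step() {
  local out=$1
  shift
  if [[ -e $out ]]; then
    echo "skipping: $out already exists" >&2
    return 0
  fi
  local tmp="$out.tmp.$$"
  # Write to a temporary name first, so a failed or interrupted step
  # never leaves behind a file that looks complete.
  if "$@" > "$tmp"; then
    mv "$tmp" "$out"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Demo in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir"
echo Bpipe > test.txt

run_step step1.txt sort test.txt   # runs the step
run_step step1.txt sort test.txt   # skipped, step1.txt already exists
```

The second call does nothing because step1.txt is already in place, which is the same "has this step run?" check described above.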

I recently learned about Bpipe, which is a tool for running and managing bioinformatics pipelines. The documentation shows how a pipeline implemented as a shell script can be transformed into Bpipe stages. The first thing you may notice is that the steps of the pipeline are now separated into different modules. The rest of the tutorial covers the other features, such as defining variables, having Bpipe handle the input and output names, and having each step logged.

Getting started

To get started, download the tarball and extract it; the bpipe executable is in the bin directory:

wget http://download.bpipe.org/versions/bpipe-0.9.8.7.tar.gz
tar -xzf bpipe-0.9.8.7.tar.gz
ls bpipe-0.9.8.7/bin/bpipe
bpipe-0.9.8.7/bin/bpipe
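
To call bpipe without typing the full path, the bin directory can be added to PATH. This is a minimal sketch that assumes the tarball was extracted into the current working directory; adjust the path if you unpacked it elsewhere.

```shell
# Assumes bpipe-0.9.8.7 was extracted into the current working directory.
export PATH="$PWD/bpipe-0.9.8.7/bin:$PATH"

# Show where the shell now finds bpipe (prints nothing if it is not found).
command -v bpipe || true
```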

Following the Hello World example, we create a file named hello.pipe:

cat hello.pipe
hello = {
  exec "echo Hello"
}
world = {
  exec "echo World"
}

Bpipe.run { hello + world }

To run the pipeline, which is simply two echo steps, we type:

bpipe hello.pipe
====================================================================================================
|                              Starting Pipeline at 2015-05-28 09:31                               |
====================================================================================================

=========================================== Stage hello ============================================
Hello

=========================================== Stage world ============================================
World

======================================== Pipeline Succeeded ========================================
09:31:25 MSG:  Finished at Thu May 28 09:31:25 WST 2015
/Users/dtang/bin/bpipe: line 724:  6809 Terminated: 15          ( tail -f $LOGFILE | sed -l "$TAIL_PATTERN" )

Bpipe also writes a commandlog.txt file, which records the commands run in each stage:

cat commandlog.txt 

####################################################################################################
# Starting pipeline at Thu May 28 09:31:25 WST 2015
# Input files:  null
# Output Log:  .bpipe/logs/6795.log
# Stage hello
echo Hello
# Stage world
echo World
# ################ Finished at Thu May 28 09:31:25 WST 2015 Duration = 0.681 seconds #################
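
Since the header lines in the log all start with #, the commands themselves are easy to pull out with standard tools. The show_commands helper below is a hypothetical name of my own, and the demo runs on an abridged copy of the log above written to a scratch directory.

```shell
# Hypothetical helper: print only the commands from a Bpipe command log,
# dropping the "#" header lines and blank lines.
show_commands() {
  grep -v -e '^#' -e '^[[:space:]]*$' "${1:-commandlog.txt}"
}

# Demo on an abridged copy of the log shown above.
cd "$(mktemp -d)"
cat > commandlog.txt <<'EOF'

# Starting pipeline at Thu May 28 09:31:25 WST 2015
# Input files:  null
# Stage hello
echo Hello
# Stage world
echo World
EOF

show_commands
```

Note that this simple filter would also drop any command line that itself begins with #.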

The next example uses the Bpipe $input and $output variables. First, we need to create a test file:

#create test file
echo Bpipe > test.txt

#what's inside the second pipeline file
cat hello2.pipe 
hello = {
  exec "echo Hello | cat - $input > $output"
}
world = {
  exec "echo World | cat $input - > $output"
}
run { hello + world }

#running the pipeline
bpipe run hello2.pipe test.txt
====================================================================================================
|                              Starting Pipeline at 2015-05-28 09:38                               |
====================================================================================================

=========================================== Stage hello ============================================

=========================================== Stage world ============================================

======================================== Pipeline Succeeded ========================================
09:38:10 MSG:  Finished at Thu May 28 09:38:10 WST 2015
09:38:10 MSG:  Output is test.txt.hello.world
/Users/dtang/bin/bpipe: line 724:  7187 Terminated: 15          ( tail -f $LOGFILE | sed -l "$TAIL_PATTERN" )

#two output files are created
cat test.txt.hello
Hello
Bpipe

cat test.txt.hello.world
Hello
Bpipe
World

The output files have the stage names appended to the input file name. We can add a file type to the $input and $output variables (here $input.txt and $output.txt) so that the output files are created with the correct extension:

cat hello3.pipe 
hello = {
   exec "echo Hello | cat - $input.txt > $output.txt"
}

world = {
   exec "echo World | cat $input.txt - > $output.txt"
}

run { hello + world }

bpipe run hello3.pipe test.txt
====================================================================================================
|                              Starting Pipeline at 2015-05-28 09:47                               |
====================================================================================================

=========================================== Stage hello ============================================

=========================================== Stage world ============================================

======================================== Pipeline Succeeded ========================================
09:47:17 MSG:  Finished at Thu May 28 09:47:17 WST 2015
09:47:17 MSG:  Output is test.hello.world.txt
/Users/dtang/bin/bpipe: line 724:  7736 Terminated: 15          ( tail -f $LOGFILE | sed -l "$TAIL_PATTERN" )

cat test.hello.txt 
Hello
Bpipe

cat test.hello.world.txt 
Hello
Bpipe
World

Bpipe also supports annotations, which can be attached to each stage:

cat hello4.pipe 
@Filter("hello")
hello = {
   exec "echo Hello | cat - $input > $output"
}

@Filter("world")
world = {
   exec "echo World | cat $input - > $output"
}

run { hello + world }

bpipe run hello4.pipe test.txt
====================================================================================================
|                              Starting Pipeline at 2015-05-28 09:53                               |
====================================================================================================

=========================================== Stage hello ============================================

=========================================== Stage world ============================================

======================================== Pipeline Succeeded ========================================
09:53:57 MSG:  Finished at Thu May 28 09:53:57 WST 2015
09:53:57 MSG:  Output is test.hello.world.txt
/Users/dtang/bin/bpipe: line 720:  8121 Terminated: 15          ( tail -f $LOGFILE | sed -l "$TAIL_PATTERN" )

cat test.hello.txt 
Hello
Bpipe

cat test.hello.world.txt
Hello
Bpipe
World

Notice that no extensions were given to the $input and $output variables, yet the output file names have the correct file extension. The @Filter annotation declares that a stage filters its input: the file is modified but its type remains the same. Since we started with a txt file, we end up with a txt file.
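
The naming pattern seen above can be mimicked in plain shell with parameter expansion. This is only an illustration of the resulting file names; filter_name is a hypothetical helper, not how Bpipe implements @Filter, and it assumes the input name contains an extension.

```shell
# Hypothetical helper mimicking the naming seen above: insert the stage name
# before the file extension so the file type is preserved.
filter_name() {
  local input=$1 stage=$2
  local base=${input%.*}   # everything before the last dot
  local ext=${input##*.}   # everything after the last dot
  echo "$base.$stage.$ext"
}

first=$(filter_name test.txt hello)
second=$(filter_name "$first" world)
echo "$first"    # test.hello.txt
echo "$second"   # test.hello.world.txt
```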

Summary

Bpipe has many other features, which are listed on the main page of the documentation. I'm not very familiar with pipeline tools (since I've just been using shell scripts) but Bpipe seems like a good starting point. I tweeted about Bpipe a few days ago and Harold mentioned Snakemake to me; I'll definitely check that out too!

This work is licensed under a Creative Commons Attribution 4.0 International License.

Comments

  1. I am not sure I understand the utility of this program. If you want command logging in your bash script, just include `set -x` at the start and pipe stderr to a log file. If you want a log of the script itself, just include `cat $0 > scriptlog.txt`. If you want to check whether a step was already executed, then use `if [ ! -f output.file ]; then ; fi`. If you want to log program versions, then run ` --version`. There are native bash solutions to most of the problems that bpipe seems to try to fix.
