Learning about Snakemake

As promised two years ago, here's a short blog post on Snakemake. I have been using Bpipe to manage my workflows/pipelines, but Snakemake has been mentioned to me on more than one occasion.

The main motivation for this post, though, was that a colleague shared his work with me, and he had used Snakemake to manage his analyses. To get started, here's the description of Snakemake from the documentation page:

Snakemake is a workflow management system that aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment, together with a clean and modern specification language in Python style. Snakemake workflows are essentially Python scripts extended by declarative code to define rules. Rules describe how to create output files from input files.

Snakemake rules follow the same idea as the rules of GNU Make, which I have written a blog post on. If you don't want to click through to another post, here's the Makefile I created for it:

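# targets that are not files
.PHONY: all data clean

all: data norm.png
data: raw.tsv norm.tsv

raw.tsv: get_data.pl
	get_data.pl > raw.tsv

norm.tsv: norm.R raw.tsv
	R CMD BATCH norm.R

# the plot depends on the script that draws it (plot.R), not on norm.R
norm.png: plot.R norm.tsv
	R CMD BATCH plot.R

clean:
	rm -rf *.tsv *.Rout *.png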
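The core rule that Make and Snakemake share is how they decide whether a target needs to be (re)built: it is rebuilt when it is missing or older than any of its inputs. Below is a minimal Python sketch of that check, assuming file modification times are the only criterion (the real tools take more into account); `needs_update` is a hypothetical helper, not part of either tool.

```python
import os
import tempfile
import time
from typing import Iterable


def needs_update(target: str, prerequisites: Iterable[str]) -> bool:
    """Return True if `target` should be (re)built, Make-style (hypothetical helper).

    The target is out of date when it does not exist yet, or when any
    prerequisite has a newer modification time than the target. If the
    target exists but a prerequisite is missing, os.path.getmtime raises
    FileNotFoundError.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        raw = os.path.join(d, "raw.tsv")
        norm = os.path.join(d, "norm.tsv")
        open(raw, "w").close()
        print(needs_update(norm, [raw]))  # True: norm.tsv does not exist yet

        open(norm, "w").close()
        now = time.time()
        os.utime(raw, (now - 60, now - 60))  # raw.tsv is older...
        os.utime(norm, (now, now))           # ...than norm.tsv
        print(needs_update(norm, [raw]))  # False: norm.tsv is up to date

        os.utime(raw, (now + 60, now + 60))  # "edit" raw.tsv afterwards
        print(needs_update(norm, [raw]))  # True: raw.tsv is now newer
```

This is why editing raw.tsv and running make again regenerates norm.tsv (and everything downstream of it) but leaves untouched targets alone.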

Installing Snakemake

The installation page states:

Snakemake is available on PyPi as well as through Bioconda and also from source code. You can use one of the following ways for installing Snakemake.

PyPI is the Python Package Index; Conda is a cross-platform package and environment manager that installs, runs, and updates packages and their dependencies; and Bioconda is a channel for the conda package manager specializing in bioinformatics software. I opted for the pip3 option on my MacBook Air (I used Homebrew to install Python 3):

# install Homebrew and Python3
# /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# brew update
# brew upgrade
# brew install python3
# pip3 install --upgrade pip setuptools wheel

pip3 install snakemake
...
Successfully built snakemake wrapt
Installing collected packages: wrapt, idna, certifi, urllib3, chardet, requests, snakemake
Successfully installed certifi-2017.4.17 chardet-3.0.4 idna-2.5 requests-2.18.1 snakemake-3.13.3 urllib3-1.21.1 wrapt-1.10.10

which snakemake
/usr/local/bin/snakemake
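If you would rather confirm the install from Python than from the shell, the standard library's `importlib.metadata` (Python 3.8+) can report the installed version of a package. This is a minimal sketch; `installed_version` is a hypothetical helper name, and the version reported depends on what pip installed on your machine.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of `dist_name`, or None if it isn't installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# prints the Snakemake version string, or None if Snakemake is not installed
print(installed_version("snakemake"))
```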

Getting started

Here's a simple example to get started:

# files in the root directory of this example
ls -1
Snakefile
test.txt

# contents of test.txt
cat test.txt 
5
3
7
2

# Snakemake file
cat Snakefile 
rule sort:
    input:
        "test.txt"
    output:
        "test.sorted.txt"
    shell:
        "sort -n {input} > {output}"

# run Snakemake
snakemake
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	sort
	1

rule sort:
    input: test.txt
    output: test.sorted.txt
    jobid: 0

Finished job 0.
1 of 1 steps (100%) done

# files in directory after running Snakemake
ls -1
Snakefile
test.sorted.txt
test.txt

# contents of test.sorted.txt
cat test.sorted.txt 
2
3
5
7
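When the rule runs, Snakemake fills the `{input}` and `{output}` placeholders in the `shell` string with the rule's files. The Python sketch below imitates that substitution with plain `str.format` (an approximation: Snakemake uses its own formatter, which also handles lists of files and wildcards) and then does in Python what `sort -n` does to test.txt. It also shows why the `-n` flag matters: a plain text sort would put `10` before `2`. `render_shell` and `numeric_sort_lines` are hypothetical helpers.

```python
def render_shell(template: str, **files: str) -> str:
    """Approximate how a rule's shell string is filled in (hypothetical helper)."""
    return template.format(**files)


def numeric_sort_lines(text: str) -> str:
    """Sort lines numerically, like `sort -n` for plain integers.

    Assumes every non-blank line is an integer; blank lines are dropped.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    return "".join(line + "\n" for line in sorted(lines, key=int))


cmd = render_shell("sort -n {input} > {output}",
                   input="test.txt", output="test.sorted.txt")
print(cmd)  # sort -n test.txt > test.sorted.txt

print(numeric_sort_lines("5\n3\n7\n2\n"), end="")  # 2, 3, 5 and 7 on separate lines
print(sorted(["10", "2"]))           # ['10', '2']  (text order)
print(sorted(["10", "2"], key=int))  # ['2', '10']  (numeric order)
```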

What if I wanted to sort multiple text files?

ls -1
Snakefile
test1.txt
test2.txt
test3.txt

# following the example in the Snakemake presentation
cat Snakefile 
DATASETS = ["test1", "test2", "test3"]

rule all:
    input:
        expand("{dataset}.sorted.txt", dataset=DATASETS)

rule sort:
    input:
        "{dataset}.txt"
    output:
        "{dataset}.sorted.txt"
    shell:
        "sort -n {input} > {output}"

snakemake
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	all
	3	sort
	4

rule sort:
    input: test1.txt
    output: test1.sorted.txt
    jobid: 1
    wildcards: dataset=test1

Finished job 1.
1 of 4 steps (25%) done

rule sort:
    input: test3.txt
    output: test3.sorted.txt
    jobid: 2
    wildcards: dataset=test3

Finished job 2.
2 of 4 steps (50%) done

rule sort:
    input: test2.txt
    output: test2.sorted.txt
    jobid: 3
    wildcards: dataset=test2

Finished job 3.
3 of 4 steps (75%) done

localrule all:
    input: test1.sorted.txt, test2.sorted.txt, test3.sorted.txt
    jobid: 0

Finished job 0.
4 of 4 steps (100%) done

ls -1
Snakefile
test1.sorted.txt
test1.txt
test2.sorted.txt
test2.txt
test3.sorted.txt
test3.txt
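Two things happen in this Snakefile. `expand()` turns the pattern in `rule all` into a concrete list of target files, and Snakemake then matches each target against the `output` pattern of `rule sort` to work out the wildcard value (the `wildcards: dataset=test1` lines in the log). The Python sketch below imitates both steps under simplifying assumptions: `expand_sketch` takes the Cartesian product of the wildcard values (what `expand()` does by default), and `match_wildcards` turns each `{name}` into a greedy `.+` regex group, roughly Snakemake's default wildcard constraint. Both helpers are hypothetical and are not Snakemake's implementation.

```python
import itertools
import re
from typing import Dict, List, Optional


def expand_sketch(pattern: str, **wildcards: List[str]) -> List[str]:
    """Fill `pattern` with every combination of wildcard values (hypothetical helper)."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in itertools.product(*(wildcards[n] for n in names))
    ]


def match_wildcards(pattern: str, filename: str) -> Optional[Dict[str, str]]:
    """Infer wildcard values by matching `filename` against an output pattern.

    Assumes each wildcard name appears only once in the pattern.
    """
    # re.split with a capturing group alternates literal text and wildcard names
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
        for i, part in enumerate(parts)
    )
    m = re.fullmatch(regex, filename)
    return m.groupdict() if m else None


targets = expand_sketch("{dataset}.sorted.txt", dataset=["test1", "test2", "test3"])
print(targets)  # ['test1.sorted.txt', 'test2.sorted.txt', 'test3.sorted.txt']

for target in targets:
    wc = match_wildcards("{dataset}.sorted.txt", target)
    # the inferred value is then used to fill in the rule's input pattern
    print(target, wc, "{dataset}.txt".format(**wc))
```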

Some useful commands

Taken from the Snakemake tutorial presentation (see Links).

# execute the workflow with target D1.sorted.txt
snakemake D1.sorted.txt

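# execute the workflow without target: first rule defines target
snakemake

# newer Snakemake releases also require the number of cores for a real (non-dry) run
snakemake --cores 1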

# dry-run
snakemake -n

# dry-run, print shell commands
snakemake -n -p

# dry-run, print execution reason for each job
snakemake -n -r
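A dry run works out which jobs are needed and in what order without executing them, and `-p` adds the shell command each job would run. As a rough illustration for the three-file example above, the Python sketch below builds that job graph by hand and orders it with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It is a toy under stated assumptions, not how Snakemake builds its DAG: it skips the file checks entirely and treats every job as needed. The job names such as `sort:test1` are made up for the example.

```python
from graphlib import TopologicalSorter

DATASETS = ["test1", "test2", "test3"]

# Map each job to the jobs it depends on; "all" needs every sort job.
sort_jobs = {f"sort:{d}": set() for d in DATASETS}
graph = {**sort_jobs, "all": set(sort_jobs)}

# Shell command each job would run (the "all" rule has none).
commands = {f"sort:{d}": f"sort -n {d}.txt > {d}.sorted.txt" for d in DATASETS}

# Dependencies come before the jobs that need them, so "all" comes last.
order = list(TopologicalSorter(graph).static_order())
for job in order:
    print(job, "->", commands.get(job, "(no shell command)"))
```

The three sort jobs do not depend on each other, so any order among them is valid (the log above ran test3 before test2); that independence is also what lets them run in parallel when more cores are available.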

Summary

I've only worked through the Snakemake presentation, and the software offers many more features than I've covered here. The most important to me are parallelisation and logging, but I'd expect these to be core features of any workflow management tool. For more information on other tools, check out this Biostars post.

Links




This work is licensed under a Creative Commons Attribution 4.0 International License.
