Creating reproducible documentation

When I was first learning about SAMtools, I kept my notes in a wiki. I would type SAMtools commands in the terminal and copy and paste the output into the wiki. It was tedious, but the notes became a useful resource that I referred back to frequently. The latest version of my SAMtools notes is now hosted on GitHub and can be viewed directly in the repo or via GitHub Pages. This blog post describes the simple workflow I use to create reproducible documentation for SAMtools.

Firstly, I did not want to type SAMtools commands manually and copy the output into my documentation; this is not only tedious but also error-prone. To overcome this, I chose R Markdown, a notebook format that can be rendered into various output formats. Despite its name, R Markdown is not limited to R: it supports other languages, so you can execute Python or another supported language from your notebook. My R Markdown document has code chunks that execute samtools using the bash shell. For example, the chunk below runs `samtools --help`; I included `engine.opts='-l'`, which runs bash as a login shell so that my bash profile files are sourced.

```{bash engine.opts='-l'}
samtools --help
```
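To see what the `-l` in `engine.opts='-l'` changes, here is a small bash sketch (it is not part of my workflow). knitr passes `engine.opts` as extra command-line options to `bash`, so the chunk runs in a login shell, which sources profile files such as `~/.bash_profile`; that is typically where `PATH` is extended so that `samtools` can be found. The sketch uses bash's read-only `login_shell` option to report which kind of shell a command runs in; the function name `shell_kind` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether bash, started with the given options,
# is a login shell. Login shells source ~/.bash_profile (or ~/.bash_login,
# ~/.profile), which is where PATH is often set up for tools like samtools.
shell_kind() {
  # "$@" holds extra bash options, e.g. -l; with no arguments it adds nothing.
  # stderr is discarded and only the last line is kept, so anything a
  # profile file prints does not end up in the result.
  bash "$@" -c 'if shopt -q login_shell; then echo login; else echo non-login; fi' \
    2>/dev/null | tail -n 1
}

echo "bash    -> $(shell_kind)"
echo "bash -l -> $(shell_kind -l)"
```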

The YAML header in my R Markdown file specifies `github_document` as the output format, which produces a GitHub-compatible Markdown file (README.md) that is perfect for sharing documentation on GitHub.

```yaml
---
title: "Learning the BAM format"
output: github_document
---
```
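Putting the two pieces together, a minimal R Markdown file of this kind can be written from the shell as sketched below. The file name `example.Rmd` and the function names are hypothetical (this is not the file in my repo). Rendering is kept in a separate function because `rmarkdown::render()` needs R, the rmarkdown package and pandoc, plus samtools for this particular chunk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal R Markdown file that combines the
# github_document YAML header with a bash chunk run in a login shell.
write_rmd() {
  local out="$1"
  local fence='```'   # chunk delimiter kept in a variable
  printf '%s\n' '---' \
    'title: "Learning the BAM format"' \
    'output: github_document' \
    '---' \
    '' \
    "${fence}{bash engine.opts='-l'}" \
    'samtools --help' \
    "${fence}" > "$out"
}

# Render with rmarkdown::render(); with output: github_document this writes
# a .md file next to the input. Requires R, rmarkdown, pandoc and samtools.
render_rmd() {
  Rscript -e "rmarkdown::render('$1')"
}

write_rmd example.Rmd
```

With the tools installed, `render_rmd example.Rmd` would then produce the Markdown output.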

With my R Markdown document, I no longer have to run SAMtools manually, save its output, and paste it into my documentation!

The next step was to ensure that the same README file could be generated every time. I initially tried Conda, but it was difficult to set up across different operating systems. In the end, I settled on Docker and created an image for generating the README; as long as this image is used, the same README file should be generated. I based my image on the R image provided by the Rocker Project and included various libraries commonly used for compiling tools, since I plan to use this image for other tasks as well. The Dockerfile used to create the image is available in this repo.

To create the same README file as the one in my GitHub repo, run the following commands (assuming you have Docker installed and properly set up).

```bash
# clone repo
git clone https://github.com/davetang/learning_bam_file.git
cd learning_bam_file

# pull Docker image
docker pull davetang/r_build:4.1.0

# run image, mounting the repo at /work inside the container
docker run --rm -it -v "$(pwd)":/work davetang/r_build:4.1.0 /bin/bash

# create the README inside the Docker container
cd /work
make
```
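If you don't need an interactive session, the same steps can be run in one go. The sketch below is my own hypothetical wrapper, not something in the repo: it uses `docker run`'s `-w` option to start in the mounted directory and runs `make` directly. Setting the hypothetical variable `DRY_RUN=1` only prints the command, which is handy for checking it first.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: build the README non-interactively inside the container.
# Assumes Docker is installed and the current directory is the cloned repo.
build_readme_in_docker() {
  local image="davetang/r_build:4.1.0"
  # Mount the repo at /work, start there (-w), run make, remove the container.
  local cmd=(docker run --rm -v "$(pwd)":/work -w /work "$image" make)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s ' "${cmd[@]}"   # show the command instead of running it
    printf '\n'
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 build_readme_in_docker` prints the command, and `build_readme_in_docker` runs it.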

The `make` command runs the Makefile, which prepares the required data and tools and then generates the README via the script `create_readme.sh`. This script runs R to render the R Markdown document into the README Markdown file, uses gh-md-toc to generate a table of contents for the README, and finally adds a date stamp. The result is the final README that you can see in the learning_bam_file repository.
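For anyone curious about what such a script involves, here is a rough sketch of those three steps; it is not the actual `create_readme.sh`. The file names `README.Rmd`/`README.md`, the wording of the date stamp, and the use of `gh-md-toc --insert` (which fills in a table of contents between `<!--ts-->` and `<!--te-->` markers in the file) are assumptions. The render and table-of-contents steps are skipped with a warning if the tools are missing.

```shell
#!/usr/bin/env bash
# Rough sketch of a README build script (hypothetical, not the repo's script).

# Step 1: render the R Markdown source to GitHub-flavoured Markdown.
render_readme() {
  local rmd="$1"
  if command -v Rscript >/dev/null 2>&1; then
    Rscript -e "rmarkdown::render('$rmd')"
  else
    echo "Rscript not found; skipping render" >&2
  fi
}

# Step 2: insert a table of contents between <!--ts--> and <!--te--> markers.
add_toc() {
  local md="$1"
  if command -v gh-md-toc >/dev/null 2>&1; then
    gh-md-toc --insert "$md"
  else
    echo "gh-md-toc not found; skipping table of contents" >&2
  fi
}

# Step 3: append a date stamp so readers know when the README was generated.
add_date_stamp() {
  local md="$1"
  printf '\nDocument generated on %s\n' "$(date +%Y-%m-%d)" >> "$md"
}

# Run the three steps in order, stopping if one of them fails.
build_readme() {
  render_readme README.Rmd && add_toc README.md && add_date_stamp README.md
}
```

Calling `build_readme` from the repository root would run the steps in order.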

The final task in this workflow was to use MkDocs to render the README into a static site that can be hosted on GitHub Pages (or on your own web server, or simply viewed on your local computer). MkDocs is incredibly easy to use and builds very nice pages! You can view the same documentation rendered with the Read the Docs theme at https://davetang.github.io/learning_bam_file/.
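A minimal MkDocs setup along these lines might look like the sketch below; it is my guess at a reasonable layout rather than the repo's exact configuration. MkDocs expects the pages in a `docs/` directory next to `mkdocs.yml`, so the README is copied to `docs/index.md`, and `readthedocs` is one of the two themes bundled with MkDocs. The function name `setup_mkdocs` and the site name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set up a minimal MkDocs project in directory $1
# that publishes an existing README.md as the home page.
setup_mkdocs() {
  local dir="$1"
  mkdir -p "$dir/docs"
  # MkDocs serves docs/index.md as the site's landing page.
  cp "$dir/README.md" "$dir/docs/index.md"
  cat > "$dir/mkdocs.yml" <<'EOF'
site_name: Learning the BAM format
theme:
  name: readthedocs
EOF
}

# Usage, from the repository root:
#   setup_mkdocs .
#   mkdocs serve       # preview at http://127.0.0.1:8000
#   mkdocs build       # write the static site to site/
#   mkdocs gh-deploy   # publish to the gh-pages branch for GitHub Pages
```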

And there you have it, reproducible documentation for SAMtools!


This work is licensed under a Creative Commons Attribution 4.0 International License.