Creating reproducible documentation part 2

I had previously written about my workflow for creating reproducible documentation for SAMtools. The main idea was to generate the documentation via an R Markdown document that includes the documentation and the SAMtools commands to be executed on the command line (I do not know of another tool that can achieve this but if you do, please let me know). Docker was used to ensure that the environment was kept constant, so that the same documentation would be generated each time, and a simple Makefile was used to tie everything together. The latest version of the documentation can be viewed on the GitHub repo or via GitHub Pages as rendered by MkDocs.

Recently, I have been learning about continuous integration (CI) and continuous deployment (CD) and discovered GitHub Actions. For the uninitiated, GitHub Actions is GitHub's platform for CI/CD, although they (GitHub) like to point out that it's much more than that. As I learned more about GitHub Actions, I wondered whether I could use the platform to automatically generate my documentation. As it turns out, it is possible, and this post is about how we can build reproducible documentation automatically using GitHub Actions.

But first, some background information on GitHub Actions! This presentation provides a nice summary of GitHub Actions but essentially it's a platform for running commands as a workflow. Each workflow is implemented as a YAML file and every workflow consists of several different core concepts, which are shown below (and were adapted from the presentation):

Events
These are events that can trigger and start a workflow.
Jobs
A job is a collection of tasks that are executed on the same runner. Different jobs run in their own environment and can run in parallel to other jobs.
Steps
These are the individual tasks that run commands in a job and these can be an action or a shell command.
Actions
An action is a custom application for the GitHub Actions platform.
Runners
A runner is a server that runs GitHub Actions workflows; you can also host a runner yourself on your own server (if you require more resources). GitHub Hosted runners are based on Ubuntu Linux, Windows, and macOS.
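As a rough illustration of how these concepts map onto a workflow file, here is a minimal sketch. The workflow name, job name, and echo message are hypothetical placeholders and are not part of my actual workflow.

```yaml
# hypothetical minimal workflow, for illustration only
name: Hello workflow

# event: any push to the repository triggers the workflow
on: push

jobs:
  # job: a collection of steps executed on the same runner
  say_hello:
    # runner: a GitHub Hosted Ubuntu virtual machine
    runs-on: ubuntu-latest
    steps:
      # a step that uses an action (checks out the repository)
      - uses: actions/checkout@v2
      # a step that runs a shell command
      - run: echo "Hello from a ${{ runner.os }} runner"
```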

Therefore we need to create a YAML file that contains a job with the steps/commands to build our documentation, which will be executed on a GitHub Hosted runner when an event is observed. Since I have organised the building step with a Makefile, there is only one command to run (make). However, we need to ensure that the environment used for building is constant, so the same documentation will be built each time (if everything is kept the same).

I was actually testing GitLab's platform for CI/CD before I discovered GitHub Actions. I knew that it was possible to use Docker images with GitLab's CI/CD and so I was looking for a way to use Docker images with GitHub Actions. It's not absolutely necessary to use Docker but I find that it makes things easier because dependencies for building my documentation, e.g. R and the rmarkdown package, can be easily included in a Docker image. I found two ways to use Docker with GitHub Actions: one is to specify the Docker image in the uses keyword, which specifies what a step will use to run a command, and the other is to use the container keyword. Since I want all my steps to run inside my container, I chose the latter approach. With that figured out, I created the following workflow (but as I have already mentioned, this is all still very new to me so my implementation may not follow best practices):

# name of workflow that will be displayed on the actions page
name: Create README.md

# execute workflow only when these files are modified
on:
  push:
    paths:
      - 'eg/**'
      - 'Makefile'
      - 'create_readme.sh'
      - 'learning_bam_file.Rmd'

# a list of the jobs that run as part of the workflow
jobs:
  make_markdown:
    # the type of runner to run the given job
    runs-on: ubuntu-latest

    # the Docker image in which all steps of the job are run
    container: davetang/r_build:4.1.2

    # a list of the steps that will run as part of the job
    steps:
      - run: echo "The job was automatically triggered by a ${{ github.event_name }} event."
      - run: echo "This job is now running on a ${{ runner.os }} server hosted by GitHub!"
      - run: echo "The name of your branch is ${{ github.ref }} and your repository is ${{ github.repository }}."
      - name: Check out repository code
        uses: actions/checkout@v2
      - run: echo "The ${{ github.repository }} repository has been cloned to the runner."
      - run: echo "The workflow is now ready to test your code on the runner."

      - run: make
        name: make

      - name: Commit report
        run: |
          git config --global user.name 'Dave Tang'
          git config --global user.email 'davetingpongtang@gmail.com'
          git add "README.md"
          git commit -m "Build README.md"
          git push origin master

      - name: Build MkDocs site
        run: |
          cd mkdocs && mkdocs build

      - name: Deploy MkDocs
        run: |
          git branch gh-pages
          git pull
          cd mkdocs && mkdocs gh-deploy

      - run: echo "This job's status is ${{ job.status }}."

My workflow is called "Create README.md", which will be the name displayed on the Actions page of my GitHub repository. This workflow will only run when there have been changes to the example files (inside the eg/ folder), the Makefile, the create_readme.sh shell script, or the R Markdown document. The documentation is built using these files, therefore if we change any of them, we should run the workflow to see if everything still builds successfully. This workflow has only one job, called "make_markdown", which runs on a GitHub Hosted runner (virtual machine) that uses the latest version of Ubuntu. The resource allocation of the Ubuntu runner is a 2-core CPU with 7 GB of RAM and 14 GB of SSD disk space. You can get more resources using a macOS runner (3-core CPU, 14 GB of RAM, and 14 GB of SSD disk space) but there is a minute multiplier of 10, meaning that using 1 macOS minute would consume 10 minutes of your credit. There is no multiplier for using Linux, so that's something to consider if you end up using GitHub Actions a lot.

As I previously mentioned, this workflow uses Docker (using an image I prepared for building the documentation) via the container keyword and each step will be run inside this container. There are several run commands and some are named; if there is no name, the step name will default to the text specified in the run command. There are also variables contained inside ${{ }} delimiters; these are called contexts and are a way to access information about workflow runs, runner environments, jobs, and steps. The contexts were taken from the official quickstart guide and are not necessary for running the workflow but they help with testing.
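To make the difference between the two Docker approaches more concrete, here is a small sketch that places them side by side. The workflow name, the job names, and the R --version command are hypothetical and only for illustration; I am assuming that R is on the PATH inside the image, and the entrypoint and args keys are how I understand a docker:// step can be told what to run.

```yaml
# hypothetical workflow comparing the two approaches, for illustration only
name: Docker approaches
on: push

jobs:
  # approach 1: only this one step runs inside the Docker image
  one_step_in_docker:
    runs-on: ubuntu-latest
    steps:
      - uses: docker://davetang/r_build:4.1.2
        with:
          entrypoint: R
          args: "--version"

  # approach 2: every step of the job runs inside the container
  all_steps_in_docker:
    runs-on: ubuntu-latest
    container: davetang/r_build:4.1.2
    steps:
      - run: R --version
```

With the container keyword there is no need to repeat the image for each step, which is why it suits a job where everything (make, git, and mkdocs) should run in the same environment.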

There are actually just four commands that need to be run:

  1. make
  2. Commit report (I should change the details of the git config steps to indicate that the steps were run automatically and not by me.)
  3. Build MkDocs site
  4. Deploy MkDocs
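As for the commit details, below is one possible sketch of the Commit report step that attributes the commit to a bot rather than to me. The name and email shown are the github-actions[bot] identity that I have seen commonly used for automated commits; treat them as an assumption rather than something taken from my workflow, and the commit message is likewise just an example.

```yaml
    steps:
      - name: Commit report
        run: |
          # assumed bot identity, so the commit is clearly marked as automated
          git config --global user.name 'github-actions[bot]'
          git config --global user.email '41898282+github-actions[bot]@users.noreply.github.com'
          git add "README.md"
          git commit -m "Build README.md (automated)"
          git push origin master
```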

Note that, by default, if a step fails, the remaining steps are skipped and the job stops at that step. For example, if the Build MkDocs site step fails, then the Deploy MkDocs step and the final echo command will not be run. For more information on the workflow syntax refer to the documentation, which explains the syntax in great detail.
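If you would like the final status message to be printed even when an earlier step fails, one option is to add an if condition with the always() status check function to that step. This is only a sketch and is not part of my workflow; the step name is hypothetical.

```yaml
    steps:
      # ... the build, commit, and deploy steps go here ...

      - name: Report job status
        # always() makes this step run even if a previous step failed
        if: ${{ always() }}
        run: echo "This job's status is ${{ job.status }}."
```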

Summary

In this post and the previous post, I described my workflow for generating reproducible documentation for the feature-rich tool SAMtools. This post details how GitHub Actions can be used to build the documentation automatically each time I make a change to the relevant files and push the changes to GitHub. For example, if a new version of SAMtools is released and I want to rebuild the documentation for the new version, I will update the Makefile with the new version, push the change to GitHub, and the documentation will be automatically rebuilt (unless the newer version is incompatible with some of my examples and the building step fails). This is actually the idea behind CI/CD, which is to automatically test your updated code to see if everything works and to deploy your changes only when everything works.

So there you have it: building reproducible documentation automatically using R Markdown, Docker, and GitHub Actions!

This work is licensed under a Creative Commons Attribution 4.0 International License.
