Organising computational biology projects with Cookiecutter

Last updated: 2024/04/26

A quick guide to organizing computational biology projects was published more than 8 years ago, but its main messages are still very relevant (perhaps even more so nowadays, given the exponential increase in biological data). In a nutshell, computational biology projects need to be organised so that we can share and reproduce our work efficiently. The guide provides an example of how a project can be organised.

Figure 1. Directory structure for a sample project from A quick guide to organizing computational biology projects.

The suggested structure may not suit everyone, but the point of this post is to illustrate Cookiecutter, a command-line utility that can generate a project from a template such as the example in the guide. This ensures that each of your projects follows a well-defined structure. My example below was adapted from an example that no longer exists. For other tutorials, please see the official documentation.

To get started, install Cookiecutter using pip.

pip install cookiecutter
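
Before going further, it can help to confirm that the command is actually on your PATH. The snippet below is a minimal sketch, assuming a POSIX-like shell; the `status` variable is only there for illustration.

```shell
# Report whether the cookiecutter command is available on PATH.
if command -v cookiecutter > /dev/null 2>&1; then
    cookiecutter --version
    status="installed"
else
    echo "cookiecutter not found on PATH; check the pip installation or your PATH"
    status="missing"
fi
```

If the command is reported as missing right after a successful pip install, the directory that pip installs scripts into is probably not on your PATH.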

We will now create a new directory for our template and move into the new directory.

mkdir new_project && cd new_project

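Now we will create the same directory structure as Figure 1 but with templating tags in place of the directory names. The name inside each tag (for example, project_name) corresponds to a key that we will define later in the cookiecutter.json file.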

mkdir {{cookiecutter.project_name}} && cd {{cookiecutter.project_name}}

mkdir {{cookiecutter.documentation}} && printf "# README\n\nDocumentation goes here\n" > {{cookiecutter.documentation}}/README.md
mkdir {{cookiecutter.data}} && printf "# README\n\nData goes here\n" > {{cookiecutter.data}}/README.md
mkdir {{cookiecutter.source}} && printf "# README\n\nSource code goes here\n" > {{cookiecutter.source}}/README.md
mkdir {{cookiecutter.bin}} && printf "# README\n\nBinaries go here\n" > {{cookiecutter.bin}}/README.md
mkdir {{cookiecutter.results}} && printf "# README\n\nProcessed results go here\n" > {{cookiecutter.results}}/README.md
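
The five commands above all follow one pattern, so they could also be written as a loop, which is handy if you want more directories in your template. This is only a sketch: it writes into a temporary directory (the `workdir` variable is my own name) so that you can try it without touching your real template.

```shell
# Scratch location so the sketch does not touch the real template.
workdir=$(mktemp -d)

# Each entry is "<cookiecutter variable>:<README text>".
for entry in \
    "documentation:Documentation goes here" \
    "data:Data goes here" \
    "source:Source code goes here" \
    "bin:Binaries go here" \
    "results:Processed results go here"
do
    key=${entry%%:*}    # text before the first colon
    text=${entry#*:}    # text after the first colon
    dir="$workdir/{{cookiecutter.${key}}}"
    mkdir -p "$dir"
    printf '# README\n\n%s\n' "$text" > "$dir/README.md"
done

ls "$workdir"
```

The quotes around the directory name matter: they keep the shell from trying to interpret the curly braces.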

I usually have a README file inside each project directory that provides a general overview of the project. We will use a nice README template that we can download using wget and then replace the Project Title with a templating tag.

url=https://gist.githubusercontent.com/PurpleBooth/109311bb0361f32d87a2/raw/824da51d0763e6855c338cc8107b2ff890e7dd43/README-Template.md

wget -O - "${url}" \
   | sed 's/Project Title/{{cookiecutter.project_name}}/' \
   > {{cookiecutter.README}}.md

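Finally, we need to create the cookiecutter.json file, which must reside in the new_project directory. We are still inside the {{cookiecutter.project_name}} directory, so we move up one level first.

cd ..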
cat <<EOF > cookiecutter.json
{
   "project_name": "new_project",
   "documentation": "doc",
   "data": "data",
   "source": "src",
   "bin": "bin",
   "results": "results",
   "README": "README"
}
EOF
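
A stray trailing comma is an easy mistake to make in a hand-written JSON file, so it is worth checking that the file parses. The sketch below assumes python3 is available (likely, since we installed Cookiecutter with pip) and uses Python's built-in json.tool module; `check_json` is a hypothetical helper of my own, not part of Cookiecutter, and the two files are throwaway examples.

```shell
# Hypothetical helper (not part of Cookiecutter): returns 0 if the file parses as JSON.
check_json() {
    python3 -m json.tool "$1" > /dev/null 2>&1
}

# Demonstration on throwaway files so nothing in the template is touched.
tmpdir=$(mktemp -d)
printf '{\n   "project_name": "new_project",\n   "README": "README"\n}\n' > "$tmpdir/good.json"
# Same content, but with a trailing comma after the last entry.
printf '{\n   "project_name": "new_project",\n   "README": "README",\n}\n' > "$tmpdir/bad.json"

for f in "$tmpdir/good.json" "$tmpdir/bad.json"; do
    if check_json "$f"; then
        echo "$(basename "$f"): valid JSON"
    else
        echo "$(basename "$f"): invalid JSON"
    fi
done
```

To check the real file, call the helper with the path to your own cookiecutter.json.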

If you followed the steps above, you should have the following directory structure inside the new_project directory (the tree command below is run from its parent directory).

tree --charset unicode new_project/
new_project/
|-- cookiecutter.json
`-- {{cookiecutter.project_name}}
    |-- {{cookiecutter.bin}}
    |   `-- README.md
    |-- {{cookiecutter.data}}
    |   `-- README.md
    |-- {{cookiecutter.documentation}}
    |   `-- README.md
    |-- {{cookiecutter.README}}.md
    |-- {{cookiecutter.results}}
    |   `-- README.md
    `-- {{cookiecutter.source}}
        `-- README.md

7 directories, 7 files

To use our newly created template, move into the directory where you want to create the new project. I will use the directory that contains new_project. I will also accept the default values for everything except project_name.

cookiecutter new_project/
project_name [new_project]: msms
documentation [doc]: 
data [data]: 
source [src]: 
bin [bin]: 
results [results]: 
README [README]: 
tree --charset unicode msms/
msms/
|-- bin
|   `-- README.md
|-- data
|   `-- README.md
|-- doc
|   `-- README.md
|-- README.md
|-- results
|   `-- README.md
`-- src
    `-- README.md

6 directories, 6 files
head -1 msms/README.md
# msms
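
The prompts can also be skipped, which is useful in scripts. The sketch below relies on the cookiecutter command's --no-input option together with a key=value override on the command line. It assumes that the new_project/ template from this post is in the current directory and that no msms directory exists yet; if either Cookiecutter or the template is missing, it just prints a message.

```shell
# Assumes the new_project/ template from this post is in the current directory
# and that cookiecutter is installed; otherwise the command is skipped.
if command -v cookiecutter > /dev/null 2>&1 && [ -d new_project ]; then
    # --no-input accepts every default; project_name=msms overrides one value.
    cookiecutter --no-input new_project/ project_name=msms
    head -n 1 msms/README.md
else
    echo "skipped: needs cookiecutter and the new_project/ template"
fi
```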

Summary

As stated in the basic tutorial (that no longer exists):

Cookiecutter takes a source directory tree and copies it into your new project. It replaces all the names that it finds surrounded by templating tags {{ and }} with names that it finds in the file cookiecutter.json. That’s basically it.
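
To make the substitution concrete, here is a toy imitation using sed on a single tag. This is not how Cookiecutter is implemented (it renders templates with Jinja2, which can do far more than plain search and replace); the file name and the `project_name` variable are just for illustration.

```shell
tmpdir=$(mktemp -d)

# A one-line template containing a single tag.
printf '# {{cookiecutter.project_name}}\n' > "$tmpdir/template.md"

# Stand-in for the value that would come from cookiecutter.json or the prompt.
project_name="msms"

# Replace the tag with the value, much like the first line of msms/README.md.
rendered=$(sed "s/{{cookiecutter.project_name}}/${project_name}/" "$tmpdir/template.md")
echo "$rendered"   # prints: # msms
```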

There are a lot of features provided by Cookiecutter and a lot of project templates that you can use and adapt to your liking.


This work is licensed under a Creative Commons Attribution 4.0 International License.
2 comments
  1. Thanks for a great blog with a lot of great content, such as Cookiecutter, which I have begun to implement in my projects.
    It seems like you have one comma too many in the final line of your JSON file.

    Cheers

    1. Thanks Simon! In the file on my computer, I actually didn’t have the comma. Don’t know how it snuck in 🙂 I’ve updated the post.
