Creating reproducible documentation part 2

I previously wrote about my workflow for creating reproducible documentation for SAMtools. The main idea was to generate the documentation via an R Markdown document that includes both the explanatory text and the SAMtools commands to be executed on the command line (I do not know of another tool that can achieve this but if you…


Grepping a list with a list

The grep command-line utility is a commonly used tool for searching plain text files for lines that match a pattern. For example, you could search for a gene or SNP ID in a BED/GFF/GTF file to find out its coordinates. In this post, I will demonstrate how you can search for a list of things in…
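The idea of searching for a whole list of IDs at once can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method the post goes on to describe (the excerpt is cut off before that): the function name `grep_list` and the BED-like records are hypothetical, and matching is by fixed substring, which is roughly what `grep -F -f` does with a file of patterns.

```python
def grep_list(patterns, lines):
    """Return the lines that contain at least one of the patterns as a substring."""
    # Skip empty patterns, which would otherwise match every line.
    wanted = [p for p in patterns if p]
    return [line for line in lines if any(p in line for p in wanted)]


# Hypothetical BED-like records: chromosome, start, end, name
records = [
    "chr1\t100\t200\tgeneA",
    "chr1\t300\t400\tgeneB",
    "chr2\t500\t600\tgeneC",
]
ids = ["geneA", "geneC"]

hits = grep_list(ids, records)
for hit in hits:
    print(hit)
```

Because this is substring matching, an ID such as `geneA` would also match a line containing `geneAB`; with grep itself, the `-w` option restricts matches to whole words.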


On blogging

When I was an honours student (in Australia, an honours degree is a one-year [in practice, about ten months] undergraduate research program that you can take after your Bachelor’s; it has nothing to do with the traditional meaning of the word “honours”), I jumped straight into the deep end of bioinformatics. I recall joining…


On learning

This blog was once called “Musings from a PhD candidate”, even though I hardly wrote any posts of the contemplative sort. It later evolved to “Musings from an unlikely candidate” because I had received my PhD, which I thought was quite an unlikely event given my background. However, this carried a…


Add Docker to your toolkit

Docker has been around for eight years and has become a very popular platform for developing software. In the 2020 Stack Overflow Developer Survey, 39.2% of professional developers (total of 44,705) reported that they have done development work using Docker. Docker is behind only Linux and Windows! For the full list check out Stack…


Creating reproducible documentation

When I was first learning about SAMtools, I kept my notes in a Wiki. I would type the SAMtools commands in the terminal and copy and paste the output into my Wiki. It was a tedious task but it was a useful resource that I would refer back to frequently. The latest version of my…


Running RStudio Server with Docker

I highly recommend using RStudio if you use R because it makes working with R so much easier. I primarily use RStudio for writing up my analyses in R Markdown. Some RStudio features I couldn’t live without include: Vim keybindings, code completion, and code highlighting (rainbow parentheses are awesome!). Other nice features I like to…


SQL group by statement on the command line

The GROUP BY statement allows you to perform operations in a group-wise manner. I first learned of the Useful FILe and stream Operations (filo) repository a long time ago and keep coming back to it. The filo toolkit comes with three tools: groupBy, stats, and shuffle. The groupBy tool…
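To make "group-wise" concrete, here is a minimal Python sketch of what a statement like `SELECT key, SUM(value) FROM t GROUP BY key` computes on tab-delimited text. It is not how filo's groupBy tool is implemented or invoked; the function name `group_sum`, its parameters, and the sample data are all hypothetical.

```python
from collections import defaultdict


def group_sum(rows, key_index, value_index):
    """Sum the values in column value_index for each distinct key in column key_index."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += float(row[value_index])
    return dict(totals)


# Hypothetical tab-delimited input: a group name followed by a numeric value
text = "a\t1\nb\t2\na\t3\nb\t4\nc\t5"
rows = [line.split("\t") for line in text.splitlines()]

totals = group_sum(rows, key_index=0, value_index=1)
for key, total in totals.items():
    print(key, total)
```

Unlike approaches that rely on adjacent runs of identical keys, the dictionary-based version does not need the input to be sorted by the grouping column.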


Monte Carlo integration

I recently came across this question: “You have a function called random that randomly generates a number between 0 and 1. Use this to calculate pi.” Worded differently, the question is asking you to estimate pi using random numbers. As you can read in the script for Life of Pi (one of my favourite movies):…
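One common way to answer the question is sketched below in Python; it is a standard Monte Carlo approach and not necessarily the solution the post arrives at. Draw points uniformly in the unit square and count the fraction that fall inside the quarter circle of radius 1; that fraction approximates pi/4. The function name `estimate_pi` and the `seed` parameter are assumptions made for illustration.

```python
import random


def estimate_pi(n, seed=None):
    """Estimate pi from n uniformly random points in the unit square."""
    rng = random.Random(seed)  # seeded generator so runs can be repeated
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # the point lies inside the quarter circle
            inside += 1
    # (quarter circle area) / (unit square area) = pi / 4
    return 4.0 * inside / n


estimate = estimate_pi(200_000, seed=42)
print(estimate)
```

The error shrinks roughly in proportion to one over the square root of `n`, so each additional digit of accuracy costs about a hundred times more samples.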


Ten years

As of today, it has been a decade since my first post on this blog. It started aimlessly during my PhD as a place to post analysis notes for myself, and ten years later it remains so. However, over the years I have started to focus more on better project management practices and on…
