Running RStudio Server with Docker

I highly recommend using RStudio if you use R because it makes working with R so much easier. I primarily use RStudio for writing up my analyses in R Markdown. Some RStudio features I couldn’t live without include: Vim keybindings, code completion, and code highlighting (rainbow parentheses are awesome!). Other nice features I like to…

SQL group by statement on the command line

The GROUP BY statement allows you to perform operations in a group-wise manner. I first learned of the Useful FILe and stream Operations (filo) repository a long, long time ago and keep coming back to it. The filo toolkit comes with three tools: groupBy, stats, and shuffle. The groupBy tool…
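
To make the group-wise idea concrete, here is a minimal Python sketch of what a GROUP BY with a sum does. It is not the filo groupBy tool or its interface; the function name `group_by_sum` and the example rows are hypothetical and only there for illustration.

```python
from collections import defaultdict


def group_by_sum(rows, key_index, value_index):
    """Sum the value column separately for each distinct key (hypothetical helper)."""
    totals = defaultdict(float)
    for row in rows:
        # Rows sharing the same key accumulate into the same bucket.
        totals[row[key_index]] += float(row[value_index])
    return dict(totals)


# Made-up tab-delimited-style rows: (group key, value)
example_rows = [("a", "10"), ("b", "5"), ("a", "7")]
print(group_by_sum(example_rows, key_index=0, value_index=1))
```

Swapping the sum for another summary (count, mean, max) only changes what is accumulated per key; the grouping step stays the same.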

Monte Carlo integration

I recently came across this question: “You have a function called random that randomly generates a number between 0 to 1. Use this to calculate pi.” Worded differently, the question is asking you to estimate pi using random numbers. As you can read in the script for Life of Pi (one of my favourite movies):…
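
One common way to answer this question, sketched here under my own assumptions rather than taken from the full post, is to draw points uniformly in the unit square and count how many fall inside the quarter circle of radius 1. That fraction approximates pi/4. The function name `estimate_pi` and the `seed` parameter are illustrative choices, and Python's `random.random` stands in for the `random` function in the question.

```python
import random


def estimate_pi(n_samples, seed=None):
    """Estimate pi from uniform random numbers in [0, 1) (illustrative sketch)."""
    rng = random.Random(seed)  # seeded generator so runs can be reproduced
    inside = 0
    for _ in range(n_samples):
        x = rng.random()
        y = rng.random()
        # The point lies inside the quarter circle if x^2 + y^2 <= 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # Quarter-circle area / unit-square area = pi / 4.
    return 4.0 * inside / n_samples


print(estimate_pi(1_000_000, seed=42))
```

The estimate converges slowly: the error shrinks roughly with the square root of the number of samples, so each extra digit of accuracy costs about 100 times more samples.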

Ten years

As of today, it has been a decade since my first post on this blog. It started aimlessly during my PhD as a place to post analysis notes for myself and, ten years later, it remains so. However, over the years I have started to focus more on better project management practices and on…

Sequence analysis of SARS-CoV-2 part 3

This post is a continuation of a series of posts on the sequence analysis of SARS-CoV-2; see part 1 and part 2 if you haven’t already. Since my first post, I found out that you can blast sequences against a Betacoronavirus database on NCBI BLAST. The database, as of 2020/03/10, has a total of 7,844…

Sequence analysis of SARS-CoV-2 part 2

I ended my previous post on the sequence analysis of SARS-CoV-2 with the amino acid alignment of the spike protein from SARS-CoV-2 (MN908947) and Bat coronavirus RaTG13 (MN996532). The spike protein is of particular interest because it is through its binding to the angiotensin-converting enzyme 2 (ACE2) receptor that it is able to…

Sequence analysis of SARS-CoV-2

The article “A new coronavirus associated with human respiratory disease in China” released the full genome sequence (29,903 nt) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In the paper, meta-transcriptomic sequencing was performed on bronchoalveolar lavage fluid (BALF) from a 41-year-old male suffering from a severe respiratory disease. Contigs were assembled using…

Domain renewal

Some of you may have noticed the Buy me a coffee link I have on my blog. I have just used all the contributions to help pay for the hosting fees; this blog should be here for another two years (extended until 24 April 2022). Thank you to everyone who has bought me a…

Learning WDL

An approach I like to use when learning a new tool is to get started by trying to run an example and then gradually work out the details. In this post, I’m trying to learn the basics of the Workflow Description Language (WDL) so that I can adapt GATK workflows for my own use. WDL,…
