Twelfth year: Python

As this blog enters its twelfth year, I am finally using Python instead of Perl as my scripting language of choice (which is contrary to what I said two years ago!). As I wrote in my learning Python repo, my interest in deep learning finally tipped me over since several popular deep learning frameworks (TensorFlow,…

Backticks in R

The only time I used backticks in R was when a file was imported into R and the column names had spaces (as a side note, please don’t use spaces in your column or file names; use an underscore, i.e. _, instead). For example, you can access col a by using backticks. my_df <- data.frame(…

Wrapping R vectors with parentheses

In this vectors and lists tutorial, the example code wraps the R vector assignments with parentheses or round brackets.

    (v_log <- c(TRUE, FALSE, FALSE, TRUE))
    #> [1] TRUE FALSE FALSE TRUE
    (v_int <- 1:4)
    #> [1] 1 2 3 4
    (v_doub <- 1:4 * 1.2)
    #> [1] 1.2 2.4 3.6 4.8
    (v_char <- letters[1:4])
    #>…

Split single column of key-value pairs into multiple columns

Two widely used file formats in bioinformatics, VCF and GTF, have single columns that are packed with annotation information. This makes them a bit inconvenient to work with in R when using data frames because the values need to be unpacked, i.e. split. In addition, this violates one of the conditions for tidy data, which…
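The unpacking step can be sketched outside of R as well. Below is a minimal Python illustration, not the approach from the post itself; the function names `split_gtf_attributes` and `split_vcf_info` are hypothetical, and the sketch assumes GTF attributes of the form `key "value";` and VCF INFO fields of the form `key=value` separated by semicolons, with no semicolons embedded inside values.

```python
def split_gtf_attributes(field):
    """Split a GTF attribute string into a dict of key -> value.

    Assumes items look like: gene_id "G1"; gene_name "ABC";
    (hypothetical helper; quoted values containing ';' are not handled).
    """
    pairs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the trailing ';'
        key, _, value = item.partition(" ")
        pairs[key] = value.strip().strip('"')
    return pairs


def split_vcf_info(field):
    """Split a VCF INFO string into a dict.

    Keys without '=' are treated as flags and mapped to True;
    a lone '.' (missing INFO) gives an empty dict.
    """
    pairs = {}
    for item in field.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        pairs[key] = value if sep else True
    return pairs


if __name__ == "__main__":
    print(split_gtf_attributes('gene_id "G1"; gene_name "ABC";'))
    print(split_vcf_info("DP=10;AF=0.5;DB"))
```

Each resulting dictionary corresponds to one row, so a list of them can be turned into one column per key. Values are kept as strings here; converting types is left out of the sketch.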

Map, join, and pivot in R

In this post, I will describe a series of data processing steps in R that I often perform, which involve the map_df, inner_join, and pivot_longer functions from the purrr, dplyr, and tidyr packages, respectively. They are all part of the tidyverse, so to follow this post, please install the tidyverse package.

    install.packages("tidyverse")
    library(tidyverse)

The typical…

Mapping full-length mRNA sequences

I used BLAT to align full-length mRNA sequences a long time ago. Since BLAT has been out for over 20 years, I was wondering what modern-day alignment tool I could use as a replacement. Minimap2 came to mind and in this post I use it to map some known transcript sequences to the…

Finding out weather conditions from the command line

In this post, I outline an approach for retrieving weather conditions from the command line. There are websites and widgets that provide weather details but I like using the command line because I find that it’s more efficient than pointing and clicking on stuff. In addition, this approach enables us to program specific tasks. For…

Omicron variants

In this post, I describe a simple workflow for identifying Omicron variants from some sequencing data shared by the KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP) and the Centre for Epidemic Response and Innovation (CERI). To follow the workflow, you need to have Docker installed and an Internet connection. To get started, run a container…

Storing FASTQ as unaligned CRAM

I was updating my BAM to CRAM post, spurred on by a recent comment, when I wondered whether I could store my FASTQ files as unaligned CRAM files to save space. I thought it wouldn’t be possible because the reads are unaligned and therefore we can’t make use of a reference to save space…

Updating my Docker documentation

In my last post I outlined an approach for automatically creating reproducible documentation using GitHub Actions. I have now updated my Docker documentation using the same approach but without using a container because then I’d have to run Docker inside a Docker container (which is actually possible using GitLab’s CI/CD platform but I haven’t tried…
