Stop BLAST from phoning home

Some time back I learned from Devon Ryan on the bird app (no link because I have stopped using said app) that BLAST phones home by default every time you use it. I was not aware of this until I saw the post and I’m not really a fan of having this turned on by…

Mapping full-length mRNA sequences

I used BLAT to align full-length mRNA sequences a long time ago. Since BLAT has been out for over 20 years, I was wondering what modern-day alignment tool I could use as a replacement. Minimap2 came to mind, and in this post I use it to map some known transcript sequences to the…

Omicron variants

In this post, I describe a simple workflow for identifying Omicron variants from some sequencing data shared by the KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP) and the Centre for Epidemic Response and Innovation (CERI). To follow the workflow, you need to have Docker installed and an Internet connection. To get started, run a container…

Storing FASTQ as unaligned CRAM

I was updating my BAM to CRAM post, spurred on by a recent comment, and then I wondered whether I could store my FASTQ files as unaligned CRAM files to save space. I thought it wouldn’t be possible because the reads are unaligned and therefore we can’t make use of a reference to save space…

Updating my Docker documentation

In my last post, I outlined an approach for automatically creating reproducible documentation using GitHub Actions. I have now updated my Docker documentation using the same approach, but without using a container, because then I’d have to run Docker inside a Docker container (which is actually possible using GitLab’s CI/CD platform but I haven’t tried…

Creating reproducible documentation part 2

I had previously written about my workflow for creating reproducible documentation for SAMtools. The main idea was to generate the documentation via an R Markdown document that includes the documentation and the SAMtools commands to be executed on the command line (I do not know of another tool that can achieve this but if you…

Add Docker to your toolkit

Docker has been around for 8 years and it has become a very popular platform for developing software. In the 2020 Stack Overflow Developer Survey, 39.2% of professional developers (total of 44,705) reported that they have done development work using Docker. Docker is behind only Linux and Windows! For the full list check out Stack…

Creating reproducible documentation

When I was first learning about SAMtools, I kept my notes in a Wiki. I would type the SAMtools commands in the terminal and copy and paste the output into my Wiki. It was a tedious task but it was a useful resource that I would refer back to frequently. The latest version of my…

SQL group by statement on the command line

The GROUP BY statement allows you to perform operations in a group-wise manner. I first learned of the Useful FILe and stream Operations (filo) repository a long, long time ago and keep coming back to it. The filo toolkit comes with three tools: groupBy, stats, and shuffle. The groupBy tool…

Sequence analysis of SARS-CoV-2 part 3

This post is a continuation of a series of posts on the sequence analysis of SARS-CoV-2; see part 1 and part 2 if you haven’t already. Since my first post, I found out that you can blast sequences against a Betacoronavirus database on NCBI BLAST. The database, as of 2020/03/10, has a total of 7,844…
