Download from the SRA using ffq (and ffs)

ffq is a tool that can be used to fetch metadata from several data repositories. It can also be used to generate links to raw data hosted at the EBI (via FTP), AWS, GCP, and NCBI. I wrote a short Bash script called ffs that can generate the commands needed to download the raw data…

Deciding which bioinformatics tool to use

I just finished reading "Using prototyping to choose a bioinformatics workflow management system", which I summarised on Mastodon as follows: Enjoyed reading "Using prototyping to choose a #bioinformatics workflow management system". Paper describes authors’ 10 day experience searching and implementing a workflow. Summary: Need to decide which tool to use? Shortlist a list of potentially…

Check where a gene is expressed from the command line

The Pachter Lab have developed some very useful bioinformatics software. In this post, I use gget to quickly query ARCHS4 on the command line to see where a gene of interest is expressed. The gget tool has other functionality too, including sequence alignment, enrichment analysis, and even protein structure prediction using AlphaFold. Check it out!…

TIL that you can download SRA data from AWS

The Sequence Read Archive (SRA) is the largest publicly available repository of high-throughput sequencing data. (Fun fact: it used to be called the Short Read Archive since most of the data was from short-read sequencers.) The tool fastq-dump from the SRA Toolkit can be used to download SRA data. A while ago I…

Stop BLAST from phoning home

Some time back I learned from Devon Ryan on the bird app (no link because I have stopped using said app) that BLAST phones home every time you use it, by default. I was never aware of this until I saw the post and I’m not really a fan of having this turned on by…

Mapping full-length mRNA sequences

I used BLAT to align full-length mRNA sequences a long time ago. Since BLAT has been out for over 20 years, I was wondering what modern-day alignment tool I could use as a replacement. Minimap2 came to mind and in this post I use it to map some known transcript sequences to the…

Omicron variants

In this post, I describe a simple workflow for identifying Omicron variants from some sequencing data shared by the KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP) and the Centre for Epidemic Response and Innovation (CERI). To follow the workflow, you need to have Docker installed and an Internet connection. To get started, run a container…

Storing FASTQ as unaligned CRAM

I was updating my BAM to CRAM post spurred on by a recent comment and then I wondered whether I could store my FASTQ files as unaligned CRAM files to save space. I thought it wouldn’t be possible because the reads are unaligned and therefore we can’t make use of a reference to save space…

Creating reproducible documentation part 2

I had previously written about my workflow for creating reproducible documentation for SAMtools. The main idea was to generate the documentation via an R Markdown document that includes the documentation and the SAMtools commands to be executed on the command line (I do not know of another tool that can achieve this but if you…

Creating reproducible documentation

When I was first learning about SAMtools, I kept my notes in a Wiki. I would type the SAMtools commands in the terminal and copy and paste the output into my Wiki. It was a tedious task but it was a useful resource that I would refer back to frequently. The latest version of my…
