An example differential gene expression results table

This post contains the analysis steps used to create a differential gene expression results table from RNA-seq counts summarised using nf-core/rnaseq. The comparison was between two conditions: normal versus (lung) cancer. We will be using {edgeR}, so install it if you haven’t already:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("edgeR")

We will also…

An example RNA-seq count table

I have been using pnas_expression.txt as a test dataset for count table analyses for many years. It was created by Davis McCarthy and was hosted on their Google Sites website. When that site eventually became unavailable, I started hosting the file on my web server. The RNA-seq libraries were generated using…

Check what genes are correlated to your gene of interest

ARCHS4 (All RNA-seq and ChIP-seq sample and signature search) is a resource that provides access to gene and transcript counts, uniformly processed using kallisto, from all human and mouse RNA-seq experiments in the Gene Expression Omnibus (GEO) and the Sequence Read Archive (SRA). The archs4 module of the gget tool can be used to query…
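A correlation query of this kind might look like the minimal sketch below. It assumes gget is installed and that its archs4 module accepts `-w correlation` (the default mode) and `-gc` (number of correlated genes to return), as described in the gget documentation. The function name `archs4_correlated`, the `DRY_RUN` switch, and the example gene ACE2 are hypothetical choices for illustration, not taken from the post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the genes most correlated with a gene of interest
# in ARCHS4, using the archs4 module of gget.
# Set DRY_RUN=1 to print the gget command instead of running it.
archs4_correlated() {
  local gene="$1"
  local n="${2:-100}"   # number of correlated genes to return (example default)
  local cmd=(gget archs4 -w correlation -gc "$n" "$gene")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example: top 10 genes correlated with ACE2 (an arbitrary example gene)
DRY_RUN=1 archs4_correlated ACE2 10
```

Dropping `DRY_RUN=1` runs the query for real, which needs network access to ARCHS4.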

Download from the SRA using ffq (and ffs)

ffq is a tool that can be used to fetch metadata from several data repositories. It can also be used to generate links to raw data hosted at the EBI (via FTP), AWS, GCP, and NCBI. I wrote a short Bash script called ffs that can generate the commands needed to download the raw data…
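The idea of turning ffq links into download commands can be sketched as follows. This is not the actual ffs script; `url_to_cmd` is a hypothetical function that reads one URL per line and prints a download command chosen by URL scheme (wget for FTP/HTTP(S), the AWS CLI for `s3://`, gsutil for `gs://`). Which schemes ffq emits for each mirror is an assumption here; check the output of your ffq version.

```shell
#!/usr/bin/env bash
# Hypothetical ffs-style helper (not the real ffs): read URLs on stdin,
# one per line, and print a download command for each one.
url_to_cmd() {
  local url
  while IFS= read -r url; do
    [[ -z "$url" ]] && continue   # skip blank lines
    case "$url" in
      ftp://*|http://*|https://*) echo "wget $url" ;;
      s3://*) echo "aws s3 cp --no-sign-request $url ." ;;
      gs://*) echo "gsutil cp $url ." ;;
      *) echo "unsupported URL: $url" >&2 ;;
    esac
  done
}

printf 'ftp://ftp.example.org/run_1.fastq.gz\n' | url_to_cmd
```

With ffq installed, something like `ffq --ftp <accession> | jq -r '.[].url' | url_to_cmd` could feed it, assuming ffq's JSON output is a list of objects with a `url` field.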

Deciding which bioinformatics tool to use

I just finished reading "Using prototyping to choose a bioinformatics workflow management system", which I summarised on Mastodon as follows:

Enjoyed reading "Using prototyping to choose a #bioinformatics workflow management system". Paper describes authors’ 10 day experience searching and implementing a workflow. Summary: Need to decide which tool to use? Shortlist a list of potentially…

Check where a gene is expressed from the command line

The Pachter Lab have developed some very useful bioinformatics software. In this post, I use gget to quickly query ARCHS4 on the command line to see where a gene of interest is expressed. gget has other functionality too, including sequence alignment, enrichment analysis, and even protein structure prediction using AlphaFold. Check it out!…
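A tissue-expression query along these lines might be wrapped as in this minimal sketch. It assumes gget's archs4 module accepts `-w tissue` and `-s` (species: human or mouse) as documented for gget. `archs4_tissue`, `DRY_RUN`, and the gene ACE2 are hypothetical choices for illustration; the post's own command may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show tissue-level expression of a gene in ARCHS4
# via gget's archs4 module (-w tissue). Species may be human or mouse.
# Set DRY_RUN=1 to print the gget command instead of running it.
archs4_tissue() {
  local gene="$1"
  local species="${2:-human}"
  local cmd=(gget archs4 -w tissue -s "$species" "$gene")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

DRY_RUN=1 archs4_tissue ACE2
```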

TIL that you can download SRA data from AWS

The Sequence Read Archive (SRA) is the largest publicly available repository of high-throughput sequencing data. (Fun fact: it used to be called the Short Read Archive, since most of the data came from short-read sequencers.) The tool fastq-dump from the SRA Toolkit can be used to download SRA data. A while ago I…
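Downloading a run from AWS could be sketched like this, assuming the AWS CLI is installed and that runs in the public AWS Open Data bucket live at `s3://sra-pub-run-odp/sra/<accession>/<accession>` (verify the path for your run). `--no-sign-request` lets the CLI fetch public objects without AWS credentials. `sra_aws_cmd` is a hypothetical name, and it only prints the command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an AWS CLI command that copies one SRA run
# from the (assumed) public bucket layout s3://sra-pub-run-odp/sra/<ACC>/<ACC>.
sra_aws_cmd() {
  local acc="$1"
  # Accept only run accessions (SRR, ERR or DRR followed by digits)
  if [[ ! "$acc" =~ ^[SED]RR[0-9]+$ ]]; then
    echo "not a run accession: $acc" >&2
    return 1
  fi
  echo "aws s3 cp --no-sign-request s3://sra-pub-run-odp/sra/${acc}/${acc} ./${acc}.sra"
}

sra_aws_cmd SRR000001
```

The downloaded SRA file can then be converted to FASTQ with fastq-dump (or fasterq-dump) from the SRA Toolkit.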

Stop BLAST from phoning home

Some time back, I learned from Devon Ryan on the bird app (no link, because I have stopped using said app) that, by default, BLAST phones home every time you use it. I had not been aware of this until I saw the post, and I’m not really a fan of having this turned on by…
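One way to opt out is via an environment variable. This minimal sketch assumes your BLAST+ version honours `BLAST_USAGE_REPORT=false`, as described in the BLAST+ documentation; check the manual for the release you have installed.

```shell
#!/usr/bin/env bash
# Disable BLAST+ usage reporting for this shell and any BLAST commands
# started from it. To make it permanent, add the export line to your
# shell start-up file (e.g. ~/.bashrc).
export BLAST_USAGE_REPORT=false

# Show the current setting
echo "BLAST_USAGE_REPORT=${BLAST_USAGE_REPORT}"
```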

Mapping full-length mRNA sequences

A long time ago, I used BLAT to align full-length mRNA sequences. Since BLAT has been around for over 20 years, I wondered which modern-day alignment tool I could use as a replacement. Minimap2 came to mind, and in this post I use it to map some known transcript sequences to the…
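A spliced alignment of full-length transcripts with minimap2 might look like the sketch below. It uses `-ax splice:hq -uf`, which the minimap2 README suggests for high-quality cDNA such as Iso-Seq or traditional full-length cDNA; the post's own command may differ. `map_mrna`, `DRY_RUN`, and the file names are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: align full-length mRNA sequences to a genome.
#   -a          output SAM
#   -x splice:hq  spliced-alignment preset for high-quality transcripts
#   -uf         look for GT-AG splice signals on the transcript strand only
# Set DRY_RUN=1 to print the command instead of running it.
map_mrna() {
  local genome="$1" mrna="$2" out="$3"
  local cmd=(minimap2 -ax splice:hq -uf "$genome" "$mrna")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]} > $out"
  else
    "${cmd[@]}" > "$out"
  fi
}

DRY_RUN=1 map_mrna genome.fa mrna.fa aln.sam
```

`-uf` assumes the query sequences are in sense orientation, as RefSeq-style mRNA records are; for sequences of unknown strand, minimap2's `-ub` searches both strands.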
