Check where a gene is expressed from the command line

The Pachter Lab have developed some very useful bioinformatics software. In this post, I use gget to quickly query ARCHS4 on the command line to see where a gene of interest is expressed. The gget tool has other functionality too, including sequence alignment, enrichment analysis, and even protein structure prediction using AlphaFold. Check it out!…

Omicron variants

In this post, I describe a simple workflow for identifying Omicron variants from some sequencing data shared by the KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP) and the Centre for Epidemic Response and Innovation (CERI). To follow the workflow, you need to have Docker installed and an Internet connection. To get started, run a container…

Annotating variants with a custom file

The Variant Effect Predictor (VEP) tool can annotate variants against custom annotation sources. This is useful if gene models of interest are not represented in the Ensembl or RefSeq databases. To get started, first install VEP, since installation takes some time.

Getting started with GEMINI

After getting started and getting acquainted with DNA sequencing data, it’s finally time to explore DNA variation. A tool that makes this easy is GEMINI, and this post briefly demonstrates some of its functionality. I have only used GEMINI sparingly, and what I know about the tool is gathered mostly from its documentation and tutorials….

OMIM IDs to gene coordinates

A post on linking OMIM IDs to gene coordinates using biomaRt; this provides a way of representing OMIM IDs on the genome. For those unfamiliar with OMIM, here’s the description from the OMIM FAQ: Online Mendelian Inheritance in Man (OMIM) is a continuously updated catalog of human genes and genetic disorders and traits, with particular…

Finding composite repetitive elements

Transposons have the ability to “jump” around in genomes, and sometimes they jump into genomic sites occupied by other repetitive elements; these cases are what I refer to as “composite repetitive elements” for the purpose of this post. While almost all DNA transposons and the majority of retrotransposons have lost the ability to move around…

How do I fetch lincRNAs from Ensembl?

Here’s a very short post on how to fetch lincRNAs from Ensembl using R and the biomaRt package. For those who are not familiar with biomaRt, you can check out my older post on biomaRt. Firstly, start R and install the biomaRt package from Bioconductor by copying and pasting the code below:

Repetitive elements in vertebrate genomes

Updated 2015 February 8th to include some scatter plots of genome size versus repeat content. I was writing about the makeup of genomes today and was looking up statistics on repetitive elements in vertebrate genomes. While I could find individual papers with repetitive element statistics for a particular genome, I was unable to find…

Genomic Regions Enrichment of Annotations Tool

The Genomic Regions Enrichment of Annotations Tool (GREAT) is a tool that allows you to find enriched ontological terms in a set of genomic regions. This talk (running time ~1 hour) gives an overview of the tool. In brief, GREAT is an alternative to gene-centric enrichment tools such as DAVID and uses a binomial test…

Bioconductor annotation packages

The Bioconductor annotation packages are an extensive collection of annotations. For this post, I simply illustrate the basics of probing these annotation packages. For the first example, I will use the org.Hs.eg.db package, which provides genome-wide annotations for the human genome. We can query the package by using the select() function; to find out…
