Getting started with Monocle

Monocle is an R package developed for analysing single-cell gene expression data. Specifically, the package provides functionality for clustering and classifying single cells, conducting differential expression analyses, and constructing and investigating inferred developmental trajectories. The toolkit provides several alternative approaches for each analysis, so your workflow may differ from the approach I’ve taken in…


Rand Index versus the Adjusted Rand Index

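I wrote about the Rand Index (RI) and the Adjusted Rand Index (ARI) in the last two posts, but how do we interpret the indices and how do they differ? The RI is: $$RI = \frac{a + b}{\binom{n}{2}}$$ where $$a$$ and $$b$$ count the pairs of items that were clustered concordantly in two different sets ($$a$$: placed together in both; $$b$$: placed apart in both) and $$\binom{n}{2}$$ is the total number of pairs among $$n$$ items. I wrote…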
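To make the difference between the two indices concrete, here is a minimal Python sketch that computes both from the same pair counts. It assumes the standard contingency-table formulation of the ARI (observed agreement minus the agreement expected by chance, scaled by the maximum possible excess); the function name `rand_indices` is my own and is not taken from the original posts.

```python
from collections import Counter
from math import comb


def rand_indices(labels_a, labels_b):
    """Return (RI, ARI) for two clusterings of the same items.

    Hypothetical helper for illustration; assumes at least two items.
    """
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two labelings of the same length (>= 2)")

    total_pairs = comb(n, 2)

    # Pairs placed together in both clusterings, in the first only, in the second only.
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    # a: together in both; b: apart in both (inclusion-exclusion on the pair counts).
    a = same_both
    b = total_pairs - same_a - same_b + same_both
    ri = (a + b) / total_pairs

    # ARI: subtract the number of "together" agreements expected by chance.
    expected = same_a * same_b / total_pairs
    max_index = (same_a + same_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. both clusterings put everything in one cluster).
        return ri, 1.0
    ari = (same_both - expected) / (max_index - expected)
    return ri, ari


ri, ari = rand_indices([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
print(f"RI = {ri:.3f}, ARI = {ari:.3f}")
```

For this toy pair of clusterings the RI looks fairly high while the ARI is much lower, because a large share of the "apart in both" agreements would be expected by chance alone.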


The Rand index

I’ve been looking for ways to compare clustering results and through my searching I came across something called the Rand index. In this short post, I explain how this index is calculated.
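One straightforward way to calculate the index is to go through every pair of items and check whether the two clusterings agree on that pair. The Python sketch below does exactly that; it is a brute-force illustration under that assumption rather than the code from the post, and `rand_index` is a name I made up.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree (hypothetical helper)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")

    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        together_a = labels_a[i] == labels_a[j]
        together_b = labels_b[i] == labels_b[j]
        # A pair agrees if it is together in both or apart in both.
        if together_a == together_b:
            agree += 1
        total += 1

    if total == 0:
        raise ValueError("need at least two items")
    return agree / total


# Cluster labels are arbitrary: only the grouping matters.
print(rand_index(["x", "x", "y", "y"], [1, 1, 2, 2]))
```

The loop visits every pair, so it is quadratic in the number of items; that is fine for small examples but slow for large datasets.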


Getting started with Arabidopsis thaliana genomics

I have started to work on Arabidopsis thaliana, as I mentioned in my last post. As noted in the Encyclopedia of Life: Arabidopsis thaliana is the most widely used model organism in plant biology. Its small genome size, fully sequenced in the year 2000, chromosome number, fast growth cycle (from seed germination to set in…


Read GTF file into R

The Gene Transfer Format (GTF) is a refinement of the General Feature Format (GFF). A GFF file has nine columns:

seqname: The name of the sequence; must be a chromosome or scaffold.
source: The program that generated this feature.
feature: The name of this type of feature, e.g. “CDS”, “start_codon”, “stop_codon”, and “exon”.
start: The…
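The post itself reads the file into R; purely as an illustration of the nine-column layout, here is a minimal Python sketch of the same idea. The last five column names (end, score, strand, frame, attribute) come from the standard GFF description rather than from the excerpt above, and the function names and the example line are hypothetical. The attribute parser assumes the usual GTF `key "value";` convention and would break on values that themselves contain semicolons.

```python
import io

# Standard nine GFF/GTF columns (the last five are assumed from the GFF description).
GFF_COLUMNS = [
    "seqname", "source", "feature", "start", "end",
    "score", "strand", "frame", "attribute",
]


def read_gtf(handle):
    """Parse tab-delimited GTF lines from a file-like object into a list of dicts."""
    records = []
    for line in handle:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(GFF_COLUMNS):
            raise ValueError(f"expected 9 columns, got {len(fields)}")
        record = dict(zip(GFF_COLUMNS, fields))
        # Coordinates are integers; the other columns are left as strings.
        record["start"] = int(record["start"])
        record["end"] = int(record["end"])
        records.append(record)
    return records


def parse_attributes(attribute):
    """Split a GTF attribute string such as 'gene_id "g1"; ...' into a dict."""
    pairs = {}
    for item in attribute.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        pairs[key] = value.strip('"')
    return pairs


# Made-up example line, not taken from a real annotation.
example = 'chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'
for rec in read_gtf(io.StringIO(example)):
    print(rec["seqname"], rec["feature"], rec["start"], rec["end"],
          parse_attributes(rec["attribute"]))
```

For a real file, the same function can be given an open file handle, e.g. `with open(path) as fh: read_gtf(fh)`.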


Getting started with Seurat

Updated 2018 April 11th. This post follows the Peripheral Blood Mononuclear Cells (PBMCs) tutorial for 2,700 single cells. It was written while I was going through the tutorial and contains my notes. The dataset for this tutorial can be downloaded from the 10X Genomics dataset page but it is also hosted on Amazon (see below)….


Learning about Snakemake

Updated 2018 May 29th to include an example using a config file. As promised two years ago, here’s a short blog post on Snakemake. I have been using Bpipe to manage my workflows/pipelines but Snakemake has been mentioned to me on more than one occasion; in particular: @davetang31 seems useful if testing many diff types of…


Incidental findings using GEMINI

The American College of Medical Genetics and Genomics (ACMG) has recommended that variants in certain genes that may be pathogenic or likely pathogenic be reported back to the patient. The latest list of genes can be found here. How do I assess whether a variant is pathogenic or likely pathogenic? Use this tool,…
