Getting started with HISAT, StringTie, and Ballgown

A popular toolset for analysing RNA-seq data is the Tuxedo suite, which consists of TopHat and Cufflinks. The suite provided a start-to-finish pipeline that allowed users to map reads, assemble transcripts, and perform differential expression analyses. A newer "Tuxedo suite" has been developed and is made up of three tools: HISAT, StringTie,...

7th Anniversary

I reached a million views on 2017 September 27th. Near the start of September, I had wondered if I would reach a million before my 7th anniversary, which is today. I used the traffic to this site to predict when I would hit the mark. Not the best fit. Use only 2017 data to predict....

Getting started with Monocle

Monocle is an R package developed for analysing single-cell gene expression data. Specifically, the package provides functionality for clustering and classifying single cells, conducting differential expression analyses, and constructing and investigating inferred developmental trajectories. The toolkit provides alternative approaches for each analysis, so your workflow may differ from the approach I've taken in...

Rand Index versus the Adjusted Rand Index

I wrote about the Rand Index (RI) and the Adjusted Rand Index (ARI) in the last two posts, but how do we interpret the indices and how are they different? The RI is $RI = \frac{a + b}{\binom{n}{2}}$, where $a$ and $b$ are the number of times a pair of items was clustered concordantly in two different sets (together in both, or apart in both) and $n$ is the number of items. I wrote some code...
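As a rough illustration of the calculation (not the code from the post itself), here is a minimal Python sketch that counts concordant pairs directly. The function name `rand_index` and the label lists are hypothetical; a pair counts as concordant when both clusterings put the two items together, or both put them apart.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    labels_a and labels_b are hypothetical inputs: sequences of cluster
    labels, one per item, in the same item order.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    if len(labels_a) < 2:
        raise ValueError("need at least two items to form a pair")

    concordant = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        # Concordant: together in both clusterings, or apart in both.
        if same_in_a == same_in_b:
            concordant += 1
        total += 1
    return concordant / total


# Identical partitions with swapped label names agree on every pair.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The loop visits every pair, so it is quadratic in the number of items; that is fine for small examples but a contingency-table approach would scale better.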

The Rand index

I've been looking for ways to compare clustering results and through my searching I came across something called the Rand index. In this short post, I explain how this index is calculated.

Getting started with Arabidopsis thaliana genomics

I have started to work on Arabidopsis thaliana, as I mentioned in my last post. As noted in the Encyclopedia of Life: Arabidopsis thaliana is the most widely used model organism in plant biology. Its small genome size, fully sequenced in the year 2000, chromosome number, fast growth cycle (from seed germination to set in...
