The Rand index

While looking for ways to compare clustering results, I came across the Rand index. In this short post, I explain how this index is calculated.
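As a rough illustration of the calculation, here is a minimal sketch in Python based on the standard definition of the Rand index: the fraction of item pairs on which two clusterings agree (both place the pair in the same cluster, or both place it in different clusters). The function name `rand_index` and the example labels are made up for this sketch and are not taken from the full post.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    labels_a and labels_b are sequences of cluster labels, one per item,
    given in the same item order. Hypothetical helper for illustration.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")

    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items to form a pair")

    # A pair counts as an agreement when both clusterings put the two items
    # together, or both clusterings keep them apart.
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# The label names themselves do not matter, only the grouping does.
same_grouping = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
different_grouping = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Here `same_grouping` is 1.0 because the two clusterings differ only in label names, while `different_grouping` is 2/6 because only two of the six pairs are treated the same way by both clusterings.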

Markov clustering

The Markov Cluster (MCL) Algorithm is an unsupervised cluster algorithm for graphs based on simulation of stochastic flow in graphs. Markov clustering was the work of Stijn van Dongen and you can read his thesis on the Markov Cluster Algorithm. The work is based on the graph clustering paradigm, which postulates that natural groups in...

Visualising hierarchical clustering results

I've written about hierarchical clustering before as an attempt to understand it better. Within R, you can plot hierarchical clustering results; however, when working with a large dataset, you may produce plots where all the labels overlap and none of them can be read. During my...

Phylogenetic profiling

On my wiki I have a short summary of phylogenetic profiling. The program MrBayes is used for Bayesian inference for phylogeny and can be used for inferring relationships using binary type data such as phylogenetic profiles. The input to MrBayes is a NEXUS file and here is the example I will use: #NEXUS begin data;...

Clustering mapped reads

Updated 2014 October 8th: added an analysis using CAGE data from ENCODE and rewrote parts of the post. In this post I will write about a read clustering method called paraclu, which allows mapped reads to be clustered together. This is particularly useful when working with CAGE data, where transcription start sites (TSSs) are...
