## The Golden Rule of Bioinformatics

I’m a big fan of the book *Bioinformatics Data Skills* by Vince Buffalo, and I highly recommend it to everyone who works in bioinformatics. The book introduces the reader to The Golden Rule of Bioinformatics: Never ever trust your tools (or data). I am a strong proponent of this rule, which…

## Rand Index versus the Adjusted Rand Index

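I wrote about the Rand Index (RI) and the Adjusted Rand Index (ARI) in the last two posts, but how do we interpret the indices and how do they differ? The RI is $$RI = \frac{a + b}{\binom{n}{2}}$$, where $$a$$ and $$b$$ are the numbers of item pairs that were clustered concordantly in two different sets (together in both, and apart in both, respectively) and $$\binom{n}{2}$$ is the total number of pairs among $$n$$ items. I wrote…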

## The Adjusted Rand index

In my last post, I wrote about the Rand index. This post is about the Adjusted Rand index (ARI), which is the corrected-for-chance version of the Rand index.
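To make the chance correction concrete, here is a minimal sketch in Python (chosen only for illustration; it is not the code from the post). It assumes the standard contingency-table form of the ARI, and the function name `adjusted_rand_index` is hypothetical. It requires Python 3.8 or later for `math.comb`.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """ARI of two clusterings given as equal-length lists of cluster labels."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items to form a pair")

    # Pairs placed together in both clusterings (contingency table cells).
    cells = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in cells.values())

    # Pairs placed together within each clustering separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    maximum = (sum_a + sum_b) / 2          # upper bound on the agreement

    if maximum == expected:
        # Degenerate case (e.g. both clusterings put everything in one
        # cluster); treated here as perfect agreement by assumption.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The first call compares two clusterings that differ only in label names, so the score does not depend on how the clusters are numbered. Unlike the Rand index, the adjusted version can be negative when the agreement is worse than expected by chance.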

## The Rand index

I’ve been looking for ways to compare clustering results and through my searching I came across something called the Rand index. In this short post, I explain how this index is calculated.
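As a rough illustration of the calculation, the sketch below counts concordant pairs directly. It is written in Python purely for illustration and is not the code from the post; the function name `rand_index` and the example labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    if len(labels_a) < 2:
        raise ValueError("need at least two items to form a pair")

    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        together_in_a = labels_a[i] == labels_a[j]
        together_in_b = labels_b[i] == labels_b[j]
        # A pair is concordant if it is together in both clusterings
        # or apart in both clusterings.
        if together_in_a == together_in_b:
            agreements += 1
        total_pairs += 1
    return agreements / total_pairs


print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Looping over every pair is quadratic in the number of items, which is fine for small examples; a contingency-table approach would scale better.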

## Markov clustering

The Markov Cluster (MCL) algorithm is an unsupervised clustering algorithm for graphs, based on the simulation of stochastic flow in graphs. Markov clustering is the work of Stijn van Dongen, and you can read his thesis on the Markov Cluster Algorithm. The work is based on the graph clustering paradigm, which postulates that natural groups in…
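The stochastic-flow idea can be sketched in a few lines. The following is a simplified, pure-Python illustration and not van Dongen's implementation: it alternates expansion (squaring the column-stochastic matrix) with inflation (element-wise powering followed by column re-normalisation). The function names, the default inflation of 2, the added self-loops, and the `1e-6` cut-off used to read off clusters are all assumptions made for this example.

```python
def normalise_columns(matrix):
    """Scale each column so that it sums to 1 (column-stochastic)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster an undirected graph given as a square adjacency matrix."""
    n = len(adjacency)
    if n == 0:
        return []
    # Add self-loops, then convert to a column-stochastic matrix.
    m = normalise_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)                              # expansion
        inflated = normalise_columns(
            [[v ** inflation for v in row] for row in expanded])  # inflation
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Rows with a positive diagonal act as attractors; the non-zero
    # columns of such a row form one cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > 1e-6:
            clusters.add(frozenset(j for j in range(n) if m[i][j] > 1e-6))
    return sorted(sorted(c) for c in clusters)


# Two disconnected triangles: nodes 0-2 and nodes 3-5.
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(graph))  # [[0, 1, 2], [3, 4, 5]]
```

Higher inflation values generally produce finer clusters; the dense matrix representation used here is only practical for small graphs.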

## Visualising hierarchical clustering results

I’ve written about hierarchical clustering before as an attempt to understand it better. Within R, you can plot hierarchical clustering results; however, when working with a large dataset you may produce plots in which all the labels overlap and none of them can be read. During my…

## Phylogenetic profiling

On my wiki I have a short summary of phylogenetic profiling. The program MrBayes performs Bayesian inference of phylogeny and can be used to infer relationships from binary data such as phylogenetic profiles. The input to MrBayes is a NEXUS file, and here is the example I will use: `#NEXUS begin data;`…

## PCA and rggobi

I labelled only two samples, since the text would otherwise overlap. Samples 1 to 20 are located near the point labelled 16 and samples 21 to 40 near the point labelled 37, as expected.

## Finding genes with co-expression patterns

Can the R Bioconductor package “WGCNA” find artificially created modules? Firstly, some (subpar) code to generate an artificial list of genes with co-expression patterns (modules). Running the code: `./generate_random_module.pl 10 1000 20 > 10_sample_1000_list_20_module.tsv` Patterns: 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 1 1…