Using the GenometriCorr package

I was reading through the bedtools jaccard documentation when I saw the reference “Exploring Massive, Genome Scale Datasets with the GenometriCorr Package”. First, for those wondering what the Jaccard index is, it’s a simple metric defined as: $$J(A,B) = \frac{|A \cap B|}{|A \cup B|}$$ The numerator is the…
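
As a rough illustration of the formula, here is a minimal Python sketch that computes the Jaccard index for two sets of genomic intervals by expanding them into sets of base positions. The helper names (`interval_positions`, `jaccard`), the half-open interval convention, the example coordinates, and the choice to return 0.0 when both sets are empty are all assumptions made for this example; it is not how bedtools implements the calculation.

```python
def interval_positions(intervals):
    """Expand half-open (start, end) intervals into a set of base positions."""
    return {pos for start, end in intervals for pos in range(start, end)}


def jaccard(a, b):
    """Return |A intersect B| / |A union B| for two collections of items."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # The index is undefined for two empty sets; 0.0 is an arbitrary choice here.
        return 0.0
    return len(a & b) / len(union)


# Made-up intervals on a single chromosome, for illustration only
set_a = interval_positions([(0, 10), (20, 30)])
set_b = interval_positions([(5, 15), (25, 30)])

# 10 shared positions out of 25 covered positions in total
print(jaccard(set_a, set_b))
```

Expanding intervals into individual positions is only practical for small examples; for real data a sweep over sorted intervals would avoid the memory cost.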


Pearson vs. Spearman correlation

Correlation measures are commonly used to show how strongly two datasets are related. A commonly used measure is the Pearson correlation. To illustrate when not to use a Pearson correlation: If we remove the 2,000 value: Use a non-parametric correlation measure (e.g. Spearman’s rank) if your dataset has outliers. It would probably be best…
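
To make the effect of an outlier concrete, here is a small self-contained Python sketch that computes both coefficients from scratch. The data are invented for this example (they are not the values used in the post), and the function names `pearson`, `ranks` and `spearman` are my own; Spearman's coefficient is computed as the Pearson correlation of the ranks, with tied values given their average rank.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented data: y falls steadily as x rises, except for one extreme value
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [10, 9, 8, 7, 6, 5, 4, 3, 2, 2000]

print("Pearson with outlier:    ", pearson(x, y))
print("Spearman with outlier:   ", spearman(x, y))
print("Pearson without outlier: ", pearson(x[:-1], y[:-1]))
```

With these numbers the single extreme point flips the sign of the Pearson coefficient, while the rank-based coefficient still reflects the decreasing trend in the other nine points.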


DESeq vs. edgeR vs. baySeq

6th April 2012: For a more up-to-date version of this post, please refer to this post. A very simple comparison using the TagSeqExample.tab file from the DESeq package as the benchmark dataset. According to the DESeq authors, T1a and T1b are similar, so I removed the second column in the file corresponding to T1a: Hierarchical clustering…


Calculating Pearson correlation using Perl

This is my modification of code that was originally available here. The original code is probably easier to understand. I altered the code so that I could use an anonymous 2D array and run it with strictures, which let me plug it into my own code. Comments are included in the code below to help with its use.


Hierarchical clustering with p-values

The code, which allowed me to use Spearman’s rank correlation coefficient, was kindly provided to me by the developer of pvclust. First, download the unofficial package or just source it from my Dropbox account. Start up R and follow: Hierarchical clustering with p-values.
