Using the GenometriCorr package

I was reading through the bedtools jaccard documentation when I saw the reference “Exploring Massive, Genome Scale Datasets with the GenometriCorr Package”. For those wondering what the Jaccard index is, it’s a simple metric defined as: $$J(A,B) = \frac{| A \cap B |}{| A \cup B |}$$ The numerator is the…
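As a rough illustration of the formula above, here is a minimal Python sketch that computes the Jaccard index for two sets of genomic intervals by expanding them into sets of base positions. The helper names (`positions`, `jaccard`) and the example intervals are made up for illustration; this is not how bedtools or GenometriCorr implement it, and it is only practical for small inputs.

```python
def positions(intervals):
    """Expand half-open (start, end) intervals into a set of covered positions."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered


def jaccard(a, b):
    """Return |A intersect B| / |A union B| for two iterables of hashable items."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty sets score 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical intervals, chosen only to make the arithmetic easy to follow.
A = positions([(0, 10), (20, 30)])   # 20 covered positions
B = positions([(5, 15), (20, 25)])   # 15 covered positions

score = jaccard(A, B)                # 10 shared / 25 in the union
print(score)
```

Here the two interval sets share 10 positions out of 25 in their union, so the index is 0.4. A value of 0 means no overlap and 1 means the two sets cover exactly the same positions.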


Pearson vs. Spearman correlation

Correlation measures are commonly used to show how strongly two datasets are related, and the most widely used is the Pearson correlation. To illustrate when not to use a Pearson correlation: If we remove the 2,000 value: Use a non-parametric correlation measure (e.g. Spearman’s rank) if your dataset has outliers. It would probably be best…
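The effect of a single outlier can be sketched in plain Python. The data below are invented for illustration (they are not the values used in the original post): ten essentially unrelated pairs plus one extreme point at 2,000. The functions `pearson`, `ranks`, and `spearman` are small hand-written helpers, with Spearman computed as the Pearson correlation of the ranks.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: ten unrelated pairs plus one outlier at 2,000.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2000]
y = [10, 3, 7, 1, 9, 2, 8, 4, 6, 5, 2000]

print("with outlier:    pearson=%.3f spearman=%.3f" % (pearson(x, y), spearman(x, y)))
print("without outlier: pearson=%.3f spearman=%.3f"
      % (pearson(x[:-1], y[:-1]), spearman(x[:-1], y[:-1])))
```

With the outlier included, the Pearson coefficient is close to 1 even though the other ten points show no clear relationship, while the Spearman coefficient stays near zero. Dropping the outlier brings the Pearson value back in line with the rank-based measure.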


DESeq vs. edgeR vs. baySeq

6th April 2012: For a more updated version of this post, please see this post. A very simple comparison using the TagSeqExample.tab file from the DESeq package as the benchmark dataset. According to the DESeq authors, T1a and T1b are similar, so I removed the second column in the file, corresponding to T1a: Hierarchical clustering…


Calculating Pearson correlation using Perl

This is my modification of code that was originally available here; the original code is probably easier to understand. I altered the code so that it uses an anonymous 2D array and runs under strictures, so that I could plug it into my own code. Comments are included in the code below to assist with its use.


Hierarchical clustering with p-values

The code, which allowed me to use the Spearman’s rank correlation coefficient, was kindly provided to me by the developer of pvclust. First, download the unofficial package or just source it from my DropBox account. Start up R and follow: Hierarchical clustering with p-values.
