Hierarchical clustering with p-values

The code below, which lets pvclust use Spearman's rank correlation coefficient as the distance measure, was kindly provided to me by the developer of pvclust.

First, download the unofficial package, or simply source it from my Dropbox account. Then start R and run the following:

#load the package
source("https://dl.dropboxusercontent.com/u/15251811/pvclust/pvclust.R")
source("https://dl.dropboxusercontent.com/u/15251811/pvclust/pvclust-internal.R")

#use a test dataset from DESeq
#install DESeq if necessary
source("http://bioconductor.org/biocLite.R")
biocLite("DESeq")
#load DESeq
library("DESeq")
example_file <- system.file("extra/TagSeqExample.tab", package="DESeq")
data <- read.delim(example_file, header=TRUE, row.names="gene")
head(data)
           T1a T1b  T2  T3  N1  N2
Gene_00001   0   0   2   0   0   1
Gene_00002  20   8  12   5  19  26
Gene_00003   3   0   2   0   0   0
Gene_00004  75  84 241 149 271 257
Gene_00005  10  16   4   0   4  10
Gene_00006 129 126 451 223 243 149

# Define a distance function

spearman <- function(x, ...) {
    x <- as.matrix(x)
    #cor() correlates columns, so this is a distance between samples
    res <- as.dist(1 - cor(x, method = "spearman", use = "everything"))
    attr(res, "method") <- "spearman"
    return(res)
}
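
To check what the distance function returns before handing it to pvclust, here is a minimal sketch on a made-up matrix. The toy data and the sample names S1 to S3 are hypothetical and not part of the DESeq example; the function is repeated, with the same behaviour, so that the block runs on its own. Since cor() works on columns, the result is a distance between samples: about 0 for identical rankings and about 2 for fully reversed ones.

```r
# Spearman distance between the columns of x (repeated from the post)
spearman <- function(x, ...) {
    x <- as.matrix(x)
    res <- as.dist(1 - cor(x, method = "spearman", use = "everything"))
    attr(res, "method") <- "spearman"
    return(res)
}

# Hypothetical toy counts: 5 genes (rows) by 3 samples (columns)
toy <- data.frame(
    S1 = c(1, 2, 3, 4, 5),
    S2 = c(2, 4, 6, 8, 100),  # different values, same ranking as S1
    S3 = c(5, 4, 3, 2, 1)     # reversed ranking of S1
)

d <- spearman(toy)
print(d)

# Look up individual pairs by sample name
m <- as.matrix(d)
m["S1", "S2"]  # identical rankings, so the distance is (numerically) zero
m["S1", "S3"]  # reversed rankings, so the distance is (numerically) two
```

Because only the ranks matter, the outlier of 100 in S2 has no effect on its distance to S1.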

result <- pvclust(data, method.dist=spearman, nboot=100)

result

Cluster method: average
Distance      : spearman

Estimates on edges:

     au    bp se.au se.bp      v      c pchi
1 1.000 1.000 0.000  0.00  0.000  0.000    0
2 1.000 1.000 0.000  0.00  0.000  0.000    0
3 1.000 1.000 0.000  0.00  0.000  0.000    0
4 0.731 0.992 1.071  0.02 -1.507 -0.892    1
5 1.000 1.000 0.000  0.00  0.000  0.000    0

#pvclust classed object
class(result)
[1] "pvclust"

names(result)
[1] "hclust" "edges"  "count"  "msfit"  "nboot"  "r"      "store"

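#plot the dendrogram with the AU and BP values on the edges
plot(result)
#draw rectangles around clusters whose AU p-value is at least 0.95
pvrect(result, alpha=0.95)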
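
As a rough illustration of what the bp column in the output above means, the sketch below estimates a plain bootstrap probability by hand: resample the genes (rows) with replacement, recluster the samples, and count how often a given cluster reappears. This is a simplified sketch in base R, not pvclust's implementation; in particular it does not compute the AU value, which pvclust derives from multiscale bootstrap resampling. The simulated data, the sample names A1 to B2, and the helpers spearman_dist and has_cluster are all assumptions made up for this example.

```r
set.seed(1)  # make the simulated data and the resampling reproducible

# Hypothetical data: 50 genes by 4 samples.
# A1/A2 share one underlying signal, B1/B2 share another.
n_genes  <- 50
signal_a <- rnorm(n_genes)
signal_b <- rnorm(n_genes)
toy <- cbind(
    A1 = signal_a + rnorm(n_genes, sd = 0.1),
    A2 = signal_a + rnorm(n_genes, sd = 0.1),
    B1 = signal_b + rnorm(n_genes, sd = 0.1),
    B2 = signal_b + rnorm(n_genes, sd = 0.1)
)

# Spearman distance between columns (samples)
spearman_dist <- function(x) {
    as.dist(1 - cor(x, method = "spearman"))
}

# TRUE if the tree contains a cluster made up of exactly `members`.
# Every node of the tree shows up as a group for some number of clusters k,
# so it is enough to cut the tree at every possible k and compare groups.
has_cluster <- function(hc, members) {
    cuts <- cutree(hc, k = seq_along(hc$labels))  # one column per k
    any(apply(cuts, 2, function(g) {
        grp <- names(g)[g == g[members[1]]]
        setequal(grp, members)
    }))
}

# Bootstrap: resample genes, recluster with average linkage, check the cluster
nboot <- 100
hits <- replicate(nboot, {
    idx <- sample(nrow(toy), replace = TRUE)
    hc  <- hclust(spearman_dist(toy[idx, ]), method = "average")
    has_cluster(hc, c("A1", "A2"))
})

# Fraction of bootstrap trees that contain the cluster {A1, A2}
bp <- mean(hits)
bp
```

A value close to 1 means the cluster shows up in nearly every resampled tree. For real analyses, the values reported by pvclust itself are the ones to use.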

Figure: Hierarchical clustering with p-values.

12 thoughts on “Hierarchical clustering with p-values”

    • Hi Alex,

      The file was a tab delimited file, where the rows are the genes and the columns the different samples.

      I've updated this post to use a dataset that is available from the DESeq package. You can try the example I've provided to see if it's useful for you.

      Hope that helps,

      Dave

  1. Dear Davo,

    Many thanks for the code. I searched a lot for examples of how to use pvclust with alternative distance measures. It seems many people have the same question, and your code hit the spot!

    All the best.
    Tiago

    • Hi Tiago,

      Yeah, the code was kindly written for me when I emailed the author of pvclust, so all credit to him.

      But glad you could find it and that it was helpful.

      Cheers,

      Dave

  2. Hi, I wanted to use this function but when I run this code I got an error.
    Error in as.character(x) :
    cannot coerce type 'closure' to vector of type 'character'
    Thank you

    • Hi Patricia,

      are you following the example in the post, or using your own data? It's a bit hard to troubleshoot just from the error.

      Cheers,

      Dave

      • Hi Dave,

        So I am using my own data,

        When I run the commands to create the function I don't get an error; the error appears right after, when I run the pvclust code.

        Best regards.

        Patricia

        • Hi Patricia,

          without knowing the exact commands you ran and without a copy of your data, it's quite hard to know what is wrong.

          Cheers,

          Dave
