Hierarchical clustering with p-values

The code, which allowed me to use the Spearman's rank correlation coefficient, was kindly provided to me by the developer of pvclust.

Firstly download the unofficial package or just source it from my DropBox account. Start up R and follow:

#load the package
#if you are having trouble sourcing from https run
#install.packages('devtools')
#library(devtools)
#source_url("https://dl.dropboxusercontent.com/u/15251811/pvclust/pvclust.R")
#source_url("https://dl.dropboxusercontent.com/u/15251811/pvclust/pvclust-internal.R")

source("https://dl.dropboxusercontent.com/u/15251811/pvclust/pvclust.R")
source("https://dl.dropboxusercontent.com/u/15251811/pvclust/pvclust-internal.R")

#use a test dataset from DESeq
#install DESeq if necessary
source("http://bioconductor.org/biocLite.R")
biocLite("DESeq")
#load DESeq
library("DESeq")
example_file <- system.file ("extra/TagSeqExample.tab", package="DESeq")
data <- read.delim(example_file, header=T, row.names="gene")
head(data)
           T1a T1b  T2  T3  N1  N2
Gene_00001   0   0   2   0   0   1
Gene_00002  20   8  12   5  19  26
Gene_00003   3   0   2   0   0   0
Gene_00004  75  84 241 149 271 257
Gene_00005  10  16   4   0   4  10
Gene_00006 129 126 451 223 243 149

# Define a distance function

spearman <- function(x, ...) {
    x <- as.matrix(x)
    res <- as.dist(1 - cor(x, method = "spearman", use = "everything"))
    res <- as.dist(res)
    attr(res, "method") <- "spearman"
    return(res)
}

result <- pvclust(data, method.dist=spearman, nboot=100)

result

Cluster method: average
Distance      : spearman

Estimates on edges:

     au    bp se.au se.bp      v      c pchi
1 1.000 1.000 0.000  0.00  0.000  0.000    0
2 1.000 1.000 0.000  0.00  0.000  0.000    0
3 1.000 1.000 0.000  0.00  0.000  0.000    0
4 0.731 0.992 1.071  0.02 -1.507 -0.892    1
5 1.000 1.000 0.000  0.00  0.000  0.000    0

#pvclust classed object
class(result)
[1] "pvclust"

names(result)
[1] "hclust" "edges"  "count"  "msfit"  "nboot"  "r"      "store"

plot(result)
pvrect(result, alpha=0.95)

pvclust_exampleHierarchical clustering with p-values.

15 thoughts on “Hierarchical clustering with p-values

    • Hi Alex,

      The file was a tab delimited file, where the rows are the genes and the columns the different samples.

      I've updated this post to use a dataset that is available from the DESeq package. You can try the example I've provided to see if it's useful for you.

      Hope that helps,

      Dave

  1. Dear Davo,

    many thanks for the code. I searched a lot for some examples about how to use pvclust with alternative distance measures. It seems many people are also having the same question, and it seems your code just hit the spot !

    All the best.
    Tiago

    • Hi Tiago,

      Yeah the code was kindly written for me when I emailed the author of pvclust, so all credits to him.

      But glad you could find it and that it was helpful.

      Cheers,

      Dave

  2. Hi, I wanted to use this function but when I run this code I got an error.
    Erreur in as.character(x) :
    cannot coerce type 'closure' to vector of type 'character'
    Thank you

    • Hi Patricia,

      are you following the example in the post, or using your own data? It's a bit hard to troubleshoot just from the error.

      Cheers,

      Dave

      • Hi Dave,

        So I am using my own data,

        when I run the commands to create the function I don't get an error but just after when I run the pvclust code.

        Best regards.

        Patricia

        • Hi Patricia,

          without knowing the exact commands you ran and without a copy of your data, it's quite hard to know what is wrong.

          Cheers,

          Dave

  3. Hi,
    when I run the code above I get the same error:
    Error in as.character(x) :
    cannot coerce type 'closure' to vector of type 'character'

    This happens with the example data as well as with my own data.

    R version 3.0.2 (2013-09-25)
    pvclust_1.2-2

    • I found the reason - or at least I now can run the example:
      I also downloaded the 2 R scripts from the dropbox urls and sourced both files. Using this functions I was able tun run the examples.
      It seems that there are differences in the code from tha package I used (pvclust_1.2-2) and this code from dropbox.
      Strange

      • Yes the code from DropBox is the unofficial code (from the same author of pvclust); use that code if you want Spearman's correlation instead of Pearson.

Leave a Reply

Your email address will not be published. Required fields are marked *


× 2 = eight

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>