Hierarchical clustering with p-values

The code, which allowed me to use the Spearman's rank correlation coefficient, was kindly provided to me by the developer of pvclust.

Firstly download the unofficial package or just source it from my DropBox account. Start up R and follow:

#load the package
#if you are having trouble sourcing from https run
#install.packages('devtools')
#library(devtools)
#source_url("https://dl.dropboxusercontent.com/u/15251811/pvclust/pvclust.R")
#source_url("https://dl.dropboxusercontent.com/u/15251811/pvclust/pvclust-internal.R")

source("https://dl.dropboxusercontent.com/u/15251811/pvclust/pvclust.R")
source("https://dl.dropboxusercontent.com/u/15251811/pvclust/pvclust-internal.R")

#use a test dataset from DESeq
#install DESeq if necessary
source("http://bioconductor.org/biocLite.R")
biocLite("DESeq")
#load DESeq
library("DESeq")
example_file <- system.file ("extra/TagSeqExample.tab", package="DESeq")
data <- read.delim(example_file, header=T, row.names="gene")
head(data)
           T1a T1b  T2  T3  N1  N2
Gene_00001   0   0   2   0   0   1
Gene_00002  20   8  12   5  19  26
Gene_00003   3   0   2   0   0   0
Gene_00004  75  84 241 149 271 257
Gene_00005  10  16   4   0   4  10
Gene_00006 129 126 451 223 243 149

# Define a distance function

spearman <- function(x, ...) {
    x <- as.matrix(x)
    res <- as.dist(1 - cor(x, method = "spearman", use = "everything"))
    res <- as.dist(res)
    attr(res, "method") <- "spearman"
    return(res)
}

result <- pvclust(data, method.dist=spearman, nboot=100)

result

Cluster method: average
Distance      : spearman

Estimates on edges:

  au bp se.au se.bp v c pchi
1  1  1     0     0 0 0    0
2  1  1     0     0 0 0    0
3  1  1     0     0 0 0    0
4  1  1     0     0 0 0    0
5  1  1     0     0 0 0    0

#pvclust classed object
class(result)
[1] "pvclust"

names(result)
[1] "hclust" "edges"  "count"  "msfit"  "nboot"  "r"      "store"

plot(result)
pvrect(result, alpha=0.95)

pvclust_exampleHierarchical clustering with p-values.

Print Friendly, PDF & Email



Creative Commons License
This work is licensed under a Creative Commons
Attribution 4.0 International License
.
20 comments Add yours
    1. Hi Alex,

      The file was a tab delimited file, where the rows are the genes and the columns the different samples.

      I’ve updated this post to use a dataset that is available from the DESeq package. You can try the example I’ve provided to see if it’s useful for you.

      Hope that helps,

      Dave

  1. Dear Davo,

    many thanks for the code. I searched a lot for some examples about how to use pvclust with alternative distance measures. It seems many people are also having the same question, and it seems your code just hit the spot !

    All the best.
    Tiago

    1. Hi Tiago,

      Yeah the code was kindly written for me when I emailed the author of pvclust, so all credits to him.

      But glad you could find it and that it was helpful.

      Cheers,

      Dave

  2. Hi, I wanted to use this function but when I run this code I got an error.
    Erreur in as.character(x) :
    cannot coerce type ‘closure’ to vector of type ‘character’
    Thank you

    1. Hi Patricia,

      are you following the example in the post, or using your own data? It’s a bit hard to troubleshoot just from the error.

      Cheers,

      Dave

      1. Hi Dave,

        So I am using my own data,

        when I run the commands to create the function I don’t get an error but just after when I run the pvclust code.

        Best regards.

        Patricia

        1. Hi Patricia,

          without knowing the exact commands you ran and without a copy of your data, it’s quite hard to know what is wrong.

          Cheers,

          Dave

  3. Hi,
    when I run the code above I get the same error:
    Error in as.character(x) :
    cannot coerce type ‘closure’ to vector of type ‘character’

    This happens with the example data as well as with my own data.

    R version 3.0.2 (2013-09-25)
    pvclust_1.2-2

    1. I found the reason – or at least I now can run the example:
      I also downloaded the 2 R scripts from the dropbox urls and sourced both files. Using this functions I was able tun run the examples.
      It seems that there are differences in the code from tha package I used (pvclust_1.2-2) and this code from dropbox.
      Strange

      1. Yes the code from DropBox is the unofficial code (from the same author of pvclust); use that code if you want Spearman’s correlation instead of Pearson.

  4. I tested this script and it is working without errors.
    However the result is wrong while I am using exactly the same data:
    au bp se.au se.bp v c pchi
    1 1 1 0 0 0 0 0
    2 1 1 0 0 0 0 0
    3 1 1 0 0 0 0 0
    4 1 1 0 0 0 0 0
    5 1 1 0 0 0 0 0
    Any idea what the reason might be? Is it possible that this is related to my R-version (R3.1.3.)?

    1. I’m not sure what steps you performed or how you came to the conclusion that the result is incorrect.

      1. If I use exactly the same script, shouldn’t I have the same outcome, or at least more or less the same outcome?

        1. Oh right, my bad; didn’t realise you meant you followed the code exactly. You’re right; the results on this page are wrong. I must have copied and pasted the wrong results, since the dendrogram values are all 100%. I updated the results; thanks for letting me know!

  5. Hello all, this seems to be most promising post online for using pvclust() in R. I may just be getting too frustrated and not seeing a simple correction, but I keep getting this error as well:

    Error in cor(x, method = “pearson”, use = use.cor) : ‘x’ must be numeric

    I am using my own data. It has the following columns: Individual ID, followed by 4 criteria columns that I used in clustering (i.e. # of sightings, # of gap events, etc.). I have conducted 2 hierarchical cluster analyses: (1) complete linkage, (2) average linkage. And now i want to use pvclust() to test the uncertainty in the clustering (2 cluster for complete, 3 for average). I have tried to replicate the code above, and have the following:

    result <- pvclust(data, method.dist="correlation", method.hclust="average", nboot=1000, r=seq(0.7,1.4,by=.1)) #error right away
    plot(result)
    pvrect(result, alpha=0.95)

    I think the issue is my "data". Do I use the original data (with the 5 columns i stated above?), or do i make a data.frame? I also tried first making as.matrix(data), and then putting that as x in pvclust(x…). But i still get an error.

    Any help would be great!! Thanks so much!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.