Hierarchical clustering with p-values

RDavoNovember 26, 201020

The code, which allowed me to use the Spearman's rank correlation coefficient, was kindly provided to me by the developer of pvclust.

Firstly download the unofficial package or just source it from my DropBox account. Start up R and follow:

#load the package
#if you are having trouble sourcing from https run
#install.packages('devtools')
#library(devtools)
#source_url("https://dl.dropboxusercontent.com/u/15251811/pvclust/pvclust.R")
#source_url("https://dl.dropboxusercontent.com/u/15251811/pvclust/pvclust-internal.R")

source("https://dl.dropboxusercontent.com/u/15251811/pvclust/pvclust.R")
source("https://dl.dropboxusercontent.com/u/15251811/pvclust/pvclust-internal.R")

#use a test dataset from DESeq
#install DESeq if necessary
source("http://bioconductor.org/biocLite.R")
biocLite("DESeq")
#load DESeq
library("DESeq")
example_file <- system.file ("extra/TagSeqExample.tab", package="DESeq")
data <- read.delim(example_file, header=T, row.names="gene")
head(data)
           T1a T1b  T2  T3  N1  N2
Gene_00001   0   0   2   0   0   1
Gene_00002  20   8  12   5  19  26
Gene_00003   3   0   2   0   0   0
Gene_00004  75  84 241 149 271 257
Gene_00005  10  16   4   0   4  10
Gene_00006 129 126 451 223 243 149

# Define a distance function

spearman <- function(x, ...) {
    x <- as.matrix(x)
    res <- as.dist(1 - cor(x, method = "spearman", use = "everything"))
    res <- as.dist(res)
    attr(res, "method") <- "spearman"
    return(res)
}

result <- pvclust(data, method.dist=spearman, nboot=100)

result

Cluster method: average
Distance      : spearman

Estimates on edges:

  au bp se.au se.bp v c pchi
1  1  1     0     0 0 0    0
2  1  1     0     0 0 0    0
3  1  1     0     0 0 0    0
4  1  1     0     0 0 0    0
5  1  1     0     0 0 0    0

#pvclust classed object
class(result)
[1] "pvclust"

names(result)
[1] "hclust" "edges"  "count"  "msfit"  "nboot"  "r"      "store"

plot(result)
pvrect(result, alpha=0.95)

Hierarchical clustering with p-values.

This work is licensed under a Creative Commons
Attribution 4.0 International License.

20 comments Add yours

xyx says:

October 19, 2011 at 16:50

this is wrong, its giving error in R. the error is something like this x not in data frame

Reply
1. Davo says:
  
  October 19, 2011 at 23:36
  
  Thanks for letting me know; the code got mangled up; try again now.
  
  Reply
Alex says:

July 16, 2013 at 18:37

what does “level2_for_paper_norm_id.tsv” look like? I’d like to try this.

Reply
1. Davo says:
  
  July 17, 2013 at 04:52
  
  Hi Alex,
  
  The file was a tab delimited file, where the rows are the genes and the columns the different samples.
  
  I’ve updated this post to use a dataset that is available from the DESeq package. You can try the example I’ve provided to see if it’s useful for you.
  
  Hope that helps,
  
  Dave
  
  Reply
Tiago Lopes says:

August 20, 2013 at 05:57

Dear Davo,

many thanks for the code. I searched a lot for some examples about how to use pvclust with alternative distance measures. It seems many people are also having the same question, and it seems your code just hit the spot !

All the best.
Tiago

Reply
1. Davo says:
  
  August 20, 2013 at 09:15
  
  Hi Tiago,
  
  Yeah the code was kindly written for me when I emailed the author of pvclust, so all credits to him.
  
  But glad you could find it and that it was helpful.
  
  Cheers,
  
  Dave
  
  Reply
anushka says:

February 21, 2014 at 15:46

Thanks a lot. I was looking for exactly how make pvclust use the spearman’s correlation!

Reply
1. Davo says:
  
  February 21, 2014 at 16:34
  
  No problems. I merely just provided the code the author of pvclust wrote for me 🙂
  
  Reply
Patricia says:

April 10, 2014 at 12:24

Hi, I wanted to use this function but when I run this code I got an error.
Erreur in as.character(x) :
cannot coerce type ‘closure’ to vector of type ‘character’
Thank you

Reply
1. Davo says:
  
  April 10, 2014 at 14:12
  
  Hi Patricia,
  
  are you following the example in the post, or using your own data? It’s a bit hard to troubleshoot just from the error.
  
  Cheers,
  
  Dave
  
  Reply
  1. Patricia says:
    
    April 10, 2014 at 14:38
    
    Hi Dave,
    
    So I am using my own data,
    
    when I run the commands to create the function I don’t get an error but just after when I run the pvclust code.
    
    Best regards.
    
    Patricia
    
    Reply
    1. Davo says:
      
      April 11, 2014 at 01:51
      
      Hi Patricia,
      
      without knowing the exact commands you ran and without a copy of your data, it’s quite hard to know what is wrong.
      
      Cheers,
      
      Dave
      
      Reply
Steffen says:

May 28, 2014 at 13:44

Hi,
when I run the code above I get the same error:
Error in as.character(x) :
cannot coerce type ‘closure’ to vector of type ‘character’

This happens with the example data as well as with my own data.

R version 3.0.2 (2013-09-25)
pvclust_1.2-2

Reply
1. Steffen says:
  
  May 28, 2014 at 13:49
  
  I found the reason – or at least I now can run the example:
  I also downloaded the 2 R scripts from the dropbox urls and sourced both files. Using this functions I was able tun run the examples.
  It seems that there are differences in the code from tha package I used (pvclust_1.2-2) and this code from dropbox.
  Strange
  
  Reply
  1. Davo says:
    
    May 28, 2014 at 13:56
    
    Yes the code from DropBox is the unofficial code (from the same author of pvclust); use that code if you want Spearman’s correlation instead of Pearson.
    
    Reply
jedgroev says:

March 23, 2015 at 15:23

I tested this script and it is working without errors.
However the result is wrong while I am using exactly the same data:
au bp se.au se.bp v c pchi
1 1 1 0 0 0 0 0
2 1 1 0 0 0 0 0
3 1 1 0 0 0 0 0
4 1 1 0 0 0 0 0
5 1 1 0 0 0 0 0
Any idea what the reason might be? Is it possible that this is related to my R-version (R3.1.3.)?

Reply
1. Davo says:
  
  March 24, 2015 at 04:51
  
  I’m not sure what steps you performed or how you came to the conclusion that the result is incorrect.
  
  Reply
  1. jedgroev says:
    
    March 24, 2015 at 11:49
    
    If I use exactly the same script, shouldn’t I have the same outcome, or at least more or less the same outcome?
    
    Reply
    1. Davo says:
      
      March 24, 2015 at 11:58
      
      Oh right, my bad; didn’t realise you meant you followed the code exactly. You’re right; the results on this page are wrong. I must have copied and pasted the wrong results, since the dendrogram values are all 100%. I updated the results; thanks for letting me know!
      
      Reply
S Jav says:

March 27, 2016 at 02:15

Hello all, this seems to be most promising post online for using pvclust() in R. I may just be getting too frustrated and not seeing a simple correction, but I keep getting this error as well:

Error in cor(x, method = “pearson”, use = use.cor) : ‘x’ must be numeric

I am using my own data. It has the following columns: Individual ID, followed by 4 criteria columns that I used in clustering (i.e. # of sightings, # of gap events, etc.). I have conducted 2 hierarchical cluster analyses: (1) complete linkage, (2) average linkage. And now i want to use pvclust() to test the uncertainty in the clustering (2 cluster for complete, 3 for average). I have tried to replicate the code above, and have the following:

result <- pvclust(data, method.dist="correlation", method.hclust="average", nboot=1000, r=seq(0.7,1.4,by=.1)) #error right away
plot(result)
pvrect(result, alpha=0.95)

I think the issue is my "data". Do I use the original data (with the 5 columns i stated above?), or do i make a data.frame? I also tried first making as.matrix(data), and then putting that as x in pvclust(x…). But i still get an error.

Any help would be great!! Thanks so much!

Reply

Leave a Reply Cancel reply