Making a heatmap in R with the pheatmap package

For a while, heatmap.2() from the gplots package was my function of choice for creating heatmaps in R. Then I discovered the superheat package, which attracted me because of the side plots. However, shortly afterwards I discovered pheatmap and I have been mainly using it for all my heatmaps (except when I need to interact with the heatmap; for that I use d3heatmap).

The one feature of pheatmap that I like the most is the ability to add annotations to the rows and columns. To get started, you can install pheatmap if you haven’t already.


# load package

I will use the same dataset, from the DESeq package, as my original heatmap post.

# install DESeq if necessary

# load package

# load data and subset
example_file <- system.file ("extra/", package="DESeq")
data <- read.delim(example_file, header=T, row.names="gene")
data_subset <- as.matrix(data[rowSums(data)>50000,])

# create heatmap using pheatmap

The heatmap.2() function has a parameter for scaling the rows; this can be easily implemented.

cal_z_score <- function(x){
  (x - mean(x)) / sd(x)

data_subset_norm <- t(apply(data_subset, 1, cal_z_score))

Now before I demonstrate the main functionality that I like so much about pheatmap, which is creating annotations, we need to figure out how we would like to colour the rows. I’ll perform hierarchical clustering in the same manner as performed by pheatmap to obtain gene clusters. I use the excellent dendextend to plot a simple dendrogram.

my_hclust_gene <- hclust(dist(data_subset), method = "complete")

# install if necessary

# load package

as.dendrogram(my_hclust_gene) %>%
  plot(horiz = TRUE)

We can form two clusters of genes by cutting the tree with the cutree() function; we can either specific the height to cut the tree or the number of clusters we want.

my_gene_col <- cutree(tree = as.dendrogram(my_hclust_gene), k = 2)

Gene_00562 Gene_02115 Gene_02296 Gene_02420 Gene_02800 Gene_03194 Gene_03450 Gene_03852 Gene_03861 Gene_04164 
         1          2          2          2          2          2          2          2          2          2 
Gene_05004 Gene_05761 Gene_05960 Gene_06899 Gene_07132 Gene_07390 Gene_07404 Gene_08042 Gene_08576 Gene_08694 
         2          2          2          2          2          1          2          1          2          1 
Gene_08743 Gene_08819 Gene_09238 Gene_09505 Gene_09610 Gene_09952 Gene_09969 Gene_10648 Gene_10924 Gene_11002 
         1          1          1          2          2          2          2          2          2          2 
Gene_11672 Gene_12318 Gene_12576 Gene_12804 Gene_12834 Gene_12940 Gene_13444 Gene_13540 Gene_14450 Gene_14672 
         1          2          2          1          2          1          2          1          1          1 
Gene_14928 Gene_15286 Gene_15334 Gene_16114 Gene_16632 Gene_17743 Gene_17849 Gene_17865 Gene_17992 
         2          2          2          1          1          1          2          2          2

The 1’s and 2’s indicate the cluster that the genes belong to; I will rename them using the ifelse() function and create a data frame, which is the data structure needed by pheatmap for creating annotations.

my_gene_col <- data.frame(cluster = ifelse(test = my_gene_col == 1, yes = "cluster 1", no = "cluster 2"))

Gene_00562 cluster 1
Gene_02115 cluster 2
Gene_02296 cluster 2
Gene_02420 cluster 2
Gene_02800 cluster 2
Gene_03194 cluster 2

We can add multiple row annotations to a heatmap and below I’ll add some random annotations.

my_random <- as.factor(sample(x = 1:2, size = nrow(my_gene_col), replace = TRUE))
my_gene_col$random <- my_random

             cluster random
Gene_00562 cluster 1      2
Gene_02115 cluster 2      1
Gene_02296 cluster 2      1
Gene_02420 cluster 2      1
Gene_02800 cluster 2      2
Gene_03194 cluster 2      2

I’ll add some column annotations and create the heatmap.

my_sample_col <- data.frame(sample = rep(c("tumour", "normal"), c(4,2)))
row.names(my_sample_col) <- colnames(data_subset)

T1a tumour
T1b tumour
T2  tumour
T3  tumour
N1  normal
N2  normal

pheatmap(data_subset, annotation_row = my_gene_col, annotation_col = my_sample_col)

Another function that I like about pheatmap is the ability to introduce breaks in the heatmap. I’ll break up the heatmap by specifying how many clusters I want from the dendrograms. (You can also manually define where you want breaks too.)

         annotation_row = my_gene_col,
         annotation_col = my_sample_col,
         cutree_rows = 2,
         cutree_cols = 2)

Finally, I’ll demonstrate how you can retrieve the hierarchical clustering information using pheatmap. (I showed how you can manually perform the same hierarchical clustering as pheatmap in this post but if you didn’t, this step is handy.)

# use silent = TRUE to suppress the plot
my_heatmap <- pheatmap(data_subset, silent = TRUE)

# results are stored as a list
[1] "list"

[1] "tree_row" "tree_col" "kmeans"   "gtable"

my_heatmap$tree_row %>%
  as.dendrogram() %>%
  plot(horiz = TRUE)


There are various heatmap packages in R. I like pheatmap mainly because of its annotation feature; other heatmap packages may be able to do the same thing but I’m happy with pheatmap. Use ?pheatmap to find out more ways you can customise your heatmap.

Print Friendly, PDF & Email

Creative Commons License
This work is licensed under a Creative Commons
Attribution 4.0 International License

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.