I’ve written about hierarchical clustering before as an attempt to understand it better. Within R, you can plot the results of hierarchical clustering; however, when working with a large dataset, you may produce plots like this one, where all the labels overlap:
As you can see, none of the labels are legible. During my honours year I worked in a molecular evolution lab, where I learned about the Newick format, so I wondered whether there was a way to export the hierarchical clustering results to Newick format and then use TreeView or some other tree visualisation program to view or manipulate them. After some searching, I found this post from the Getting Genetics Done blog, which was exactly what I wanted.
The instructions below are almost identical to those in the post I linked above:
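    # install if necessary; biocLite() has been superseded by BiocManager
    # in current Bioconductor releases
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("ctc")
    library(ctc)
    # hierarchical clustering of my rows using average linkage
    hc <- hclust(d <- as.dist(1 - cor(t(data), method="pearson")), method="average")
    # export the tree as a Newick string
    write.table(hc2Newick(hc), file="hc.newick")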
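Since `data` above is my own dataset, here is a minimal sketch of the same clustering step on a made-up matrix, so the pieces can be run and inspected one at a time. The matrix `toy` and its row and column names are hypothetical. The point is that `cor()` correlates columns, which is why the data are transposed first to cluster rows.

```r
# Hypothetical toy matrix: 6 rows (e.g. genes) by 5 columns (e.g. samples)
set.seed(1)
toy <- matrix(rnorm(30), nrow = 6,
              dimnames = list(paste0("row", 1:6), paste0("col", 1:5)))

# cor() works column-wise, so transpose to get row-by-row correlations
row_cor <- cor(t(toy), method = "pearson")

# 1 - r turns a correlation into a dissimilarity between 0 and 2
d <- as.dist(1 - row_cor)

# average linkage clustering of the rows
hc <- hclust(d, method = "average")
```

Calling `plot(hc)` on this small example should give a dendrogram with six readable labels.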
Now open hc.newick in your text editor of choice (e.g. Vim) and delete the first line, which should be "x" (including the double quotation marks). The second line should start with "1" " (again including the double quotation marks); remove those characters so that the line begins with an open parenthesis, i.e. (. Lastly, go to the end of this line and remove the trailing double quotation mark, so that the line ends with a semicolon, i.e. ;.
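The manual clean-up above can probably be skipped by writing the Newick string without the header, row name, and quotes that `write.table()` adds by default. A minimal sketch, using a hard-coded toy Newick string in place of the output of `hc2Newick(hc)`; the string and the temporary file names are assumptions for illustration only:

```r
# Toy Newick string standing in for hc2Newick(hc) (hypothetical example data)
newick <- "((A:0.1,B:0.1):0.2,C:0.3);"

# Option 1: writeLines() writes the string as-is
out_file <- tempfile(fileext = ".newick")
writeLines(newick, con = out_file)

# Option 2: write.table() with the header, row names and quoting turned off
out_file2 <- tempfile(fileext = ".newick")
write.table(newick, file = out_file2, quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# read both files back to confirm they contain only the Newick string
written <- readLines(out_file)
written2 <- readLines(out_file2)
```

With the real tree, replacing `newick` with `hc2Newick(hc)` and the temporary file with "hc.newick" should give a file that a tree viewer can open directly.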
Open the hc.newick file in your tree viewing program of choice, for example Dendroscope. I prefer this program over TreeView because it allows you to zoom right into the tree (by scrolling with the mouse) and to copy node information to the clipboard, as well as offering many other features. Within Dendroscope, you can try different types of layouts, such as a circular dendrogram:
Colouring the branches
I learned of the sparcl package from this Stack Overflow post. Here, using the iris dataset that comes with R, I colour the branches according to species.
    # install the sparcl package
    install.packages("sparcl")
    # load the package
    library(sparcl)
    names(iris)
    # [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
    hh <- hclust(dist(iris[,c(1:4)]))
    table(iris$Species)
    #     setosa versicolor  virginica
    #         50         50         50
    species_as_numeric = as.numeric(iris$Species)
    ColorDendrogram(hh,y=species_as_numeric,branchlength=2)
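If installing an extra package is not an option, something similar can be sketched in base R with `dendrapply()`. Note that this is a different technique from `ColorDendrogram()`: it colours the leaf labels rather than the branches. The colour choices and the helper name `colour_leaf` are my own assumptions.

```r
# same clustering as above, on the four numeric columns of iris
hh <- hclust(dist(iris[, 1:4]))
dend <- as.dendrogram(hh)

# one colour per species; the colours themselves are arbitrary
species_cols <- c(setosa = "red", versicolor = "darkgreen", virginica = "blue")

# hypothetical helper: for each leaf, look up the species of that row
# and store a label colour in the node's plotting parameters
colour_leaf <- function(node) {
  if (is.leaf(node)) {
    row_id <- as.integer(attr(node, "label"))  # leaf labels are row numbers
    sp <- as.character(iris$Species[row_id])
    attr(node, "nodePar") <- list(lab.col = species_cols[[sp]],
                                  lab.cex = 0.4, pch = NA)
  }
  node
}
dend_coloured <- dendrapply(dend, colour_leaf)

plot(dend_coloured)
```

Setting `pch = NA` suppresses the point symbols that would otherwise be drawn at each leaf once `nodePar` is set.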
This work is licensed under a Creative Commons Attribution 4.0 International License.