Once upon a time, I made my graphs in Excel because it was the only graphing software I was aware of. One can do amazing things with Excel and produce fairly good-looking graphs, but after looking at some examples of R graphs, I wanted to learn a bit more about R data visualisation. This post covers some plotting functions I recently discovered and have found useful.
Using identify()
This lecture on R data visualisation provides a very nice introduction to producing plots in R, and I highly recommend checking it out. One really cool trick I learned from the course was the use of identify(), which lets you identify points in a scatter plot by clicking on them. Imagine you were comparing the expression of genes from two different libraries and you wanted to know which gene a particular dot represented. Here's how:
#I will use a file in the DESeq package
library(DESeq)

#where are system files for DESeq stored
system.file(package="DESeq")
[1] "C:/Program Files/R/R-3.0.1/library/DESeq"

deseq_path <- system.file(package="DESeq")
deseq_file <- paste(deseq_path, '/extra/TagSeqExample.tab', sep='')

#load TagSeqExample.tab
data <- read.table(deseq_file, header=TRUE, row.names=1)
head(data)
           T1a T1b  T2  T3  N1  N2
Gene_00001   0   0   2   0   0   1
Gene_00002  20   8  12   5  19  26
Gene_00003   3   0   2   0   0   0
Gene_00004  75  84 241 149 271 257
Gene_00005  10  16   4   0   4  10
Gene_00006 129 126 451 223 243 149

#make a scatter plot of sample T2 vs. N2
plot(log2(data$T2), log2(data$N2), pch=19,
     xlab="Log2 expression library T2",
     ylab="Log2 expression library N2")
Now if you want to quickly identify some genes that are highly expressed in both the tumour sample (T2) and the normal sample (N2), you can use the identify() function:
plot(log2(data$T2), log2(data$N2), pch=19,
     xlab="Log2 expression library T2",
     ylab="Log2 expression library N2")
gene_of_interest <- identify(log2(data$T2), log2(data$N2), labels=row.names(data))
#now click on the dots you're interested in
#and push escape once you've finished clicking

#I clicked on four dots, which are now stored in gene_of_interest
gene_of_interest
[1]   743  4262 12423 17550

row.names(data[gene_of_interest,])
[1] "Gene_00743" "Gene_04262" "Gene_12423" "Gene_17550"
Yes, I always knew Gene_12423 was implicated in cancer!
I thought that was very cool and handy to know. In the past I was using ggobi to help identify dots.
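identify() needs an interactive graphics device and mouse clicks, so it cannot be used in a script that runs unattended. A minimal sketch of a non-interactive alternative is to choose the points with a rule and label them with text(). Note the assumptions: the built-in mtcars dataset stands in for the expression data, and the thresholds are arbitrary values picked only for illustration.

```r
# Non-interactive labelling sketch; mtcars is a stand-in for the expression data
x <- mtcars$wt
y <- mtcars$mpg

# Pick the points to label with a rule instead of mouse clicks
# (these thresholds are arbitrary and only for illustration)
picked <- which(x > 5 | y > 30)

plot(x, y, pch = 19, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Write the row names next to the selected points (pos = 4 puts text to the right)
text(x[picked], y[picked], labels = row.names(mtcars)[picked], pos = 4, cex = 0.7)

# Like identify(), this leaves you with the row indices of the chosen points
row.names(mtcars)[picked]
```

As with identify(), the result is a vector of row indices, so the same indexing used above to look up gene names works here too.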
Heatmaps
We could also make a heatmap with the expression data above:
install.packages("gplots")
library(gplots)

#create a smaller subset for illustrative purposes
data_subset <- data[rowSums(data)>50000,]
nrow(data_subset)
[1] 49

data_matrix <- data.matrix(data_subset)
heatmap.2(data_matrix, scale="row")
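The scale="row" parameter standardises each row (gene) to a mean of zero and a standard deviation of one before the colours are assigned, so the heatmap shows each gene's relative expression across the samples rather than its absolute counts.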
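To see what row standardisation does to the numbers, here is a small sketch in base R. As far as I understand it, this mirrors what heatmap.2() does for scale="row" (centre each row, then divide by the row's standard deviation), but treat it as an illustration rather than the package's exact code. The matrix is made up for this example.

```r
# Toy count matrix (made-up numbers, for illustration only)
m <- matrix(c( 10,  20,  30,
              100, 200, 600,
                5,   5,  20),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("gene_a", "gene_b", "gene_c"),
                            c("s1", "s2", "s3")))

# scale() standardises columns, so transpose, scale, and transpose back
row_z <- t(scale(t(m)))

round(rowMeans(row_z), 10)  # every row now has mean 0
apply(row_z, 1, sd)         # and standard deviation 1
```

After scaling, a highly expressed gene such as gene_b no longer dominates the colour range, which is why row scaling is a common choice for expression heatmaps.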
Table plots: tableplot()
I just learned about the tabplot package and its tableplot() function from here. I will use the iris and diamonds datasets to demonstrate tableplot().
install.packages("tabplot")
library(tabplot)

#how the iris dataset looks: draw a few rows at random, without replacement
row.sample <- function(dta, rep) {
  dta[sample(seq_len(nrow(dta)), rep, replace=FALSE), ]
}
row.sample(iris, 6)
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
18           5.1         3.5          1.4         0.3     setosa
71           5.9         3.2          4.8         1.8 versicolor
83           5.8         2.7          3.9         1.2 versicolor
133          6.4         2.8          5.6         2.2  virginica
21           5.4         3.4          1.7         0.2     setosa
144          6.8         3.2          5.9         2.3  virginica

tableplot(iris, sortCol="Species")
We can clearly see that the setosa species has much shorter petals.
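A table plot sorts the rows by one column, groups the sorted rows into bins, and then draws one summary per bin: the mean for a numeric column and the category proportions for a categorical one. The sketch below imitates that aggregation in base R so you can see the numbers behind the picture. It is not tabplot's own code, and the choice of 10 bins is an assumption made to keep the output short.

```r
# Sort by Species, as in tableplot(iris, sortCol="Species")
sorted <- iris[order(iris$Species), ]

# Split the 150 sorted rows into 10 bins of 15 rows each
# (10 bins is an arbitrary choice for this sketch)
bin <- rep(1:10, each = 15)

# Numeric column: one mean per bin, which a table plot draws as a bar
petal_length_by_bin <- tapply(sorted$Petal.Length, bin, mean)

# Categorical column: share of each species per bin, drawn as a stacked bar
species_share_by_bin <- prop.table(table(bin, sorted$Species), margin = 1)

round(petal_length_by_bin, 2)
round(species_share_by_bin, 2)
```

The bins that contain only setosa have the smallest mean petal length, which is the pattern visible in the plot.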
Using tableplot() on the diamonds dataset (prices of over 50,000 round cut diamonds) available in ggplot2 (as in the blog I linked above):
library(ggplot2)
data(diamonds)
#run ?diamonds for more information on the dataset

tableplot(diamonds)

#sort by depth
tableplot(diamonds, sortCol=depth)
It seems the ideal cut is related to the depth of the diamond. I don't usually buy diamonds, so I don't know much about them, but here's a nice article explaining diamond depth, which may come in handy someday.
Conditioning plots: coplot()
I was going through the graphics section of the R Cookbook and learned about conditioning plots, which are useful for visually comparing the relationship between two numerical variables at each level of a categorical variable. The book's example conditions on a variable with two levels. When I tried a categorical variable with three levels, the panels were split across two rows by default; setting rows=1, as in the iris example further down, puts them side by side.
#I'll use the example in the R cookbook
data(Cars93, package="MASS")
coplot(Horsepower ~ MPG.city | Origin, data=Cars93)
In this dataset, cars with around 300 horsepower are only of USA origin, and cars that get 35 miles per gallon or more in the city are only of non-USA origin.
#coplot using the iris dataset, with all three panels in one row
coplot(Petal.Length ~ Petal.Width | Species, data=iris, rows=1)
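Conditioning simply means splitting the data by the levels of one variable and drawing the same scatter plot for each piece. The sketch below does that by hand in base R for the iris data, which may help make clear what coplot() is doing and why a three-level variable gives three panels. It is an illustration of the idea, not coplot()'s own implementation, and the within-species correlation is my addition as a numeric companion to the panels.

```r
# Split iris by the conditioning variable: one data frame per species
by_species <- split(iris, iris$Species)

# One panel per level, side by side, with shared axis limits
# so that the panels can be compared directly
op <- par(mfrow = c(1, length(by_species)))
for (sp in names(by_species)) {
  d <- by_species[[sp]]
  plot(d$Petal.Width, d$Petal.Length, pch = 19, main = sp,
       xlim = range(iris$Petal.Width), ylim = range(iris$Petal.Length),
       xlab = "Petal.Width", ylab = "Petal.Length")
}
par(op)  # restore the previous plotting layout

# Numeric companion to the panels: correlation within each species
within_cor <- sapply(by_species, function(d) cor(d$Petal.Length, d$Petal.Width))
round(within_cor, 2)
```

coplot() saves you the loop and also handles numeric conditioning variables by cutting them into overlapping intervals, which is harder to do by hand.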
Conclusions
I know this post is way too short to do justice to the R data visualisation title. But do have a look at the lecture I linked right at the start of this post, and check out this quick introduction to ggplot2 and Quick-R if you are interested in using R as a data visualisation tool.

This work is licensed under a Creative Commons Attribution 4.0 International License.