R data visualisation

Once upon a time, I made my graphs using Excel because it was the only graphing software I was aware of. One can do amazing things with Excel and produce fairly good-looking graphs, but after seeing some examples of R graphs, I wanted to learn more about R data visualisation. This post covers some plotting functions I recently discovered and have found useful.

Using identify()

This lecture on R data visualisation provides a very nice introduction to producing plots in R, and I highly recommend checking it out. One really cool trick I learned from the course was the use of identify(), which lets you identify points in a scatter plot by clicking on them. Imagine you were comparing the expression of genes from two different libraries and you wanted to know which gene a particular dot represented. Here's how:

#I will use a file in the DESeq package
#where are system files for DESeq stored
system.file(package="DESeq")
#[1] "C:/Program Files/R/R-3.0.1/library/DESeq"
deseq_path <- system.file(package="DESeq")
deseq_file <- paste(deseq_path, '/extra/TagSeqExample.tab', sep='')
#load TagSeqExample.tab
data <- read.table(deseq_file, header=TRUE, row.names=1)
head(data)
#           T1a T1b  T2  T3  N1  N2
#Gene_00001   0   0   2   0   0   1
#Gene_00002  20   8  12   5  19  26
#Gene_00003   3   0   2   0   0   0
#Gene_00004  75  84 241 149 271 257
#Gene_00005  10  16   4   0   4  10
#Gene_00006 129 126 451 223 243 149
#make a scatter plot of sample T2 vs. N2
#(zero counts become -Inf on the log2 scale and are not drawn)
plot(log2(data$T2), log2(data$N2), pch=19, xlab="Log2 expression library T2", ylab="Log2 expression library N2")

[Figure: deseq_tag_seq_example_t2_n2_scatter]
Not so informative.

Now if you wanted to quickly identify some genes that are up in the tumour sample (T2) and up in the normal sample (N2), you can use the identify() function:

plot(log2(data$T2), log2(data$N2), pch=19, xlab="Log2 expression library T2", ylab="Log2 expression library N2")
gene_of_interest <- identify(log2(data$T2), log2(data$N2), labels=row.names(data))
#now click on the dots you're interested in
#and push escape once you've finished clicking
#I clicked on four dots, whose row indices are now stored in gene_of_interest
gene_of_interest
#[1]   743  4262 12423 17550
#look up the gene names for those rows
row.names(data)[gene_of_interest]
#[1] "Gene_00743" "Gene_04262" "Gene_12423" "Gene_17550"

[Figure: deseq_tag_seq_example_t2_n2_scatter_labelled]
Yes, I always knew Gene_12423 was implicated in cancer!

I thought that was very cool and handy to know. In the past I was using ggobi to help identify dots.
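identify() needs an interactive graphics device, so it is of little use inside a script. A minimal sketch of a non-interactive alternative is to choose the points by rule and label them with text(). The toy count table, the +1 pseudocount and the "five highest in T2" rule are all assumptions made for illustration and are not part of the DESeq example above.

```r
# Toy count table standing in for TagSeqExample.tab (values are made up)
set.seed(1)
toy <- data.frame(T2 = rpois(200, 50), N2 = rpois(200, 50))
row.names(toy) <- sprintf("Gene_%05d", 1:200)

# +1 pseudocount (an assumption, not in the original) avoids log2(0) = -Inf
x <- log2(toy$T2 + 1)
y <- log2(toy$N2 + 1)

# Choose points by rule instead of by clicking: the five highest in T2
picked <- order(x, decreasing = TRUE)[1:5]

plot(x, y, pch = 19,
     xlab = "Log2 expression library T2",
     ylab = "Log2 expression library N2")
# Write the gene name just above each chosen point
text(x[picked], y[picked], labels = row.names(toy)[picked], pos = 3, cex = 0.7)

# Same lookup as with the indices returned by identify()
picked_genes <- row.names(toy)[picked]
picked_genes
```

Like the return value of identify(), picked holds row indices, so the same row.names() lookup applies.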


We could also make a heatmap with the expression data above:

#create a smaller subset for illustrative purposes
data_subset <- data[rowSums(data)>50000,]
#how many genes are left
nrow(data_subset)
#[1] 49
#heatmap functions expect a numeric matrix
data_matrix <- data.matrix(data_subset)

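[Figure: deseq_tag_seq_example3]
The scale="row" parameter standardises each row (gene): the values are centred and scaled within the row before colours are assigned, so genes with very different absolute counts can be compared on one colour scale.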
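A minimal sketch of what scale="row" does, assuming base R's heatmap() is the function in use. The toy matrix below is made up for illustration; with the real data you would pass data_matrix instead.

```r
# Toy matrix: 10 genes x 6 libraries, with very different count levels per gene
set.seed(42)
base_level <- 10^seq(2, 5, length.out = 10)   # per-gene mean count (made up)
toy_matrix <- matrix(
  rpois(60, lambda = rep(base_level, times = 6)),
  nrow = 10,
  dimnames = list(sprintf("Gene_%02d", 1:10),
                  c("T1a", "T1b", "T2", "T3", "N1", "N2"))
)

# Draw the heatmap with each row standardised before colouring
heatmap(toy_matrix, scale = "row")

# The same standardisation done by hand: z-score each row
row_z <- t(scale(t(toy_matrix)))
round(rowMeans(row_z), 10)   # every row mean is 0
apply(row_z, 1, sd)          # every row standard deviation is 1
```

Without row scaling, the genes with the largest counts would take up most of the colour range and the rest would look uniformly pale.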


I also just learned about the tabplot package and its tableplot() function from here. I will use the iris and diamonds datasets to demonstrate tableplot().

#how the iris dataset looks
#helper that returns rep randomly sampled rows of dta
row.sample <- function(dta, rep) {
   dta[sample(1:nrow(dta), rep, replace=FALSE), ]
}
row.sample(iris, 6)
#    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#18           5.1         3.5          1.4         0.3     setosa
#71           5.9         3.2          4.8         1.8 versicolor
#83           5.8         2.7          3.9         1.2 versicolor
#133          6.4         2.8          5.6         2.2  virginica
#21           5.4         3.4          1.7         0.2     setosa
#144          6.8         3.2          5.9         2.3  virginica
library(tabplot)
tableplot(iris, sortCol="Species")

[Figure: iris_tableplot]
We can clearly see that the setosa species has much shorter petals.
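The pattern in the table plot can be checked numerically. This is a small sketch using only base R and the built-in iris data; the variable name petal_means is just for illustration.

```r
# Mean petal length (cm) per species in the built-in iris data
petal_means <- tapply(iris$Petal.Length, iris$Species, mean)
petal_means
#     setosa versicolor  virginica
#      1.462      4.260      5.552
```

On average, setosa petals are roughly a third the length of versicolor petals.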

Using tableplot() on the diamonds dataset (prices of over 50,000 round cut diamonds), which is available in ggplot2 (as in the blog I linked above):

#the diamonds dataset ships with ggplot2
library(ggplot2)
#run ?diamonds for more information on the dataset
#sort by depth
tableplot(diamonds, sortCol=depth)

[Figure: tableplot_diamond_depth]
It seems the ideal cut is related to the depth of the diamond. I don't usually buy diamonds, so I don't know much about them, but here's a nice article explaining diamond depth, which may come in handy someday.

Conditioning plots: coplot()

I was going through the graphics section of the R Cookbook and learned about conditioning plots, which are useful for visualising the relationship between two numerical variables separately for each level of a categorical variable. The Cookbook example uses a categorical variable with two levels. When I tried one with three levels, the scatter plots were split across two rows by default; setting rows=1, as in the iris example below, puts the panels side by side.

#I'll use the example in the R cookbook
data(Cars93, package="MASS")
coplot(Horsepower ~ MPG.city | Origin, data=Cars93)

[Figure: cars93_horsepower_mileage_vs_origin]
In this dataset, the ~300 horsepower cars all fall in the USA group, and the cars with a city mileage of 35 miles per gallon or higher all fall in the non-USA group.
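What the panels show can be cross-checked with a per-group summary. This is a minimal sketch using tapply() from base R; the variable names hp_max and mpg_max are just for illustration.

```r
# Cars93 ships with the MASS package
data(Cars93, package = "MASS")

# Highest horsepower and highest city mileage within each Origin group
hp_max  <- tapply(Cars93$Horsepower, Cars93$Origin, max)
mpg_max <- tapply(Cars93$MPG.city,   Cars93$Origin, max)

hp_max
mpg_max
```

If the reading of the plot is right, the largest value of hp_max belongs to the USA group and the largest value of mpg_max to the non-USA group.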

#coplot using the iris dataset; rows=1 puts the three species panels in one row
coplot(Petal.Length ~ Petal.Width | Species, data=iris, rows=1)



I know this post is way too short to do justice to the R data visualisation title. But do have a look at the lecture I linked right at the start of this post, and check out this quick introduction to ggplot2 and Quick-R if you are interested in using R as a data visualisation tool.

Creative Commons License
This work is licensed under a Creative Commons
Attribution 4.0 International License
