Miscellaneous plots in R

The R Graphics Cookbook is an awesome book; it’s so awesome that I bought the ebook after I bought the hardcopy because one copy of it wasn’t enough. I haven’t read the book in its entirety yet, but I thought I’d share some of the recipes from Chapter 13, which shows how to create miscellaneous plots in R.

Visualising a correlation matrix

I’ve written a post on creating a correlation matrix using R and showed how that can be visualised as a graph. From the cookbook I learned about the corrplot package, which creates a rather nice visualisation of correlations.

install.packages("corrplot")
library(corrplot)

# I will use the same example in my correlation post
# create random matrix with numbers ranging from 1 to 100
set.seed(12345)
A <- matrix(runif(100,1,100),nrow=10,ncol=10,byrow=TRUE)
correlation_matrix <- cor(t(A), method="spearman")
corrplot(correlation_matrix)
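
The Spearman method used above can be understood as the Pearson correlation computed on ranks. Here is a minimal sketch of that equivalence using the same matrix; the variable names `ranks` and `pearson_on_ranks` are mine and only for illustration.

```r
# same matrix as above
set.seed(12345)
A <- matrix(runif(100, 1, 100), nrow=10, ncol=10, byrow=TRUE)

# cor() correlates columns, so transpose to compare the rows of A
spearman <- cor(t(A), method="spearman")

# rank the values within each column of t(A) (i.e. each row of A),
# then compute the ordinary Pearson correlation on those ranks
ranks <- apply(t(A), 2, rank)
pearson_on_ranks <- cor(ranks, method="pearson")

# the two matrices should agree up to numerical tolerance
all.equal(spearman, pearson_on_ranks)
```

Because only the ranks matter, the Spearman coefficients are unaffected by any monotonic transformation of the values, such as taking logs.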


Displaying the coefficients:

col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(correlation_matrix,
         method="shade", # visualisation method
         shade.col=NA, # colour of shade line
         tl.col="black", # colour of text label
         tl.srt=45, # text label rotation
         col=col(200), # colour of glyphs
         addCoef.col="black", # colour of coefficients
         order="AOE" # ordering method
         )
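
One detail in the call above that is easy to miss: colorRampPalette() returns a function rather than a vector of colours, which is why `col(200)` is passed to corrplot. A small sketch, using the hypothetical names `pal` and `shades` to avoid reusing `col`:

```r
# colorRampPalette() builds a function that interpolates between colours
pal <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))

# calling that function with n returns n interpolated colours
shades <- pal(200)
length(shades)

# the two ends of the ramp are the first and last colours supplied
shades[c(1, 200)]
```

Asking for more colours gives a smoother gradient between the red and blue ends, with white in the middle for correlations near zero.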


Plotting a function

I’ve written a post on plotting curves in R, which described how to fit and plot polynomial functions. Here I’ll illustrate how we can plot these functions using ggplot2, as described in the cookbook.

# install if necessary
install.packages("ggplot2")
library(ggplot2)

# I will use the same example in my curve fitting post
x <- c(32,64,96,118,126,144,152.5,158)
y <- c(99.5,104.8,108.5,100,86,64,35.3,15)

# fit a third order polynomial
fit3 <- lm(y~poly(x,3,raw=TRUE))

# display coefficients
coef(fit3)
#             (Intercept) poly(x, 3, raw = TRUE)1 poly(x, 3, raw = TRUE)2 
#            1.269381e+02           -1.626321e+00            2.910221e-02 
# poly(x, 3, raw = TRUE)3 
#           -1.467589e-04

# create function for this third order polynomial
# recall that y = d + cx + bx^2 + ax^3
my_fun <- function(x){
  coef(fit3)[1] +
    (coef(fit3)[2] * x) +
    (coef(fit3)[3] * x^2) +
    (coef(fit3)[4] * x^3)
}

ggplot(data.frame(x=c(30, 160)),
       aes(x=x)) +
  stat_function(fun=my_fun)
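
As an alternative to writing out the polynomial by hand, the fitted model can be evaluated with predict(). This is only a sketch of that idea, not the cookbook's recipe; `my_fun_predict` and `manual` are hypothetical names I've made up for the comparison.

```r
# same data and fit as above
x <- c(32,64,96,118,126,144,152.5,158)
y <- c(99.5,104.8,108.5,100,86,64,35.3,15)
fit3 <- lm(y~poly(x,3,raw=TRUE))

# evaluate the fitted curve at arbitrary x values via predict()
my_fun_predict <- function(x){
  unname(predict(fit3, newdata=data.frame(x=x)))
}

# the polynomial written out from the coefficients, for comparison
cf <- unname(coef(fit3))
manual <- function(x) cf[1] + cf[2]*x + cf[3]*x^2 + cf[4]*x^3

# both approaches should give the same values
all.equal(my_fun_predict(c(30, 95, 160)), manual(c(30, 95, 160)))
```

A function like this can be passed to stat_function(fun=...) in the same way as my_fun, and it does not need to be rewritten if the order of the polynomial changes.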


How do we colour a specific region of the curve?

# remember to run the code above
# create another function that returns
# NA within a range
my_fun_limit <- function(x){
  y <- coef(fit3)[1] +
    (coef(fit3)[2] * x) +
    (coef(fit3)[3] * x^2) +
    (coef(fit3)[4] * x^3)
  y[x<80 | x>120]<-NA
  return(y)
}

p <- ggplot(data.frame(x=c(30, 160)), aes(x=x))
p + stat_function(fun=my_fun_limit, geom="area", fill="blue", alpha=0.2) +
  stat_function(fun=my_fun)
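
The trick here is that the area is only drawn where the function returns a value, so setting everything outside the range to NA restricts the shading. The same masking can be sketched as a small wrapper that works for any function; `mask_outside` is a hypothetical helper of mine, not something from the cookbook or ggplot2.

```r
# wrap a function so that it returns NA outside [lower, upper]
mask_outside <- function(f, lower, upper){
  function(x){
    y <- f(x)
    y[x < lower | x > upper] <- NA
    y
  }
}

# toy example: square the input, but only keep values for x in [80, 120]
square_limited <- mask_outside(function(x) x^2, 80, 120)
square_limited(c(50, 100, 150))
```

With this helper, something like mask_outside(my_fun, 80, 120) could be used in place of my_fun_limit.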


Creating a heatmap using ggplot2

I’ve written a post on creating heatmaps with R, but I had never used ggplot2 to create one.

# using the matrix in my heatmap post
set.seed(31)
y <- matrix(rnorm(50),
            10,
            5,
            dimnames=list(paste("g", 1:10, sep=""),
                          paste("t", 1:5, sep=""))
            )

# data needs to be in long format; melt() from reshape names the
# row and column indices of a matrix X1 and X2
# install reshape if necessary
install.packages("reshape")
library(reshape)
yy <- melt(y)

ggplot(yy, aes(x=X1, y=X2, fill=value)) +
  geom_raster() +
  scale_fill_gradient2(midpoint=0, mid="grey70", limits=c(-2,2))
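
If you would rather not install reshape just for this step, the long format can also be produced with base R. This is a sketch under my own naming; the column names `gene`, `sample` and `value` are assumptions chosen for illustration, so the aes() mapping would need to use them instead of X1 and X2.

```r
# same matrix as above
set.seed(31)
y <- matrix(rnorm(50),
            10,
            5,
            dimnames=list(paste("g", 1:10, sep=""),
                          paste("t", 1:5, sep=""))
            )

# as.table() followed by as.data.frame() gives one row per matrix cell
long <- as.data.frame(as.table(y))
names(long) <- c("gene", "sample", "value")

# 10 genes x 5 samples = 50 rows
nrow(long)
head(long)
```

The first two columns are factors whose levels follow the original row and column order, so the axes keep the order of the matrix.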


Plotting an empirical cumulative distribution

Plotting a cumulative distribution is particularly useful for visualising quantiles and ggplot2 makes it extremely easy to produce this plot.

# I will use an example expression dataset
# from the DESeq package
# install DESeq if necessary
# (note: on current Bioconductor releases the biocLite script
# has been superseded by the BiocManager package)
source("http://bioconductor.org/biocLite.R")
biocLite("DESeq")
# load DESeq
library("DESeq")

example_file <- system.file("extra/TagSeqExample.tab", package="DESeq")
data <- read.delim(example_file, header=TRUE, row.names="gene")

# summary of expression values in the T1a sample
summary(data$T1a)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#     0.0     2.0    23.0   146.9    90.0 32310.0

ggplot(data, aes(x=log2(T1a))) +
  geom_hline(yintercept=0.5) +
  stat_ecdf()

Note that the curve crosses the 0.5 line at around 4.5 on the log2 scale, i.e. 2^4.5 (22.62742), which agrees with the median of 23 in the summary. Since the mean (146.9) is far higher than the median, we can conclude that most genes are lowly expressed.
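
The same quantities can be read off without a plot by using ecdf() and quantile() from base R. The sketch below uses simulated counts as a stand-in for the DESeq example data, and adds 1 before taking logs so that zero counts don't become -Inf; both of these are my assumptions and not part of the original example.

```r
# simulated, skewed counts standing in for an expression column
set.seed(1)
counts <- rnbinom(1000, mu=100, size=0.3)

# add a pseudocount of 1 so that zeros are kept on the log2 scale
log_counts <- log2(counts + 1)

# ecdf() returns a function giving the proportion of values <= its input
Fn <- ecdf(log_counts)

# the proportion of values at or below the median is at least 0.5
med <- median(log_counts)
Fn(med)

# quartiles correspond to where the curve reaches 0.25, 0.5 and 0.75
quantile(log_counts, probs=c(0.25, 0.5, 0.75))
```

Reading the x value where the ECDF reaches 0.5 is therefore just another way of finding the median.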

Summary

The chapter has other examples, including how to create maps, dendrograms and 3D plots, and how to animate a 3D plot (which I have also written about). I can see myself using the correlation and cumulative distribution plots a lot.


This work is licensed under a Creative Commons Attribution 4.0 International License.
