Matrix to adjacency list in R

An adjacency list is simply an unordered list that describes connections between vertices, and it is a commonly used input format for graphs. In this post, I use the melt() function from the reshape2 package to create an adjacency list from a correlation matrix. As an example, I use the geneData dataset from the Biobase package, which contains real but anonymised microarray expression data. Finally, I'll show some features of the igraph package.
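The core reshaping step can be sketched without any extra packages. This is a minimal sketch, not the post's own code: it swaps reshape2's melt() for base R's as.table(), and a small random matrix stands in for geneData so it runs without Biobase. Only the upper triangle is kept, so each gene pair appears once and self-correlations are dropped.

```r
# Minimal sketch: correlation matrix -> long "from, to, weight" list.
# Assumption: random data stands in for geneData; base R's as.table()
# is used here in place of reshape2::melt().
set.seed(1)
expr <- matrix(rnorm(50), nrow = 10,
               dimnames = list(NULL, paste0("gene", 1:5)))
cor_mat <- cor(expr)                      # 5 x 5 gene-gene correlations

# keep each pair once (upper triangle) and drop the diagonal
cor_mat[lower.tri(cor_mat, diag = TRUE)] <- NA
adj <- as.data.frame(as.table(cor_mat), stringsAsFactors = FALSE)
names(adj) <- c("from", "to", "weight")
adj <- adj[!is.na(adj$weight), ]          # 5 choose 2 = 10 pairs remain
adj
```

With reshape2 loaded, melt(cor_mat) gives the same long layout with columns named Var1, Var2 and value, and a from/to/weight data frame like this one can be passed to igraph's graph_from_data_frame().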


Plotting error bars with R

Error bars may show confidence intervals, standard errors or standard deviations. Each conveys a different message, and this paper on error bars in experimental biology explains the differences very nicely. In this post I demonstrate how to plot error bars that show the standard error of the mean (SEM, often just called the standard error, SE). I found two nice resources that demonstrate plotting error bars in R, and here I illustrate them with simple examples. The first method is from the website of James Holland Jones, who wrote an R function that adds arrows to a bar plot.
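As a rough sketch of the arrows idea (not James Holland Jones's function, which is not reproduced here): the SEM is the sample standard deviation divided by the square root of the sample size, and base R's arrows() can draw it as flat-capped bars on top of a barplot(). The group names and values are made up for illustration.

```r
# Minimal sketch: bar plot of group means with +/- 1 SEM error bars.
# Assumption: three hypothetical groups of 10 simulated values each.
set.seed(1)
groups <- list(control = rnorm(10, mean = 5),
               treatA  = rnorm(10, mean = 7),
               treatB  = rnorm(10, mean = 6))

means <- sapply(groups, mean)
# standard error of the mean: sd / sqrt(n)
sems  <- sapply(groups, function(v) sd(v) / sqrt(length(v)))

# barplot() returns the x-coordinates of the bar midpoints
mids <- barplot(means, ylim = c(0, max(means + sems) * 1.1),
                ylab = "mean value")
# angle = 90 turns the arrowheads into flat caps; code = 3 draws both ends
arrows(x0 = mids, y0 = means - sems, x1 = mids, y1 = means + sems,
       angle = 90, code = 3, length = 0.1)
```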


Markov clustering

The Markov Cluster (MCL) algorithm is an unsupervised clustering algorithm for graphs, based on the simulation of stochastic flow in graphs. Markov clustering was developed by Stijn van Dongen, and you can read his thesis on the Markov Cluster Algorithm. The work is based on the graph clustering paradigm, which postulates that natural groups in graphs (the clusters we aim to find) have the following property:

A random walk in G that visits a dense cluster will likely not leave the cluster until many of its vertices have been visited.
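The flow simulation alternates two operations on a column-stochastic matrix: expansion (a matrix power, here squaring), which lets random walks spread, and inflation (raising each entry to a power and re-normalising each column), which strengthens strong flows and weakens weak ones. Below is a minimal base-R sketch of that loop, written for illustration rather than taken from van Dongen's mcl software; the function name mcl_sketch, the inflation value of 2, the added self-loops, the tolerance and the toy graph are all assumptions. Each vertex is assigned to the row (attractor) holding most of its flow.

```r
# Minimal MCL sketch (hypothetical helper, not the mcl program):
# alternate expansion and inflation until the matrix stops changing.
mcl_sketch <- function(adj, inflation = 2, max_iter = 100, tol = 1e-8) {
  m <- adj + diag(nrow(adj))               # add self-loops, a common MCL step
  m <- sweep(m, 2, colSums(m), "/")        # make every column sum to 1
  for (i in seq_len(max_iter)) {
    prev <- m
    m <- m %*% m                           # expansion: flow spreads out
    m <- m^inflation                       # inflation: strong flows get stronger
    m <- sweep(m, 2, colSums(m), "/")      # re-normalise columns
    if (max(abs(m - prev)) < tol) break
  }
  # each vertex (column) joins the attractor row holding most of its flow
  apply(m, 2, which.max)
}

# toy graph: two triangles (1-2-3 and 4-5-6) joined by the edge 3-4
adj <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1                     # undirected: mirror every edge
clusters <- mcl_sketch(adj)
clusters
```

On this toy graph the flow should stay inside each triangle, so vertices 1 to 3 and 4 to 6 end up with two different labels. Raising the inflation value generally gives smaller, tighter clusters.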


Making a line graph to depict timecourse data

From this helpful thread on the Bioconductor mailing list.

x <- 1:50  ## x-axis positions (e.g. time points), one per row of y
set.seed(1)
y <- matrix(rnorm(1e4), nc=200) ## 50 x 200 matrix; each column is drawn as one line (e.g. one gene)
col <- rgb(190, 190, 190, alpha=60, maxColorValue=255) ## semi-transparent grey
matplot(x, y, type='l', col=col)

To see what matplot() is doing, I made a simpler example:

# x values: a vector of length 2, one per row of the matrix
one <- 1:2
# matrix with 2 rows and 10 columns (no seed is set, so your numbers will differ)
two <- matrix(rnorm(20), nc=10)
two
#           [,1]        [,2]      [,3]        [,4]       [,5]      [,6]      [,7]
#[1,] -0.6822078 -0.25108283  2.425193 -0.05766436 -2.6879801 0.3658529 -1.987125
#[2,] -1.0733665 -0.01278288 -1.403296 -0.13803471 -0.5859938 0.4553595 -1.395202
#          [,8]      [,9]      [,10]
#[1,] -0.281798 1.8890190  1.0734526
#[2,] -1.498866 0.1831983 -0.4870297
# plot each column of two as one line, with its 2 values at x = 1 and x = 2
matplot(one,two,type='l')

Column 5 of the matrix two is the easiest to spot: with matplot()'s default styles it is drawn as the long-dashed aqua (cyan) line running from -2.6879801 to -0.5859938.
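To check which line belongs to which column, a legend can reuse matplot()'s own defaults, col = 1:6 and lty = 1:5, which it recycles across the columns. This is a minimal sketch; it sets a seed (an addition not in the example above), so the numbers will not match the printed matrix.

```r
# Minimal sketch: label every matplot() line with its column number.
set.seed(1)                                # assumption: any seed will do
one <- 1:2
two <- matrix(rnorm(20), ncol = 10)
matplot(one, two, type = "l")
# matplot() recycles col = 1:6 and lty = 1:5 over the columns,
# so rebuild the same vectors for the legend
n <- ncol(two)
legend("topright", legend = paste("column", seq_len(n)),
       col = rep_len(1:6, n), lty = rep_len(1:5, n), cex = 0.7)
```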

This plot could be useful if you wanted to depict the expression of 50 genes at 10 time points in a timecourse experiment: make a matrix of 10 rows (time points) by 50 columns (genes).
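A sketch of that layout, with simulated values standing in for real expression data (the matrix, its dimension names and the mean-profile overlay are illustrative additions):

```r
# Sketch: 50 genes measured at 10 time points, drawn with matplot().
# Assumption: expression values are simulated; replace expr with your own matrix.
set.seed(1)
timepoints <- 1:10
n_genes <- 50
# one row per time point, one column per gene
expr <- matrix(rnorm(length(timepoints) * n_genes), nrow = length(timepoints),
               dimnames = list(paste0("t", timepoints), paste0("gene", 1:n_genes)))
# semi-transparent grey, as in the mailing-list example, so overlaps stay readable
grey_alpha <- rgb(190, 190, 190, alpha = 60, maxColorValue = 255)
matplot(timepoints, expr, type = "l", lty = 1, col = grey_alpha,
        xlab = "time point", ylab = "expression")
# overlay the mean profile across all genes
lines(timepoints, rowMeans(expr), lwd = 2)
```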

Making a line chart with a non-numerical x axis

A basic example of creating a line chart in R with user-defined x-axis labels.

opar <- par(ps = 18)  # larger text; the old settings are restored at the end
label <- c('no_filter', 9, 8, 7, 6, 5, 4)  # one x-axis label per point
# numeric values (not quoted strings) so they can be plotted and ranged directly
a <- c(0.4682953, 0.466284, 0.4587435, 0.4095376, 0.4444738, 0.7144069, 1.105043)
b <- c(0.9562088, 0.953856, 0.9104818, 0.7554028, 0.64136, 0.877509, 1.125698)
c <- c(0.7536005, 0.7487367, 0.7200604, 0.6408311, 0.5488365, 0.6355055, 1.051849)
d <- c(0.6601285, 0.6566467, 0.623516, 0.5532256, 0.5434039, 0.6835916, 1.047395)
e <- c(0.7536913, 0.7511848, 0.7338917, 0.6548796, 0.5129727, 0.6585963, 0.9883826)
f <- c(0.5596907, 0.5595791, 0.5512355, 0.5178115, 0.5014316, 0.5900139, 0.9123776)
g <- c(0.4868574, 0.4866527, 0.4776274, 0.4359562, 0.3950309, 0.5714427, 1.190739)
series <- list(a = a, b = b, c = c, d = d, e = e, f = f, g = g)
cols <- c("red", "orange", "yellow", "green", "blue", "purple", "violet")
# set ylim from all series; otherwise plot(a) alone would clip b and g
plot(a, axes = FALSE, xlab = "", ylab = "", type = "b", col = cols[1],
     ylim = range(unlist(series)))
for (i in 2:length(series)) lines(series[[i]], type = "b", col = cols[i])
axis(2)
axis(1, at = 1:length(label), labels = label)  # text labels instead of 1..7
title(main = "main", xlab = "xlab", ylab = "ylab")
legend(4, 1.1, names(series), col = cols, lty = 1, lwd = 1)
par(opar)

# The same kind of chart for many series read from a tab-delimited file
opar <- par(ps = 18)
label <- c('no_filter', 9, 8, 7, 6, 5)  # one label per value column
data <- read.table("file.tsv", header = FALSE, sep = "\t")
values <- as.matrix(data[, -1])  # drop the first column (presumably a row identifier)
yrange <- range(values)          # y limits covering every series
# empty plotting region with one x position per label
plot(NA, xlim = c(1, length(label)), ylim = yrange, axes = FALSE, xlab = "", ylab = "")
for (i in seq_len(nrow(values))) lines(values[i, ], type = "b")  # one line per row of the file
axis(2)
axis(1, at = 1:length(label), labels = label)
title(main = "main", xlab = "xlab", ylab = "ylab")
par(opar)
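For many series, the same figure can also be drawn with a single matplot() call, the function used in the timecourse example. This is a sketch with simulated values standing in for file.tsv (the 14 x 6 matrix is hypothetical): matplot() draws one line per column, so the matrix is transposed, and xaxt = "n" suppresses the numeric x axis so that axis() can add the text labels.

```r
# Sketch: labelled-axis line chart for many series with one matplot() call.
# Assumption: 14 simulated series of 6 values replace the contents of file.tsv.
set.seed(1)
label <- c("no_filter", 9, 8, 7, 6, 5)
values <- matrix(runif(14 * length(label), min = 0.2, max = 0.7), nrow = 14)

# matplot() draws one line per column, so transpose: rows become x positions
matplot(seq_along(label), t(values), type = "b", pch = 1, lty = 1, col = "black",
        xaxt = "n", xlab = "xlab", ylab = "ylab", main = "main")
axis(1, at = seq_along(label), labels = label)   # text labels on the x axis
```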