Creating a correlation matrix with R

Updated 2014 January 6th. This post on creating a correlation matrix with R was published on 2012 January 31st and has become one of my most viewed posts. I’ve learned a bit more since then, so I have updated and improved it.

Incentive

Let $A$ be an $m \times n$ matrix, where…

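As a minimal sketch of the basic idea, base R's `cor()` turns a matrix into a correlation matrix. The matrix `A` below is made-up example data (m = 5 rows as observations, n = 3 columns as variables), and this is not necessarily the approach the full post takes.

```r
# Hypothetical example data: m = 5 observations (rows), n = 3 variables (columns)
A <- matrix(c(1, 2, 3, 4, 5,
              2, 4, 6, 8, 10,
              5, 3, 4, 1, 2),
            nrow = 5, ncol = 3,
            dimnames = list(NULL, c("x", "y", "z")))

# cor() correlates the columns of a matrix, giving an n x n matrix
R <- cor(A)                               # Pearson correlation by default
R_spearman <- cor(A, method = "spearman") # rank-based alternative

print(round(R, 3))

# To correlate the rows instead (e.g. samples stored as rows), transpose first
R_rows <- cor(t(A))                       # m x m matrix
```

Because `cor()` always works column-wise, `t()` is all that is needed to switch between variable-by-variable and row-by-row correlations.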

Using R to obtain basic statistics on your dataset

Updated: 2014 June 20th. Most of the data I work with are represented as tables, i.e. with rows and columns. R makes it easy to store such data (as data frames) and to process them to produce some basic statistics. Here are some R functions that calculate basic, but nevertheless useful, statistics. I will use…

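A few base R functions of this kind, sketched on a small hypothetical data frame `df` (rows as features, columns as samples). The full post may use a different dataset and a different set of functions.

```r
# Hypothetical data frame: rows are features, columns are samples
df <- data.frame(sample1 = c(10, 20, 30, 40),
                 sample2 = c(12, 18, 33, 37),
                 sample3 = c(8, 25, 29, 42),
                 row.names = c("a", "b", "c", "d"))

dim(df)                             # number of rows and columns
summary(df)                         # min, quartiles, median, mean, max per column

col_means   <- colMeans(df)         # mean of each column
row_means   <- rowMeans(df)         # mean of each row
row_sds     <- apply(df, 1, sd)     # standard deviation of each row (MARGIN = 1)
col_medians <- apply(df, 2, median) # median of each column (MARGIN = 2)
```

`apply()` converts the data frame to a matrix before working along rows (`1`) or columns (`2`), so it is best used when all columns are numeric.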

edgeR vs. SAMSeq

A while ago I received a comment about comparing parametric against nonparametric methods for calling differential expression in count data. Here I compare SAMSeq (Jun Li and Robert Tibshirani (2011) Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. Statistical Methods in Medical Research, in press.) with edgeR. For more information…


The DGEList object in R

I’ve updated this post (2013 June 29th) to use the latest versions of R, Bioconductor and edgeR. I also demonstrate how the results of edgeR can be saved and output into one useful table. The DGEList object holds the dataset to be analysed by edgeR and the subsequent calculations performed on it. Specifically, it contains:…

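A minimal sketch of building a DGEList with edgeR's `DGEList()` function, assuming a small made-up count matrix and group factor (the gene and sample names are hypothetical, not the post's dataset). It only shows where the counts and the per-sample information end up.

```r
# Requires the Bioconductor package edgeR:
#   install.packages("BiocManager"); BiocManager::install("edgeR")
library(edgeR)

# Hypothetical count table: 4 genes (rows) x 4 samples (columns)
counts <- matrix(c( 10, 12,  30, 35,
                     0,  1,   0,  2,
                   100, 90, 110, 95,
                     5,  7,  20, 25),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4),
                                 c("ctrl1", "ctrl2", "trt1", "trt2")))
group <- factor(c("ctrl", "ctrl", "trt", "trt"))

y <- DGEList(counts = counts, group = group)

names(y)        # components, including "counts" and "samples"
y$samples       # group, lib.size and norm.factors for each sample
head(y$counts)  # the raw counts

# TMM normalisation fills in y$samples$norm.factors
y <- calcNormFactors(y)
y$samples
```

Library sizes default to the column sums of the count matrix, and the normalisation factors start at 1 until `calcNormFactors()` is run.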

Processing rows of a data frame in R

Once you’ve read a tab-delimited file into a data.frame, here’s one way of operating on the rows. I’m still wondering why I need two conversion steps (e.g. var(as.vector(as.matrix(data_subset[1,])))), since var(as.vector(data_subset[1,])) doesn’t work. In time, when I learn more about data.frames and R in general, I hope to address this or if…

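On the two-conversion question: a row extracted with `data_subset[1,]` is still a one-row data frame, which is a list, so `as.vector()` alone does not give a plain numeric vector. A minimal sketch, using a hypothetical `data_subset` in place of the table read from file, shows `unlist()` doing the flattening in one step and `apply()` handling every row.

```r
# Hypothetical stand-in for a table read with, e.g., read.delim("file.txt")
data_subset <- data.frame(s1 = c(1, 4, 7),
                          s2 = c(2, 5, 9),
                          s3 = c(3, 6, 11))

row1 <- data_subset[1, ]
class(row1)            # "data.frame": a one-row data frame, i.e. a list

# unlist() flattens the row to a named numeric vector in one step
row1_var <- var(unlist(row1))
row1_var

# Process every row: apply() converts the data frame to a matrix first
row_vars <- apply(data_subset, 1, var)
row_vars
```

`apply()` works here because all columns are numeric; with a character column, the matrix it builds would be character and `var()` would fail.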