Creating a correlation matrix with R

Updated 2024 April 7th. Incentive: Let X be a matrix, where x_ij are elements of X, where i is the row and j is the column. If the matrix contained transcript expression data, then x_ij is the expression level of the ith transcript in the jth assay. The elements of the ith row of X form the transcriptional response of the ith transcript. The elements…
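As a rough illustration of the setup described in this excerpt, here is a minimal sketch using base R's cor(). The matrix name X, its dimensions, and the random values are all made up for the example; they are not taken from the post.

```r
# Hypothetical expression matrix: 4 transcripts (rows) x 5 assays (columns).
# The values are random and only serve to make the example runnable.
set.seed(1)
X <- matrix(
  rnorm(20),
  nrow = 4,
  dimnames = list(paste0("transcript", 1:4), paste0("assay", 1:5))
)

# cor() correlates the columns of a matrix, so transpose X first
# to get the correlation between transcripts (rows).
transcript_cor <- cor(t(X))

# Correlation between assays (columns) needs no transpose.
assay_cor <- cor(X)
```

cor() uses Pearson correlation by default; the method argument also accepts "spearman" and "kendall".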

Using R to obtain basic statistics on your dataset

Updated: 2014 June 20th. Most of the data I work with are represented as tables, i.e. with rows and columns. R makes it easy to store such data (as data frames) and process them to produce basic statistics. Here are some R functions that calculate basic, but nevertheless useful, statistics. I will use…

edgeR vs. SAMSeq

A while ago I received a comment about comparing parametric against nonparametric methods for calling differential expression in count data. Here I compare SAMSeq (Jun Li and Robert Tibshirani (2011) Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. Statistical Methods in Medical Research, in press.) with edgeR. For more information…

The DGEList object in R

I’ve updated this post (2013 June 29th) to use the latest version of R, Bioconductor and edgeR. I also demonstrate how the results of edgeR can be saved and output into one useful table. The DGEList object holds the dataset to be analysed by edgeR and the subsequent calculations performed on the dataset. Specifically it contains:…

Processing rows of a data frame in R

Once you’ve read a tab-delimited file into a data.frame, here’s one way of operating on the rows. I’m still wondering why I need two conversion steps (e.g. var(as.vector(as.matrix(data_subset[1,])))), since var(as.vector(data_subset[1,])) doesn’t work. In time, when I learn more about data.frames and R in general, I hope to address this or if…
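One possible explanation for the two conversion steps, sketched with made-up data: a single row taken from a data frame is itself still a data frame rather than a plain numeric vector, so it has to be flattened before var() treats it as one set of numbers. The values below are hypothetical, and unlist() and apply() are offered as alternatives, not as the method the post uses.

```r
# Hypothetical example data: four rows, three numeric columns.
data_subset <- data.frame(
  s1 = c(1, 2, 3, 4),
  s2 = c(2, 4, 6, 8),
  s3 = c(3, 6, 9, 12)
)

# A single row extracted with [1, ] is still a data frame.
first_row <- data_subset[1, ]
is.data.frame(first_row)

# unlist() flattens that one-row data frame into a numeric vector,
# which var() can then treat as a single sample.
row1_var <- var(unlist(first_row))

# apply() with MARGIN = 1 coerces the data frame to a matrix and
# calls var() on each row in turn.
row_vars <- apply(data_subset, 1, var)
```

The apply() form avoids indexing rows one at a time, but it assumes every column is numeric, since the data frame is coerced to a matrix first.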

Creating data subsets in R

Say you have a tab-delimited file called tally.tsv with n rows and you only want to work with a subset of those rows, chosen by the sum of each row. Here’s how to do it within R:
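A minimal sketch of one way this could look, assuming the file has a header line and only numeric columns. The file contents, the column names, and the cutoff of 5 are invented for illustration; a temporary file stands in for tally.tsv so the example runs on its own.

```r
# Hypothetical stand-in for tally.tsv: write a small tab-delimited file
# to a temporary location so the example is self-contained.
tally_file <- tempfile(fileext = ".tsv")
writeLines(c(
  "a\tb\tc",
  "0\t1\t0",
  "5\t7\t9",
  "2\t2\t2",
  "10\t0\t3"
), tally_file)

# Read the table; header = TRUE is an assumption about the file layout.
data <- read.table(tally_file, header = TRUE, sep = "\t")

# Keep only the rows whose sum exceeds the cutoff (5 is arbitrary).
threshold <- 5
data_subset <- data[rowSums(data) > threshold, ]

# Number of rows that passed the filter.
nrow(data_subset)
```

With a real file, the tempfile() and writeLines() steps would be dropped and read.table() pointed at tally.tsv directly.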