Variance in RNA-Seq data

Updated 2014 April 18th. For this post I will use data from this study, which has already been nicely summarised, to examine the variance in RNA-Seq data. Briefly, the study used LNCaP cells, which are androgen-sensitive human prostate adenocarcinoma cells, and treated them with DHT, with a mock treatment as the control. The…

DESeq vs. edgeR vs. baySeq using pnas_expression.txt

Following the instructions from a previous post, I filtered the pnas_expression.txt dataset and saved the results in “pnas_expression_filtered.tsv” and then performed the differential gene expression analyses using the respective packages. To run the Perl scripts below, just save the code into a file and name it “something.pl”. Then make the file executable by running “chmod…

edgeR vs. SAMSeq

A while ago I received a comment about comparing parametric against nonparametric methods for calling differential expression in count data. Here I compare SAMSeq (Jun Li and Robert Tibshirani (2011) Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. Statistical Methods in Medical Research, in press.) with edgeR. For more information…

The DGEList object in R

I’ve updated this post (2013 June 29th) to use the latest versions of R, Bioconductor and edgeR. I also demonstrate how the results of edgeR can be saved and output as one useful table. The DGEList object holds the dataset to be analysed by edgeR and the subsequent calculations performed on the dataset. Specifically it contains:…

edgeR vs. DESeq using pnas_expression.txt

Firstly, download the file pnas_expression.txt from Davis’s homepage. For more information on the dataset please refer to the edgeR manual and this paper. The latest R version at the time of writing is R 2.13.1. You can download it from here. Install Bioconductor and the required packages: source("http://www.bioconductor.org/biocLite.R") biocLite() biocLite("DESeq") biocLite("edgeR") A filtering criterion of…

edgeR’s common dispersion

Updated: 2017 September 7th. When I was first learning how to conduct a differential expression (DE) analysis with RNA-seq data, I found it very difficult to understand the statistical procedures implemented in the various R packages that perform DE analysis. This really bugged me. However, it was not difficult to carry out the analysis, since the…
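As a rough illustration of what a dispersion shared across genes means, the sketch below simulates negative binomial counts, whose variance is mu + phi * mu^2, and recovers phi with a simple moment estimate. The values of `phi` and `mu` are assumptions chosen for illustration, and the moment estimate is a stand-in: edgeR itself uses a likelihood-based estimator, not this formula.

```r
# Minimal sketch in base R (no edgeR needed); all values are hypothetical.
set.seed(42)

phi <- 0.2   # assumed dispersion, shared by every gene
mu  <- 100   # assumed mean count

# rnbinom() parameterises the negative binomial by mean (mu) and size = 1 / phi.
x <- rnbinom(1e5, mu = mu, size = 1 / phi)

# Under the negative binomial model the variance is mu + phi * mu^2.
expected_var <- mu + phi * mu^2
observed_var <- var(x)

# Moment-based estimate of the dispersion (illustrative only).
phi_hat <- (observed_var - mean(x)) / mean(x)^2

print(c(expected_var = expected_var, observed_var = observed_var, phi_hat = phi_hat))
```

With phi set to 0 the variance collapses to the Poisson case (variance equal to the mean), which is why the dispersion is often read as the extra, biological part of the variability.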

Normalisation methods implemented in edgeR

Updated 2024 August 5th. edgeR carries out: Differential expression analysis of RNA-seq expression profiles with biological replication. Implements a range of statistical methodology based on the negative binomial distributions, including empirical Bayes estimation, exact tests, generalized linear models and quasi-likelihood tests. As well as RNA-seq, it can be applied to differential signal analysis of other types…

DESeq vs. edgeR vs. baySeq

6th April 2012: For a more up-to-date version of this post, please see this post. A very simple comparison, using the TagSeqExample.tab file from the DESeq package as the benchmark dataset. According to the DESeq authors, T1a and T1b are similar, so I removed the second column in the file corresponding to T1a: Hierarchical clustering…

DESeq

Code taken from the DESeq vignette for my own convenience. library("DESeq") exampleFile = system.file("extra/TagSeqExample.tab", package="DESeq") countsTable = read.delim(exampleFile, header=TRUE, stringsAsFactors=TRUE) rownames(countsTable) = countsTable$gene countsTable = countsTable[ , -1] conds = c("T","T","T","Tb","N","N") cds = newCountDataSet(countsTable, conds) cds = cds[,-1] cds = estimateSizeFactors(cds) sizeFactors(cds) cds <- estimateVarianceFunctions( cds ) res <- nbinomTest( cds, "N", "T" )…

Pooling technical replicates in edgeR

This post is very old and should just be ignored. But if you came across it, here’s a thread on the Bioconductor mailing list that may be relevant. 4 libraries, each with technical replicates, and 2 conditions. Technical replicates are the same samples processed identically. First treat technical replicates separately: Now pooling everything together, so…
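The pooling step can be sketched in base R as follows. This is a minimal illustration, not the code from the original post: the count matrix, the `libN_repM` column naming scheme and the condition labels are all assumptions made up for the example. Pooling here simply means summing the counts of the technical replicates that belong to the same library.

```r
# Hypothetical data: 3 genes x 8 lanes (4 libraries, 2 technical replicates each).
set.seed(1)
counts <- matrix(
  rpois(24, lambda = 50), nrow = 3,
  dimnames = list(paste0("gene", 1:3),
                  paste0("lib", rep(1:4, each = 2), "_rep", 1:2))
)

# Recover the library each column came from (assumed naming scheme).
library_id <- sub("_rep[0-9]+$", "", colnames(counts))

# Assumed design: two libraries per condition.
condition <- c(lib1 = "control", lib2 = "control",
               lib3 = "treated", lib4 = "treated")

# rowsum() sums rows by group, so transpose, sum per library, transpose back.
pooled <- t(rowsum(t(counts), group = library_id))

# Condition labels aligned to the pooled columns.
pooled_condition <- condition[colnames(pooled)]

print(pooled)
print(pooled_condition)
```

Summing leaves each gene's total count unchanged while reducing the eight columns to four, one per library, so the downstream analysis treats each library as a single sample.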
