Tag Archives: DGE

Variance in RNA-Seq data

Using data from this paper. Generate some random data from a Poisson distribution.
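
As a rough sketch of the random-data step (not the code from the post itself; the mean of 10, the 1,000 genes, the 4 replicates and the seed are all arbitrary assumptions), base R's rpois() can simulate the counts. For Poisson data the variance equals the mean, so the two averages printed at the end should be close:

```r
# Simulate Poisson counts; lambda, matrix size and seed are illustrative assumptions
set.seed(31)
n_genes <- 1000
n_reps <- 4
lambda <- 10
counts <- matrix(rpois(n_genes * n_reps, lambda = lambda),
                 nrow = n_genes, ncol = n_reps)

# Per-gene mean and variance across the replicates
gene_mean <- rowMeans(counts)
gene_var <- apply(counts, 1, var)

# Averaged over all genes, both values should sit near lambda
print(mean(gene_mean))
print(mean(gene_var))
```

Real RNA-Seq counts from biological replicates usually show more variance than this, which is why the Poisson simulation is a useful baseline to compare against.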

Posted in bioinformatics, R, Statistics

DESeq vs. edgeR vs. baySeq using pnas_expression.txt

Following the instructions from a previous post, I filtered the pnas_expression.txt dataset and saved the results in “pnas_expression_filtered.tsv” and then performed the differential gene expression analyses using the respective packages. To run the Perl scripts below, just save the code … Continue reading

Posted in bioinformatics, R | 4 Comments

edgeR vs. SAMSeq

A while ago I received a comment on comparing parametric methods against nonparametric ones for calling differential expression in count data. Here I compare SAMSeq (Jun Li and Robert Tibshirani (2011) Finding consistent patterns: a nonparametric approach for identifying differential expression … Continue reading

Posted in bioinformatics, R

The DGEList object in R

The DGEList object holds the dataset to be analysed by edgeR and the subsequent calculations performed on the dataset. Specifically, it contains: counts, a numeric matrix containing the read counts; lib.size, a numeric vector containing the total to normalize against for each … Continue reading

Posted in R

edgeR vs. DESeq using pnas_expression.txt

First, download the file pnas_expression.txt from Davis’s homepage. For more information on the dataset, please refer to the edgeR manual. The latest R version at the time of writing is R 2.13.1. You can download it from here. Install Bioconductor … Continue reading

Posted in R, Statistics | 6 Comments

edgeR’s common dispersion

Please refer to this thread for an explanation of the common dispersion value in edgeR. Script to prepare R script: I made a file with 1,000 elements and 4 replicates, with exactly the same counts per element: id one two … Continue reading

Posted in R, Statistics

Normalisation methods for DGE data

So I created a dataset with 4 samples; columns 1 and 2 are the controls and columns 3 and 4 are the patients. 25 transcripts are in all 4 samples in equal amounts. Another 25 transcripts are only in the … Continue reading

Posted in bioinformatics, R

DESeq vs. edgeR vs. baySeq

6th April 2012: For a more updated version of this post, please refer to this post. A very simple comparison, using the TagSeqExample.tab file from the DESeq package as the benchmark dataset. According to the DESeq authors, T1a and T1b are … Continue reading

Posted in bioinformatics, R | 22 Comments

DESeq

Code taken from the DESeq vignette for my own convenience. library("DESeq") exampleFile = system.file("extra/TagSeqExample.tab", package="DESeq") countsTable = read.delim(exampleFile, header=TRUE, stringsAsFactors=TRUE) rownames(countsTable) = countsTable$gene countsTable = countsTable[ , -1] conds = c("T","T","T","Tb","N","N") cds = newCountDataSet(countsTable, conds) cds = cds[,-1] cds … Continue reading

Posted in R

Pooling technical replicates in edgeR

4 libraries, each with technical replicates, and 2 conditions. Technical replicates are the same sample processed identically. First treat technical replicates separately: > library(edgeR) > targets = read.delim(file = "targets.txt", stringsAsFactors = FALSE) > d = readDGE(targets, comment.char="#") > d = estimateCommonDisp(d) > d$samples$lib.size [1] 258680 … Continue reading
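
Pooling, as opposed to treating the replicates separately, amounts to summing the counts of the technical replicates that belong to the same library. Here is a minimal base R sketch of that idea; the counts, tag names and library names are made up for illustration, it uses 2 libraries rather than 4 to keep it short, and it is not the code from the post:

```r
# Toy count matrix: rows are tags, columns are technical replicates (all values invented)
counts <- matrix(
  c(10, 12,  3,  5,
    20, 18, 40, 44,
     0,  1,  7,  6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("tag1", "tag2", "tag3"),
                  c("libA_rep1", "libA_rep2", "libB_rep1", "libB_rep2"))
)

# Which library each column belongs to (hypothetical labels)
library_id <- c("libA", "libA", "libB", "libB")

# rowsum() sums rows by group, so transpose, sum per library, then transpose back
pooled <- t(rowsum(t(counts), group = library_id))
print(pooled)

# Library sizes after pooling
print(colSums(pooled))
```

The pooled matrix has one column per library, and each pooled library size is the sum of the sizes of its technical replicates.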

Posted in R | 1 Comment