# edgeR vs. SAMSeq

A while ago I received a comment on comparing parametric methods against nonparametric for calling differential expression in count data. Here I compare SAMSeq (Jun Li and Robert Tibshirani (2011) Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. Statistical Methods in Medical Research, in press.) with edgeR. For more information on SAMSeq refer to the paper. I intend to look at cuffdiff sometime later.

library(edgeR)
d <- raw.data[,1:7]
rownames(d) <- rownames(raw.data)
#                lane1 lane2 lane3 lane4 lane5 lane6 lane8
#ENSG00000124208   478   619   628   744   483   716   240
#ENSG00000182463    27    20    27    26    48    55    24
#ENSG00000124201   180   218   293   275   373   301    88
#ENSG00000124207    76    80    85    97    80    81    37
#ENSG00000125835   132   200   200   228   280   204    52
#ENSG00000125834    42    60    72    86   131    99    30
group <- c(rep("Control",4),rep("DHT",3))
d <- DGEList(counts = d, group=group)
#Calculating library sizes from column totals.
d <- calcNormFactors(d)
d <- estimateCommonDisp(d)
de.com <- exactTest(d)
#Comparison of groups:  DHT - Control
summary(decideTestsDGE(de.com,p.value=0.05))
#   [,1]
#-1  2368
#0  11715
#1   2411
d <- estimateTagwiseDisp(d, prop.used=0.5, grid.length=500)
de.tgw <- exactTest(d)
summary(decideTestsDGE(de.tgw, p.value=0.05))
#   [,1]
#-1  2119
#0  12048
#1   2327
#total 4446
de <- summary(decideTestsDGE(de.tgw,p.value=0.05))
total <- de[1] + de[3]
sink("edger.output")
topTags(de.tgw,n=total)
sink()
sessionInfo()
#R version 2.14.1 (2011-12-22)
#Platform: x86_64-unknown-linux-gnu (64-bit)
#
#locale:
#[1] C
#
#attached base packages:
#[1] stats     graphics  grDevices utils     datasets  methods   base
#
#other attached packages:
#[1] edgeR_2.4.1
#
#loaded via a namespace (and not attached):
#[1] limma_3.10.0


For SAMSeq:

 Data type? Two class unpaired Array or Seq data? seq Arrays centered? FALSE Delta 2.21025 Minimum fold change 0 Test statistic standard Are data are log scale? FALSE Number of permutations 1000 Number of neighbors for KNN 10 Seed for Random number generator 1234567 Estimate of pi0 (proportion of null genes) 0.55 Exchangibility factor s0 0 s0 percentile 0 Quantiles of estimated sequencing depths (0%, 25%, 50%, 75%, 100%) 0.474, 0.822, 1.116, 1.228, 1.311 False Discovery Rate (%) 3.228881

Of the 4,408 significantly differentiated genes detected by SAMSeq using the settings above, 3,908 overlap the 4,446 detected by edgeR using a tagwise dispersion.

and

http://www.stanford.edu/~junli07/research.html#SAM

.
1. Fatemeh says:

Dear Dave,
I have asked some questions regarding samseq through email and you kindly help.
Just an issue about the samseq results:
This is the command I ran:
SAMseq(x = counts, y = group, resp.type = “Two class unpaired”,
nperms = 1000, random.seed = 1234567, fdr.output = 0.05)
as you see I have chose the FDR = 0.05 but I do not know why in the output I have something like this:
> samseqThre1[[1]][[4]]
\$genes.up
Gene ID Gene Name Score(d) Fold Change q-value(%)
[1,] “g94” “94” “8” “1.626” “16.76”
[2,] “g1242” “1242” “8” “27.261” “16.76”
[3,] “g1295” “1295” “8” “1.511” “16.76”
[4,] “g1365” “1365” “8” “1.465” “16.76”
[5,] “g1487” “1487” “8” “2.815” “16.76”
….
….
as you can see the q value which is suppose to be the lowest FDR in samr is 16.76 % so it means that the lowest FDR regarding to these genes should be 0.1676 which is more than what I chose in my command(0.05). Do you have any hints or suggestions or maybe I am misunderstood.

1. Davo says:

Hi Fatemeh,

I think it would be best to email the original authors of this package. I’ve only used SAMSeq through Excel (very briefly) and not through R.

I found from this page http://www-stat.stanford.edu/~tibs/SAM/faq.html, that you should email sam-bug at stat.stanford.edu.

Hope that helps,

Dave

1. Kobi says:

Hi Fatmeh,

Did you got an answer to this question? I was asking myself the same thing

This site uses Akismet to reduce spam. Learn how your comment data is processed.