A while ago I received a comment on comparing parametric methods against nonparametric for calling differential expression in count data. Here I compare SAMSeq (Jun Li and Robert Tibshirani (2011) Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. Statistical Methods in Medical Research, in press.) with edgeR. For more information on SAMSeq refer to the paper. I intend to look at cuffdiff sometime later.
library(edgeR) raw.data <- read.delim("pnas_expression_filtered.tsv",row.names=1) d <- raw.data[,1:7] rownames(d) <- rownames(raw.data) head(d) # lane1 lane2 lane3 lane4 lane5 lane6 lane8 #ENSG00000124208 478 619 628 744 483 716 240 #ENSG00000182463 27 20 27 26 48 55 24 #ENSG00000124201 180 218 293 275 373 301 88 #ENSG00000124207 76 80 85 97 80 81 37 #ENSG00000125835 132 200 200 228 280 204 52 #ENSG00000125834 42 60 72 86 131 99 30 group <- c(rep("Control",4),rep("DHT",3)) d <- DGEList(counts = d, group=group) #Calculating library sizes from column totals. d <- calcNormFactors(d) d <- estimateCommonDisp(d) de.com <- exactTest(d) #Comparison of groups: DHT - Control summary(decideTestsDGE(de.com,p.value=0.05)) # [,1] #-1 2368 #0 11715 #1 2411 d <- estimateTagwiseDisp(d, prop.used=0.5, grid.length=500) de.tgw <- exactTest(d) summary(decideTestsDGE(de.tgw, p.value=0.05)) # [,1] #-1 2119 #0 12048 #1 2327 #total 4446 de <- summary(decideTestsDGE(de.tgw,p.value=0.05)) total <- de[1] + de[3] sink("edger.output") topTags(de.tgw,n=total) sink() sessionInfo() #R version 2.14.1 (2011-12-22) #Platform: x86_64-unknown-linux-gnu (64-bit) # #locale: #[1] C # #attached base packages: #[1] stats graphics grDevices utils datasets methods base # #other attached packages: #[1] edgeR_2.4.1 # #loaded via a namespace (and not attached): #[1] limma_3.10.0
For SAMSeq:
Data type? | Two class unpaired |
Array or Seq data? | seq |
Arrays centered? | FALSE |
Delta | 2.21025 |
Minimum fold change | 0 |
Test statistic | standard |
Are data are log scale? | FALSE |
Number of permutations | 1000 |
Number of neighbors for KNN | 10 |
Seed for Random number generator | 1234567 |
Estimate of pi0 (proportion of null genes) | 0.55 |
Exchangibility factor s0 | 0 |
s0 percentile | 0 |
Quantiles of estimated sequencing depths (0%, 25%, 50%, 75%, 100%) | 0.474, 0.822, 1.116, 1.228, 1.311 |
False Discovery Rate (%) | 3.228881 |
Of the 4,408 significantly differentiated genes detected by SAMSeq using the settings above, 3,908 overlap the 4,446 detected by edgeR using a tagwise dispersion.
See also: edgeR vs. DESeq using this same dataset.
and

This work is licensed under a Creative Commons
Attribution 4.0 International License.
Dear Dave,
I have asked some questions regarding samseq through email and you kindly help.
Just an issue about the samseq results:
This is the command I ran:
SAMseq(x = counts, y = group, resp.type = “Two class unpaired”,
nperms = 1000, random.seed = 1234567, fdr.output = 0.05)
as you see I have chose the FDR = 0.05 but I do not know why in the output I have something like this:
> samseqThre1[[1]][[4]]
$genes.up
Gene ID Gene Name Score(d) Fold Change q-value(%)
[1,] “g94” “94” “8” “1.626” “16.76”
[2,] “g1242” “1242” “8” “27.261” “16.76”
[3,] “g1295” “1295” “8” “1.511” “16.76”
[4,] “g1365” “1365” “8” “1.465” “16.76”
[5,] “g1487” “1487” “8” “2.815” “16.76”
….
….
as you can see the q value which is suppose to be the lowest FDR in samr is 16.76 % so it means that the lowest FDR regarding to these genes should be 0.1676 which is more than what I chose in my command(0.05). Do you have any hints or suggestions or maybe I am misunderstood.
Thank you in advance
Hi Fatemeh,
I think it would be best to email the original authors of this package. I’ve only used SAMSeq through Excel (very briefly) and not through R.
I found from this page http://www-stat.stanford.edu/~tibs/SAM/faq.html, that you should email sam-bug at stat.stanford.edu.
Hope that helps,
Dave
Hi Fatmeh,
Did you got an answer to this question? I was asking myself the same thing