A while ago I received a comment on comparing parametric methods against nonparametric for calling differential expression in count data. Here I compare SAMSeq (Jun Li and Robert Tibshirani (2011) Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. Statistical Methods in Medical Research, in press.) with edgeR. For more information on SAMSeq refer to the paper. I intend to look at cuffdiff sometime later.
library(edgeR)
raw.data <- read.delim("pnas_expression_filtered.tsv",row.names=1)
d <- raw.data[,1:7]
rownames(d) <- rownames(raw.data)
head(d)
# lane1 lane2 lane3 lane4 lane5 lane6 lane8
#ENSG00000124208 478 619 628 744 483 716 240
#ENSG00000182463 27 20 27 26 48 55 24
#ENSG00000124201 180 218 293 275 373 301 88
#ENSG00000124207 76 80 85 97 80 81 37
#ENSG00000125835 132 200 200 228 280 204 52
#ENSG00000125834 42 60 72 86 131 99 30
group <- c(rep("Control",4),rep("DHT",3))
d <- DGEList(counts = d, group=group)
#Calculating library sizes from column totals.
d <- calcNormFactors(d)
d <- estimateCommonDisp(d)
de.com <- exactTest(d)
#Comparison of groups: DHT - Control
summary(decideTestsDGE(de.com,p.value=0.05))
# [,1]
#-1 2368
#0 11715
#1 2411
d <- estimateTagwiseDisp(d, prop.used=0.5, grid.length=500)
de.tgw <- exactTest(d)
summary(decideTestsDGE(de.tgw, p.value=0.05))
# [,1]
#-1 2119
#0 12048
#1 2327
#total 4446
de <- summary(decideTestsDGE(de.tgw,p.value=0.05))
total <- de[1] + de[3]
sink("edger.output")
topTags(de.tgw,n=total)
sink()
sessionInfo()
#R version 2.14.1 (2011-12-22)
#Platform: x86_64-unknown-linux-gnu (64-bit)
#
#locale:
#[1] C
#
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base
#
#other attached packages:
#[1] edgeR_2.4.1
#
#loaded via a namespace (and not attached):
#[1] limma_3.10.0
For SAMSeq:
| Data type? | Two class unpaired |
| Array or Seq data? | seq |
| Arrays centered? | FALSE |
| Delta | 2.21025 |
| Minimum fold change | 0 |
| Test statistic | standard |
| Are data are log scale? | FALSE |
| Number of permutations | 1000 |
| Number of neighbors for KNN | 10 |
| Seed for Random number generator | 1234567 |
| Estimate of pi0 (proportion of null genes) | 0.55 |
| Exchangibility factor s0 | 0 |
| s0 percentile | 0 |
| Quantiles of estimated sequencing depths (0%, 25%, 50%, 75%, 100%) | 0.474, 0.822, 1.116, 1.228, 1.311 |
| False Discovery Rate (%) | 3.228881 |
Of the 4,408 significantly differentiated genes detected by SAMSeq using the settings above, 3,908 overlap the 4,446 detected by edgeR using a tagwise dispersion.
See also: edgeR vs. DESeq using this same dataset.
and

This work is licensed under a Creative Commons
Attribution 4.0 International License.
Dear Dave,
I have asked some questions regarding samseq through email and you kindly help.
Just an issue about the samseq results:
This is the command I ran:
SAMseq(x = counts, y = group, resp.type = “Two class unpaired”,
nperms = 1000, random.seed = 1234567, fdr.output = 0.05)
as you see I have chose the FDR = 0.05 but I do not know why in the output I have something like this:
> samseqThre1[[1]][[4]]
$genes.up
Gene ID Gene Name Score(d) Fold Change q-value(%)
[1,] “g94” “94” “8” “1.626” “16.76”
[2,] “g1242” “1242” “8” “27.261” “16.76”
[3,] “g1295” “1295” “8” “1.511” “16.76”
[4,] “g1365” “1365” “8” “1.465” “16.76”
[5,] “g1487” “1487” “8” “2.815” “16.76”
….
….
as you can see the q value which is suppose to be the lowest FDR in samr is 16.76 % so it means that the lowest FDR regarding to these genes should be 0.1676 which is more than what I chose in my command(0.05). Do you have any hints or suggestions or maybe I am misunderstood.
Thank you in advance
Hi Fatemeh,
I think it would be best to email the original authors of this package. I’ve only used SAMSeq through Excel (very briefly) and not through R.
I found from this page http://www-stat.stanford.edu/~tibs/SAM/faq.html, that you should email sam-bug at stat.stanford.edu.
Hope that helps,
Dave
Hi Fatmeh,
Did you got an answer to this question? I was asking myself the same thing