Download the 5' UTRs of all RefSeq genes using the UCSC Table Browser.
Separate the features by strand (column 6 of the BED file):
perl -nle '@a = split; print if $a[5] eq "+";' hg19_refgene_five_utr_110914.bed > hg19_refgene_five_utr_110914_plus.bed
perl -nle '@a = split; print if $a[5] eq "-";' hg19_refgene_five_utr_110914.bed > hg19_refgene_five_utr_110914_neg.bed
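As a quick sanity check on the split, the number of features on each strand can be counted and compared with the line counts of the two output files. The following is a minimal sketch, assuming whitespace-delimited BED with the strand in column 6; the function name `count_strands` is illustrative and not part of the original workflow.

```shell
# Count BED features per strand (column 6) in a single pass.
# Prints three tab-separated numbers: plus, minus, other.
# Fields are split on whitespace, as in the Perl one-liners.
count_strands() {
    awk '
        $6 == "+" { plus++; next }
        $6 == "-" { minus++; next }
        { other++ }
        END { printf "%d\t%d\t%d\n", plus, minus, other }
    ' "$1"
}
```

Calling `count_strands hg19_refgene_five_utr_110914.bed` should give plus and minus counts that match `wc -l` on the two strand-specific files; a non-zero third number would point to features without a strand.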
Use intersectBed to find 5' UTRs on opposite strands that overlap:
#As a test, force same-strand overlaps with -s; comparing the minus file against the plus file should give no output
intersectBed -s -a hg19_refgene_five_utr_110914_neg.bed -b hg19_refgene_five_utr_110914_plus.bed
intersectBed -wo -a hg19_refgene_five_utr_110914_neg.bed -b hg19_refgene_five_utr_110914_plus.bed > overlap
perl -nle '@a = split; $t{$a[3]} = 1; $t{$a[9]} = 1; END {print join("\n", keys %t)};' overlap | cut -f1,2 -d'_' | sort -u > unique
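The ID extraction step can also be written with awk alone. With two six-column BED inputs, `intersectBed -wo` writes 13 columns per overlap, so the feature names of the A and B entries sit in columns 4 and 10. The sketch below assumes Table Browser style names such as NM_000001_utr5_0_0_chr1_101_r (an assumed format), where the first two underscore-separated fields form the accession. The function name `extract_accessions` is illustrative. De-duplication happens after the names are truncated, because several UTR exons of one transcript share the same accession.

```shell
# Print the unique transcript accessions found in columns 4 and 10 of
# `intersectBed -wo` output (the name fields of the A and B features).
# Assumes the first two underscore-separated fields of each name form
# the accession, e.g. NM_000001_utr5_0_0_chr1_101_r -> NM_000001.
extract_accessions() {
    awk '
        {
            for (i = 4; i <= 10; i += 6) {
                split($i, part, "_")
                acc = part[1] "_" part[2]
                # Print each accession only the first time it is seen.
                if (!(acc in seen)) { seen[acc] = 1; print acc }
            }
        }
    ' "$1"
}
```

For example, `extract_accessions overlap > unique` would list each accession once, in order of first appearance.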
Perform a GO enrichment analysis on the unique list of bidirectional genes, using all the genes as the universe list:
R version 2.13.0 (2011-04-13)
Platform: x86_64-unknown-linux-gnu (64-bit)
> library("GO.db")
> library("GOstats")
> entrezUniverse=scan("universe2")
> selectedEntrezIds=scan("entrez")
> hgCutoff = 0.001
> #Biological Process
> params = new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,ontology="BP",pvalueCutoff=hgCutoff,conditional=TRUE,testDirection="over",annotation="org.Hs.eg.db")
> hgOver=hyperGTest(params)
> summary(hgOver)
GOBPID Pvalue OddsRatio ExpCount Count Size Term
1 GO:0034641 2.845531e-06 1.675853 116.7780905 157 5140 cellular nitrogen compound metabolic process
2 GO:0090304 4.877145e-06 1.688666 90.8551719 128 3999 nucleic acid metabolic process
3 GO:0044260 5.050034e-05 1.551424 137.7708834 173 6064 cellular macromolecule metabolic process
4 GO:0022403 1.093607e-04 2.174322 16.4261789 33 723 cell cycle phase
5 GO:0016568 2.753699e-04 2.582012 8.3153271 20 366 chromatin modification
6 GO:0006996 2.867515e-04 1.923650 22.5838527 40 1037 organelle organization
7 GO:0006281 3.059896e-04 2.631828 7.7473402 19 341 DNA repair
8 GO:0071841 3.514806e-04 1.571534 61.7969662 87 2720 cellular component organization or biogenesis at cellular level
9 GO:0044238 3.720664e-04 1.481915 184.6411559 215 8127 primary metabolic process
10 GO:0000075 3.912858e-04 3.099874 4.8619672 14 214 cell cycle checkpoint
11 GO:0051329 4.542075e-04 2.618305 7.3611092 18 324 interphase of mitotic cell cycle
12 GO:0034622 4.721711e-04 2.353943 9.9965681 22 440 cellular macromolecular complex assembly
13 GO:0044248 4.733327e-04 1.759037 30.1941794 49 1329 cellular catabolic process
14 GO:0006399 4.960020e-04 3.894081 2.7944952 10 123 tRNA metabolic process
15 GO:0006839 4.964491e-04 4.805971 1.8402773 8 81 mitochondrial transport
16 GO:0010564 5.842741e-04 2.558462 7.5201455 18 331 regulation of cell cycle process
17 GO:0007049 5.963847e-04 1.790438 26.5136248 44 1167 cell cycle
18 GO:0000387 6.138504e-04 8.383672 0.7043037 5 31 spliceosomal snRNP assembly
19 GO:0006368 7.234803e-04 5.192143 1.4994852 7 66 transcription elongation from RNA polymerase II promoter
20 GO:0051276 7.455445e-04 2.080239 13.8588784 27 610 chromosome organization
21 GO:0006974 8.086037e-04 3.376597 3.5085746 11 160 response to DNA damage stimulus
22 GO:0050434 8.996583e-04 5.955524 1.1359736 6 50 positive regulation of viral transcription
23 GO:0051028 9.584968e-04 3.873584 2.5218615 9 111 mRNA transport
24 GO:0010467 9.885920e-04 1.455638 89.2420894 115 3928 gene expression
> #Molecular Function
> params = new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,ontology="MF",pvalueCutoff=hgCutoff,conditional=TRUE,testDirection="over",annotation="org.Hs.eg.db")
> hgOver=hyperGTest(params)
> summary(hgOver)
GOMFID Pvalue OddsRatio ExpCount Count Size Term
1 GO:0003676 0.0001312477 1.664829 53.28987613 79 2437 nucleic acid binding
2 GO:0016206 0.0005217889 Inf 0.04574949 2 2 catechol O-methyltransferase activity
3 GO:0003723 0.0008795796 1.909603 18.43704529 33 806 RNA binding
4 GO:0050662 0.0009527279 3.105508 4.14032903 12 181 coenzyme binding
> #Cellular component
> params = new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,ontology="CC",pvalueCutoff=hgCutoff,conditional=TRUE,testDirection="over",annotation="org.Hs.eg.db")
> hgOver=hyperGTest(params)
> summary(hgOver)
GOCCID Pvalue OddsRatio ExpCount Count Size Term
1 GO:0043227 8.095989e-15 2.357756 193.81366272 266 8649 membrane-bounded organelle
2 GO:0043229 1.078643e-14 2.445344 215.07960860 285 9598 intracellular organelle
3 GO:0005622 1.268144e-13 2.727612 259.96442377 320 11601 intracellular
4 GO:0031974 7.622271e-13 2.476594 50.44219618 102 2251 membrane-enclosed lumen
5 GO:0070013 1.297462e-12 2.479110 48.64949263 99 2171 intracellular organelle lumen
6 GO:0005634 9.443707e-12 2.053406 120.73858420 183 5388 nucleus
7 GO:0044422 3.286750e-11 2.013652 121.41084803 182 5418 organelle part
8 GO:0005654 6.334903e-08 2.336166 28.48157768 59 1271 nucleoplasm
9 GO:0043228 4.001227e-06 1.784747 58.62140614 92 2616 non-membrane-bounded organelle
10 GO:0005730 1.096997e-05 2.682435 11.33884996 28 506 nucleolus
11 GO:0005694 1.166634e-05 2.622132 12.01111380 29 536 chromosome
12 GO:0043234 5.725214e-05 1.662474 59.30486691 88 2712 protein complex
13 GO:0000151 6.952506e-05 4.226438 3.11482242 12 139 ubiquitin ligase complex
14 GO:0005739 8.158783e-05 1.881356 29.46756463 51 1315 mitochondrion
15 GO:0005737 1.050799e-04 1.488375 182.27313361 218 8134 cytoplasm
16 GO:0015630 3.948281e-04 2.107557 14.67776033 29 655 microtubule cytoskeleton
17 GO:0000775 4.285894e-04 3.660054 3.24927519 11 145 chromosome, centromeric region
18 GO:0005684 5.008308e-04 Inf 0.04481759 2 2 U2-type spliceosomal complex
19 GO:0080008 7.364930e-04 11.749319 0.42576709 4 19 CUL4 RING ubiquitin ligase complex
> #KEGG
> params = new("KEGGHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,pvalueCutoff=hgCutoff,testDirection="over",annotation="org.Hs.eg.db")
> hgOver=hyperGTest(params)
> summary(hgOver)
KEGGID Pvalue OddsRatio ExpCount Count Size Term
1 03013 0.0003025115 3.918123 3.174733 11 152 RNA transport
>
Although this was a brief analysis, the results are somewhat similar to those of Trinklein et al., 2004 (An Abundance of Bidirectional Promoters in the Human Genome). Findings from the paper include:
1. DNA-repair genes are more than fivefold overrepresented in the bidirectional class.
2. Chaperone proteins are almost threefold overrepresented.
3. Mitochondrial genes are more than twofold overrepresented.
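
A rough way to set the GO results beside these figures is to compute a fold enrichment as Count divided by ExpCount from the summary tables. The sketch below does this for the DNA repair (GO:0006281) and mitochondrion (GO:0005739) rows. The function name `fold_enrichment` is illustrative, and the comparison is only approximate because the gene sets and categories used here are not those of the paper.

```shell
# Fold enrichment = observed count / expected count, to two decimals.
fold_enrichment() {
    awk -v count="$1" -v expected="$2" 'BEGIN { printf "%.2f\n", count / expected }'
}

fold_enrichment 19 7.7473402     # DNA repair (GO:0006281), Count and ExpCount
fold_enrichment 51 29.46756463   # mitochondrion (GO:0005739), Count and ExpCount
```

By this measure the enrichment is about 2.5-fold for DNA repair and about 1.7-fold for mitochondrion: the same direction as the paper's findings, but smaller in magnitude.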

This work is licensed under a Creative Commons
Attribution 4.0 International License.