Download the 5' UTRs of all RefSeq genes (hg19) in BED format using the UCSC Table Browser.
Separate the features by strand:
cat hg19_refgene_five_utr_110914.bed | perl -nle '@a = split; print if $a[5] eq "+";' > hg19_refgene_five_utr_110914_plus.bed
cat hg19_refgene_five_utr_110914.bed | perl -nle '@a = split; print if $a[5] eq "-";' > hg19_refgene_five_utr_110914_neg.bed
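The same split can also be done in a single pass over the file. The following is a minimal sketch, assuming a six-column BED file with the strand in column 6; the function name `split_by_strand` and the demo file are made up for illustration and are not part of the original workflow.

```shell
#!/bin/sh
# Sketch: split a BED6 file into plus- and minus-strand files in one pass.
# split_by_strand, the prefix argument and the demo data are illustrative only.
split_by_strand() {
    input=$1
    prefix=$2
    # Column 6 holds the strand; each line goes to the matching output file.
    awk -v plus="${prefix}_plus.bed" -v neg="${prefix}_neg.bed" '
        $6 == "+" { print > plus }
        $6 == "-" { print > neg }
    ' "$input"
}

# Tiny made-up input: two plus-strand features and one minus-strand feature.
demo_dir=$(mktemp -d)
printf 'chr1\t100\t200\tfeatA\t0\t+\n'  > "$demo_dir/demo.bed"
printf 'chr1\t150\t250\tfeatB\t0\t-\n' >> "$demo_dir/demo.bed"
printf 'chr2\t300\t400\tfeatC\t0\t+\n' >> "$demo_dir/demo.bed"

split_by_strand "$demo_dir/demo.bed" "$demo_dir/demo"

# The two counts should add up to the number of lines in the input.
echo "plus:  $(wc -l < "$demo_dir/demo_plus.bed")"
echo "minus: $(wc -l < "$demo_dir/demo_neg.bed")"
```

Comparing the two line counts against the input is a cheap check that no feature carried an unexpected strand value.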
Use intersectBed to find minus-strand features that overlap plus-strand features:
# Force strandedness as a test; this should produce no output
intersectBed -s -a hg19_refgene_five_utr_110914_neg.bed -b hg19_refgene_five_utr_110914_plus.bed
intersectBed -wo -a hg19_refgene_five_utr_110914_neg.bed -b hg19_refgene_five_utr_110914_plus.bed > overlap
cat overlap | perl -nle '@a = split; $t{$a[3]} = 1; $t{$a[9]} = 1; END {print join("\n",keys %t)};' | cut -f1,2 -d'_' > unique
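One detail of the last command is worth illustrating: the feature names are de-duplicated before `cut` trims them, so two UTR features from the same transcript can still leave the same accession in `unique` twice. The sketch below trims first and de-duplicates afterwards. It assumes `-wo` output from two six-column BED files (feature names in columns 4 and 10); the helper name `unique_accessions` and the sample records, including the exact name format, are hypothetical.

```shell
#!/bin/sh
# Sketch: collect accessions from both sides of each overlap, trim, then de-duplicate.
# unique_accessions and the sample records below are illustrative only.
unique_accessions() {
    # Columns 4 and 10 hold the names of the -a and -b features.
    # cut keeps the first two underscore-separated fields (e.g. NM_000001).
    awk '{ print $4; print $10 }' "$1" | cut -f1,2 -d'_' | sort -u
}

# Two made-up overlap records; NM_000001 contributes two different UTR features.
demo=$(mktemp)
printf 'chr1\t100\t200\tNM_000001_utr5_0_0_chr1_101_r\t0\t-\tchr1\t150\t250\tNM_000002_utr5_0_0_chr1_151_f\t0\t+\t50\n'  > "$demo"
printf 'chr1\t300\t400\tNM_000001_utr5_1_0_chr1_301_r\t0\t-\tchr1\t350\t450\tNM_000003_utr5_0_0_chr1_351_f\t0\t+\t50\n' >> "$demo"

unique_accessions "$demo"
```

On the sample records this prints three accessions, with NM_000001 appearing only once.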
Next, perform a GO enrichment analysis with GOstats on the unique list of bidirectional genes (Entrez Gene IDs in the file entrez), using all the genes as the universe list (file universe2):
R version 2.13.0 (2011-04-13)
Platform: x86_64-unknown-linux-gnu (64-bit)

> library("GO.db")
> library("GOstats")
> entrezUniverse=scan("universe2")
> selectedEntrezIds=scan("entrez")
> hgCutoff = 0.001

> #Biological Process
> params = new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,ontology="BP",pvalueCutoff=hgCutoff,conditional=TRUE,testDirection="over",annotation="org.Hs.eg.db")
> hgOver=hyperGTest(params)
> summary(hgOver)
       GOBPID       Pvalue OddsRatio    ExpCount Count Size Term
1  GO:0034641 2.845531e-06  1.675853 116.7780905   157 5140 cellular nitrogen compound metabolic process
2  GO:0090304 4.877145e-06  1.688666  90.8551719   128 3999 nucleic acid metabolic process
3  GO:0044260 5.050034e-05  1.551424 137.7708834   173 6064 cellular macromolecule metabolic process
4  GO:0022403 1.093607e-04  2.174322  16.4261789    33  723 cell cycle phase
5  GO:0016568 2.753699e-04  2.582012   8.3153271    20  366 chromatin modification
6  GO:0006996 2.867515e-04  1.923650  22.5838527    40 1037 organelle organization
7  GO:0006281 3.059896e-04  2.631828   7.7473402    19  341 DNA repair
8  GO:0071841 3.514806e-04  1.571534  61.7969662    87 2720 cellular component organization or biogenesis at cellular level
9  GO:0044238 3.720664e-04  1.481915 184.6411559   215 8127 primary metabolic process
10 GO:0000075 3.912858e-04  3.099874   4.8619672    14  214 cell cycle checkpoint
11 GO:0051329 4.542075e-04  2.618305   7.3611092    18  324 interphase of mitotic cell cycle
12 GO:0034622 4.721711e-04  2.353943   9.9965681    22  440 cellular macromolecular complex assembly
13 GO:0044248 4.733327e-04  1.759037  30.1941794    49 1329 cellular catabolic process
14 GO:0006399 4.960020e-04  3.894081   2.7944952    10  123 tRNA metabolic process
15 GO:0006839 4.964491e-04  4.805971   1.8402773     8   81 mitochondrial transport
16 GO:0010564 5.842741e-04  2.558462   7.5201455    18  331 regulation of cell cycle process
17 GO:0007049 5.963847e-04  1.790438  26.5136248    44 1167 cell cycle
18 GO:0000387 6.138504e-04  8.383672   0.7043037     5   31 spliceosomal snRNP assembly
19 GO:0006368 7.234803e-04  5.192143   1.4994852     7   66 transcription elongation from RNA polymerase II promoter
20 GO:0051276 7.455445e-04  2.080239  13.8588784    27  610 chromosome organization
21 GO:0006974 8.086037e-04  3.376597   3.5085746    11  160 response to DNA damage stimulus
22 GO:0050434 8.996583e-04  5.955524   1.1359736     6   50 positive regulation of viral transcription
23 GO:0051028 9.584968e-04  3.873584   2.5218615     9  111 mRNA transport
24 GO:0010467 9.885920e-04  1.455638  89.2420894   115 3928 gene expression

> #Molecular Function
> params = new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,ontology="MF",pvalueCutoff=hgCutoff,conditional=TRUE,testDirection="over",annotation="org.Hs.eg.db")
> hgOver=hyperGTest(params)
> summary(hgOver)
      GOMFID       Pvalue OddsRatio    ExpCount Count Size Term
1 GO:0003676 0.0001312477  1.664829 53.28987613    79 2437 nucleic acid binding
2 GO:0016206 0.0005217889       Inf  0.04574949     2    2 catechol O-methyltransferase activity
3 GO:0003723 0.0008795796  1.909603 18.43704529    33  806 RNA binding
4 GO:0050662 0.0009527279  3.105508  4.14032903    12  181 coenzyme binding

> #Cellular component
> params = new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,ontology="CC",pvalueCutoff=hgCutoff,conditional=TRUE,testDirection="over",annotation="org.Hs.eg.db")
> hgOver=hyperGTest(params)
> summary(hgOver)
       GOCCID       Pvalue OddsRatio     ExpCount Count  Size Term
1  GO:0043227 8.095989e-15  2.357756 193.81366272   266  8649 membrane-bounded organelle
2  GO:0043229 1.078643e-14  2.445344 215.07960860   285  9598 intracellular organelle
3  GO:0005622 1.268144e-13  2.727612 259.96442377   320 11601 intracellular
4  GO:0031974 7.622271e-13  2.476594  50.44219618   102  2251 membrane-enclosed lumen
5  GO:0070013 1.297462e-12  2.479110  48.64949263    99  2171 intracellular organelle lumen
6  GO:0005634 9.443707e-12  2.053406 120.73858420   183  5388 nucleus
7  GO:0044422 3.286750e-11  2.013652 121.41084803   182  5418 organelle part
8  GO:0005654 6.334903e-08  2.336166  28.48157768    59  1271 nucleoplasm
9  GO:0043228 4.001227e-06  1.784747  58.62140614    92  2616 non-membrane-bounded organelle
10 GO:0005730 1.096997e-05  2.682435  11.33884996    28   506 nucleolus
11 GO:0005694 1.166634e-05  2.622132  12.01111380    29   536 chromosome
12 GO:0043234 5.725214e-05  1.662474  59.30486691    88  2712 protein complex
13 GO:0000151 6.952506e-05  4.226438   3.11482242    12   139 ubiquitin ligase complex
14 GO:0005739 8.158783e-05  1.881356  29.46756463    51  1315 mitochondrion
15 GO:0005737 1.050799e-04  1.488375 182.27313361   218  8134 cytoplasm
16 GO:0015630 3.948281e-04  2.107557  14.67776033    29   655 microtubule cytoskeleton
17 GO:0000775 4.285894e-04  3.660054   3.24927519    11   145 chromosome, centromeric region
18 GO:0005684 5.008308e-04       Inf   0.04481759     2     2 U2-type spliceosomal complex
19 GO:0080008 7.364930e-04 11.749319   0.42576709     4    19 CUL4 RING ubiquitin ligase complex

> #KEGG
> params = new("KEGGHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,pvalueCutoff=hgCutoff,testDirection="over",annotation="org.Hs.eg.db")
> hgOver=hyperGTest(params)
> summary(hgOver)
  KEGGID       Pvalue OddsRatio ExpCount Count Size Term
1  03013 0.0003025115  3.918123 3.174733    11  152 RNA transport
Although this was a brief analysis, the results are broadly consistent with the findings of Trinklein et al., 2004 (An Abundance of Bidirectional Promoters in the Human Genome): DNA repair (GO:0006281) and mitochondrial terms (GO:0006839, GO:0005739) are among the enriched categories. Findings from the paper include:
1. DNA-repair genes are more than fivefold overrepresented in the bidirectional class.
2. Chaperone proteins are almost threefold overrepresented.
3. Mitochondrial genes are more than twofold overrepresented.
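Because the paper reports its findings as fold overrepresentation, a rough comparison can be made by dividing the observed Count by ExpCount for the matching terms in the GOstats output. The sketch below does that arithmetic for three rows copied from the tables above. The helper name `fold_enrichment` is made up for illustration, and this ratio is an assumption about how to compare the two analyses: it is not necessarily the measure used by Trinklein et al., and with conditional=TRUE the expected counts come from the conditional test.

```shell
#!/bin/sh
# Sketch: approximate fold enrichment as observed Count / ExpCount.
# fold_enrichment is an illustrative helper, not part of GOstats.
fold_enrichment() {
    awk -v count="$1" -v expected="$2" 'BEGIN { printf "%.2f\n", count / expected }'
}

# Count and ExpCount values copied from the summary(hgOver) output.
printf 'DNA repair (GO:0006281): %s\n'              "$(fold_enrichment 19 7.7473402)"
printf 'mitochondrial transport (GO:0006839): %s\n' "$(fold_enrichment 8 1.8402773)"
printf 'mitochondrion (GO:0005739): %s\n'           "$(fold_enrichment 51 29.46756463)"
```

By this measure DNA repair comes out at roughly 2.5-fold and the mitochondrion term at roughly 1.7-fold, so the direction of the enrichment agrees with the paper even though the magnitudes are smaller.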

This work is licensed under a Creative Commons
Attribution 4.0 International License.