Bidirectional genes

Download the 5' UTRs of all RefSeq genes (hg19, BED format) using the UCSC Table Browser.

Separate features according to strand

cat hg19_refgene_five_utr_110914.bed | perl -nle '@a = split; print if $a[5] eq "+";' > hg19_refgene_five_utr_110914_plus.bed
cat hg19_refgene_five_utr_110914.bed | perl -nle '@a = split; print if $a[5] eq "-";' > hg19_refgene_five_utr_110914_neg.bed
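
The same split can also be done in a single pass with awk. The following is a minimal sketch on a toy BED6 file, not the commands used above; the file names and records are hypothetical, and it assumes the strand is in column 6 as in the UCSC output.

```shell
# Toy BED6 input (hypothetical records); column 6 holds the strand.
workdir=$(mktemp -d)
printf 'chr1\t100\t200\tNM_000001_utr5_0_0_chr1_101_f\t0\t+\n'  > "$workdir/toy.bed"
printf 'chr1\t150\t250\tNM_000002_utr5_0_0_chr1_151_r\t0\t-\n' >> "$workdir/toy.bed"
printf 'chr2\t300\t400\tNM_000003_utr5_0_0_chr2_301_f\t0\t+\n' >> "$workdir/toy.bed"

# Split by strand in one pass over the file.
awk -F'\t' -v plus="$workdir/toy_plus.bed" -v neg="$workdir/toy_neg.bed" '
    $6 == "+" { print > plus }
    $6 == "-" { print > neg }
' "$workdir/toy.bed"

# Sanity check: the two outputs should add up to the input line count.
plus_n=$(wc -l < "$workdir/toy_plus.bed")
neg_n=$(wc -l < "$workdir/toy_neg.bed")
total_n=$(wc -l < "$workdir/toy.bed")
echo "plus=$((plus_n)) neg=$((neg_n)) total=$((total_n))"
```

Comparing the line counts is a cheap way to confirm that no feature was dropped, for example because of an unexpected strand value.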

Use intersectBed to find overlapping features

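# Sanity check: -s reports only same-strand overlaps, so this should produce no output
intersectBed -s -a hg19_refgene_five_utr_110914_neg.bed -b hg19_refgene_five_utr_110914_plus.bed
# Report each overlapping minus/plus pair of 5' UTRs, with the overlap length appended (-wo)
intersectBed -wo -a hg19_refgene_five_utr_110914_neg.bed -b hg19_refgene_five_utr_110914_plus.bed > overlap
# Collect the names of both features (columns 4 and 10), trim them to the RefSeq accession,
# and deduplicate again, since several UTR features can share one accession
perl -nle '@a = split; $t{$a[3]} = 1; $t{$a[9]} = 1; END {print join("\n", keys %t)}' overlap | cut -f1,2 -d'_' | sort -u > unique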
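
To make the column arithmetic concrete, here is a small sketch of the accession extraction on two made-up lines. It assumes BED6 inputs, so that `intersectBed -wo` writes 13 columns, and it assumes feature names that begin with the RefSeq accession followed by an underscore-separated suffix; all records are hypothetical.

```shell
# Two toy lines in the 13-column layout of `intersectBed -wo` on BED6 inputs:
# columns 1-6 = feature from -a, columns 7-12 = feature from -b, column 13 = overlap in bp.
overlap_toy=$(mktemp)
printf 'chr1\t150\t250\tNM_000002_utr5_0_0_chr1_151_r\t0\t-\tchr1\t100\t200\tNM_000001_utr5_0_0_chr1_101_f\t0\t+\t50\n'  > "$overlap_toy"
printf 'chr1\t150\t250\tNM_000002_utr5_0_0_chr1_151_r\t0\t-\tchr1\t180\t300\tNM_000004_utr5_0_0_chr1_181_f\t0\t+\t70\n' >> "$overlap_toy"

# Print both name fields (columns 4 and 10), keep the first two
# underscore-separated parts (the accession), then sort and deduplicate.
unique_ids=$(awk -F'\t' '{ print $4; print $10 }' "$overlap_toy" | cut -d'_' -f1,2 | sort -u)
echo "$unique_ids"
```

Here NM_000002 overlaps two plus-strand features but is listed only once, because the deduplication happens after the names are trimmed to accessions.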

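Perform a GO enrichment analysis with GOstats, using the unique list of bidirectional genes as the selected set and all genes as the universe. Both lists are read as Entrez Gene IDs (the files entrez and universe2 below), so the RefSeq accessions have to be converted to Entrez Gene IDs first: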

R version 2.13.0 (2011-04-13)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)

> library("GO.db")
> library("GOstats")
> entrezUniverse=scan("universe2")
> selectedEntrezIds=scan("entrez")
> hgCutoff = 0.001
> #Biological Process
> params = new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,ontology="BP",pvalueCutoff=hgCutoff,conditional=TRUE,testDirection="over",annotation="org.Hs.eg.db")
> hgOver=hyperGTest(params)
> summary(hgOver)
       GOBPID       Pvalue OddsRatio    ExpCount Count Size
1  GO:0034641 2.845531e-06  1.675853 116.7780905   157 5140
2  GO:0090304 4.877145e-06  1.688666  90.8551719   128 3999
3  GO:0044260 5.050034e-05  1.551424 137.7708834   173 6064
4  GO:0022403 1.093607e-04  2.174322  16.4261789    33  723
5  GO:0016568 2.753699e-04  2.582012   8.3153271    20  366
6  GO:0006996 2.867515e-04  1.923650  22.5838527    40 1037
7  GO:0006281 3.059896e-04  2.631828   7.7473402    19  341
8  GO:0071841 3.514806e-04  1.571534  61.7969662    87 2720
9  GO:0044238 3.720664e-04  1.481915 184.6411559   215 8127
10 GO:0000075 3.912858e-04  3.099874   4.8619672    14  214
11 GO:0051329 4.542075e-04  2.618305   7.3611092    18  324
12 GO:0034622 4.721711e-04  2.353943   9.9965681    22  440
13 GO:0044248 4.733327e-04  1.759037  30.1941794    49 1329
14 GO:0006399 4.960020e-04  3.894081   2.7944952    10  123
15 GO:0006839 4.964491e-04  4.805971   1.8402773     8   81
16 GO:0010564 5.842741e-04  2.558462   7.5201455    18  331
17 GO:0007049 5.963847e-04  1.790438  26.5136248    44 1167
18 GO:0000387 6.138504e-04  8.383672   0.7043037     5   31
19 GO:0006368 7.234803e-04  5.192143   1.4994852     7   66
20 GO:0051276 7.455445e-04  2.080239  13.8588784    27  610
21 GO:0006974 8.086037e-04  3.376597   3.5085746    11  160
22 GO:0050434 8.996583e-04  5.955524   1.1359736     6   50
23 GO:0051028 9.584968e-04  3.873584   2.5218615     9  111
24 GO:0010467 9.885920e-04  1.455638  89.2420894   115 3928
                                                              Term
1                     cellular nitrogen compound metabolic process
2                                   nucleic acid metabolic process
3                         cellular macromolecule metabolic process
4                                                 cell cycle phase
5                                           chromatin modification
6                                           organelle organization
7                                                       DNA repair
8  cellular component organization or biogenesis at cellular level
9                                        primary metabolic process
10                                           cell cycle checkpoint
11                                interphase of mitotic cell cycle
12                        cellular macromolecular complex assembly
13                                      cellular catabolic process
14                                          tRNA metabolic process
15                                         mitochondrial transport
16                                regulation of cell cycle process
17                                                      cell cycle
18                                     spliceosomal snRNP assembly
19        transcription elongation from RNA polymerase II promoter
20                                         chromosome organization
21                                 response to DNA damage stimulus
22                      positive regulation of viral transcription
23                                                  mRNA transport
24                                                 gene expression
> #Molecular Function
> params = new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,ontology="MF",pvalueCutoff=hgCutoff,conditional=TRUE,testDirection="over",annotation="org.Hs.eg.db")
> hgOver=hyperGTest(params)
> summary(hgOver)
      GOMFID       Pvalue OddsRatio    ExpCount Count Size
1 GO:0003676 0.0001312477  1.664829 53.28987613    79 2437
2 GO:0016206 0.0005217889       Inf  0.04574949     2    2
3 GO:0003723 0.0008795796  1.909603 18.43704529    33  806
4 GO:0050662 0.0009527279  3.105508  4.14032903    12  181
                                   Term
1                  nucleic acid binding
2 catechol O-methyltransferase activity
3                           RNA binding
4                      coenzyme binding
> #Cellular component
> params = new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,ontology="CC",pvalueCutoff=hgCutoff,conditional=TRUE,testDirection="over",annotation="org.Hs.eg.db")
> hgOver=hyperGTest(params)
> summary(hgOver)
       GOCCID       Pvalue OddsRatio     ExpCount Count  Size
1  GO:0043227 8.095989e-15  2.357756 193.81366272   266  8649
2  GO:0043229 1.078643e-14  2.445344 215.07960860   285  9598
3  GO:0005622 1.268144e-13  2.727612 259.96442377   320 11601
4  GO:0031974 7.622271e-13  2.476594  50.44219618   102  2251
5  GO:0070013 1.297462e-12  2.479110  48.64949263    99  2171
6  GO:0005634 9.443707e-12  2.053406 120.73858420   183  5388
7  GO:0044422 3.286750e-11  2.013652 121.41084803   182  5418
8  GO:0005654 6.334903e-08  2.336166  28.48157768    59  1271
9  GO:0043228 4.001227e-06  1.784747  58.62140614    92  2616
10 GO:0005730 1.096997e-05  2.682435  11.33884996    28   506
11 GO:0005694 1.166634e-05  2.622132  12.01111380    29   536
12 GO:0043234 5.725214e-05  1.662474  59.30486691    88  2712
13 GO:0000151 6.952506e-05  4.226438   3.11482242    12   139
14 GO:0005739 8.158783e-05  1.881356  29.46756463    51  1315
15 GO:0005737 1.050799e-04  1.488375 182.27313361   218  8134
16 GO:0015630 3.948281e-04  2.107557  14.67776033    29   655
17 GO:0000775 4.285894e-04  3.660054   3.24927519    11   145
18 GO:0005684 5.008308e-04       Inf   0.04481759     2     2
19 GO:0080008 7.364930e-04 11.749319   0.42576709     4    19
                                 Term
1          membrane-bounded organelle
2             intracellular organelle
3                       intracellular
4             membrane-enclosed lumen
5       intracellular organelle lumen
6                             nucleus
7                      organelle part
8                         nucleoplasm
9      non-membrane-bounded organelle
10                          nucleolus
11                         chromosome
12                    protein complex
13           ubiquitin ligase complex
14                      mitochondrion
15                          cytoplasm
16           microtubule cytoskeleton
17     chromosome, centromeric region
18       U2-type spliceosomal complex
19 CUL4 RING ubiquitin ligase complex
> #KEGG
> params = new("KEGGHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,pvalueCutoff=hgCutoff,testDirection="over",annotation="org.Hs.eg.db")
> hgOver=hyperGTest(params)
> summary(hgOver)
  KEGGID       Pvalue OddsRatio ExpCount Count Size          Term
1  03013 0.0003025115  3.918123 3.174733    11  152 RNA transport
>
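
When reading the summary tables, it helps to know how ExpCount relates to Size. In the output above, ExpCount divided by Size is the same for every row of a table, which is consistent with ExpCount being Size multiplied by the fraction of universe genes that are in the selected list. The sketch below checks this for three rows of the Biological Process table; the values are copied from that table, and the interpretation is an inference from the numbers, not something taken from the GOstats documentation.

```shell
# Size and ExpCount copied from three rows of the Biological Process table.
ratios=$(awk 'BEGIN {
    sz[1] = 5140; ec[1] = 116.7780905   # GO:0034641
    sz[2] = 723;  ec[2] = 16.4261789    # GO:0022403
    sz[3] = 31;   ec[3] = 0.7043037     # GO:0000387
    # Print ExpCount / Size for each row, rounded to five decimal places.
    for (i = 1; i <= 3; i++) printf "%.5f\n", ec[i] / sz[i]
}')
echo "$ratios"
```

All three ratios come out at about 0.0227. Dividing Count by ExpCount gives a simple fold enrichment for a term; for DNA repair (GO:0006281) that is 19 / 7.75, or roughly 2.5-fold.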

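Although this was a brief analysis, the results are broadly consistent with Trinklein et al., 2004 (An Abundance of Bidirectional Promoters in the Human Genome): DNA repair (GO:0006281) and mitochondrial terms (mitochondrial transport, mitochondrion) are among the enriched categories above. Findings from the paper include: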

1. DNA-repair genes are more than fivefold overrepresented in the bidirectional class.
2. Chaperone proteins are almost threefold overrepresented.
3. Mitochondrial genes are more than twofold overrepresented.




This work is licensed under a Creative Commons Attribution 4.0 International License.
