Genomic Regions Enrichment of Annotations Tool

The Genomic Regions Enrichment of Annotations Tool (GREAT) finds ontology terms that are enriched in a set of genomic regions. This talk (running time ~1 hour) gives an overview of the tool. In brief, GREAT is an alternative to gene-centric enrichment tools such as DAVID and uses a binomial test over genomic regions to test for ontology enrichment. Figure 1b in the GREAT paper explains how GREAT models functional annotations in the genome. The advantage of the binomial model is that it takes into account the probability that a genomic region overlaps a region associated with a particular ontology term, so terms whose associated regions cover an unusually large or small part of the genome are not unfairly favoured or penalised. GREAT incorporates annotations from 20 ontologies and is available as a web application. As stated in the paper, the utility of GREAT is not limited to ChIP-seq data; if you are interested in the details, check out the paper.
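To make the binomial test described above a bit more concrete, here is a minimal sketch of the upper-tail calculation, not GREAT's actual implementation. It assumes n input regions, of which k fall inside the regulatory domains of genes annotated with a term, and that those domains cover a fraction p of the genome (0 < p < 1). The function name binom_tail and the example values for k and p are made up for illustration; only the 91 matches the number of heart enhancers used later in this post.

```shell
# binom_tail N K P: probability of K or more hits among N regions when
# each region independently hits with probability P (requires 0 < P < 1).
binom_tail() {
  awk -v n="$1" -v k="$2" -v p="$3" 'BEGIN {
    # Start at P(X = 0) = (1-p)^n and walk upwards with the recurrence
    # P(X = i+1) = P(X = i) * (n-i)/(i+1) * p/(1-p).
    term = exp(n * log(1 - p))
    tail = 0
    for (i = 0; i <= n; i++) {
      if (i >= k) tail += term
      term *= (n - i) / (i + 1) * p / (1 - p)
    }
    printf "%.6g\n", tail
  }'
}

# Hypothetical example: 91 regions, 12 of them hit a term whose
# regulatory domains cover 5% of the genome.
binom_tail 91 12 0.05
```

A small value means that seeing at least k hits would be unlikely if the regions were scattered across the genome without regard to the term.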

For this post, I will test GREAT using genomic regions that have been validated as enhancers. For the first test, I will examine human heart enhancers, which I downloaded from the VISTA Enhancer Browser by using the Advanced Search and selecting “heart”, “Positives”, and “Human”. The output is a FASTA file.

cat vista_heart_enhancer.fa | grep ">Human" | wc -l
#91

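#parse the definition line
#and reformat into BED format
#(-E is needed so that \s+ means "one or more whitespace characters";
#without it, sed looks for whitespace followed by a literal +)
cat vista_heart_enhancer.fa | grep ">Human" | cut -f2 -d'|' | sed -E 's/\s+//g;s/[:-]/\t/g'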
chr7    114295109       114296373
chr10   102546590       102548095
chr3    18169362        18170237
chr15   70391576        70392613
chr18   22864753        22866457
chr19   30767058        30768273
chr1    10781239        10781744
chr7    35505184        35506342
chr7    35458551        35459522
chr22   38394345        38395199
chr22   38429240        38430515
chr18   19773983        19774532
chr5    88548400        88550500
chr8    11604182        11604695
chr22   38381623        38382457
chr1    27049620        27050905
chr3    71034389        71035430
chr1    88108084        88109396
chr1    164668592       164669823
chr14   53833457        53836210
chr2    105300344       105301657
chr2    59178992        59180242
chr2    103538361       103539868
chr1    61917795        61920190
chr2    50840428        50844037
chr14   68634843        68636060
chr14   77384641        77387697
chr14   75724424        75726998
chr13   31364918        31367891
chr4    124775814       124779530
chr10   119725799       119727161
chr14   75703373        75705053
chr8    93030368        93033345
chr7    35412725        35416349
chr8    37683090        37685347
chr18   20049644        20052904
chr14   23906587        23908214
chr1    113540056       113542020
chr17   27994702        27996874
chr10   77793903        77797889
chr6    111871688       111876668
chr3    50636478        50640158
chr11   75264066        75266752
chr2    241197357       241200155
chr18   12277696        12281650
chr5    148801693       148805076
chr7    158888320       158891362
chr19   11251931        11254703
chr11   65254704        65258394
chr16   66937076        66939560
chr10   134442029       134446812
chr2    218801171       218803563
chr1    230984523       230987981
chr10   80980316        80984042
chr17   37831124        37834530
chr7    95236622        95240458
chr16   89057397        89061023
chr17   61751592        61755090
chr11   33964286        33967857
chr11   8226046         8230683
chr1    181121049       181123654
chr15   44171247        44173891
chr2    232527256       232532109
chr6    35457129        35460603
chr1    156630249       156635284
chr2    43444721        43447298
chr1    3262093         3266365
chr11   119759730       119763658
chr11   69308185        69312534
chr14   100039652       100044326
chr15   99257740        99262228
chr2    101726098       101729492
chr2    241307979       241312994
chr2    238522379       238526567
chr19   41936986        41938891
chr10   93348489        93351723
chr2    238221820       238225273
chr6    164382342       164386389
chr10   3087624         3090443
chr16   82767609        82770047
chr8    30135042        30138909
chr9    134110296       134115184
chr2    47295775        47299432
chr20   56014710        56019293
chr10   79972390        79977332
chr5    176076732       176078530
chr2    159885988       159889012
chr7    50564242        50565706
chr9    38038695        38040099
chr12   54412238        54415445
chr22   32019365        32023822
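
The same header parsing can also be done in a single awk step, which avoids relying on sed extensions such as \s and \t. This is only a sketch: the function name vista_to_bed is made up, and the sample header is modelled on the ">Human|chrN:start-end | ..." layout implied by the commands above, with a hypothetical element number.

```shell
# vista_to_bed: read VISTA-style FASTA on stdin, write 3-column BED on stdout.
vista_to_bed() {
  awk -F'|' '/^>Human/ {
    gsub(/[ \t]/, "", $2)          # drop whitespace around the coordinates
    n = split($2, part, /[:-]/)    # chr7:100-200 -> chr7, 100, 200
    if (n == 3) print part[1] "\t" part[2] "\t" part[3]
  }'
}

# Small inline sample instead of the real download (the header text after
# the coordinates is a hypothetical stand-in).
printf '%s\n' '>Human|chr7:114295109-114296373 | element 1 | positive' 'ACGT' | vista_to_bed
```

Headers that do not split into exactly three parts are skipped silently, so it is worth comparing the number of output lines with the number of headers counted earlier.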

I pasted these regions into GREAT (selecting hg19 and “BED data”, without a background set) and got these results:

Figure: All the heart enhancers could be associated with one or more genes.

Figure: Symmetrical distribution in distance away from known TSSs in the set of heart enhancers.

Figure: The majority of heart enhancers are at least 5 kb away from known TSSs.

Figure: Several enriched GO Biological Processes are clearly associated with heart function.

Let’s try with midbrain enhancers:

cat vista_midbrain_enhancer.fa | grep "^>" | wc -l
255

#you can download this file at
#https://dl.dropboxusercontent.com/u/15251811/vista_midbrain_enhancer.bed
cat vista_midbrain_enhancer.fa | grep ">Human" | cut -f2 -d'|' | sed -E 's/\s+//g;s/[:-]/\t/g' > vista_midbrain_enhancer.bed
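
Before handing a BED file like this to GREAT, it can be worth a quick sanity check, for example to catch stray whitespace left in a coordinate field. The following is a minimal sketch; the function name check_bed and the strictness of the checks (exactly tab-separated, purely numeric coordinates, start smaller than end) are my own choices, not requirements stated by GREAT.

```shell
# check_bed FILE: report malformed lines; exit status is 0 only if none are found.
check_bed() {
  awk -F'\t' '
    NF < 3 {
      printf "line %d: fewer than 3 tab-separated fields\n", NR; bad++; next
    }
    $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
      printf "line %d: non-numeric coordinates\n", NR; bad++; next
    }
    $2 + 0 >= $3 + 0 {
      printf "line %d: start is not less than end\n", NR; bad++
    }
    END { exit (bad > 0) }
  ' "$1"
}

# Demonstration on a throwaway file holding two regions from this post.
tmp=$(mktemp)
printf 'chr7\t114295109\t114296373\nchr10\t102546590\t102548095\n' > "$tmp"
check_bed "$tmp" && echo "BED looks well-formed"
rm -f "$tmp"
```

For the real data the call would simply be check_bed vista_midbrain_enhancer.bed.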

Figure: Most midbrain enhancers could be associated with genes.

Figure: Most midbrain enhancers are at least 5 kb away from known TSSs.

Figure: Central nervous system, forebrain, and brain development GO Biological Processes seem relevant.

Programming Interface

The PHP script on the GREAT website generates the HTML output, which is then rendered in the web browser. The cool thing about a Common Gateway Interface (CGI) script is that (usually) one can pass parameters to it and interact with it programmatically. The two required parameters of the GREAT PHP script are requestURL and requestSpecies (see more about the parameters here). So to obtain the results for the midbrain enhancers seen above, just paste this into your browser:

http://bejerano.stanford.edu/great/public/cgi-bin/greatStart.php?requestSpecies=hg19&requestURL=https://dl.dropboxusercontent.com/u/15251811/vista_midbrain_enhancer.bed

This programming interface allows other tools to interact with GREAT as long as the URL for the BED file and the genome assembly are provided. One such tool is the UCSC Table Browser, which talks to GREAT via this interface, so you can send any interesting Table Browser results straight to GREAT.
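
As a small illustration of the interface described above, the request URL can be assembled from its two required parameters. This is a sketch only: the helper name great_url is made up, and it assumes the BED URL contains no characters (such as & or ?) that would need percent-encoding inside a query string.

```shell
# great_url SPECIES BED_URL: print the GREAT request URL for a hosted BED file.
great_url() {
  printf 'http://bejerano.stanford.edu/great/public/cgi-bin/greatStart.php?requestSpecies=%s&requestURL=%s\n' "$1" "$2"
}

# Rebuild the midbrain enhancer request from this post.
great_url hg19 'https://dl.dropboxusercontent.com/u/15251811/vista_midbrain_enhancer.bed'
```

The printed URL can be opened in a browser, or generated in a loop when several BED files are hosted somewhere GREAT can reach.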

Conclusions

I heard about the Genomic Regions Enrichment of Annotations Tool (GREAT) a couple of years ago but never got around to trying it out. Just yesterday I was reading a paper and saw how GREAT was used to associate ontology terms with genomic regions (in contrast to the approach of associating ontology terms with a gene list), which is something I’ve been trying to do. I work a lot with non-coding DNA regions that have transcriptional activity but are largely intergenic, and I’ve been looking for a way to associate some function with these regions. So far I have used chromosomal interaction data to check whether an intergenic region was in close proximity to a gene, or I have tried to find co-expression patterns between my intergenic regions and genes. I will definitely add GREAT to my analysis pipeline.

The GREAT results for the enhancers look promising and I will definitely test it with my own data. I went through their help page but couldn’t find much written about the appropriate background set to use. I checked out their demos, and for the SRF and ultraconserved elements examples they used the whole genome as the background. This is something to look into in a future update of this post.

I also tried to compile the GREAT tool but ran into problems (it looked like I was missing some C libraries that were needed by the Kent source).

In conclusion, if you work with a lot of non-coding DNA regions, check out GREAT.

This work is licensed under a Creative Commons Attribution 4.0 International License.
4 comments
  1. Hi, I am very interested in GREAT, but I have run into some problems. Could you give me a hand? I just don’t know how to get detailed information about the genes associated with my regions, such as their positions. Could you be so kind as to give me some advice? Thanks a lot!

    1. I would recommend that you read the paper and email the authors of GREAT for help. I’m just a user of the software, like yourself.

  2. Hi DAVO,

    Thanks for the post. I would like to ask about your experience in using GREAT. Do you know if there is a way to use updated ontology files with GREAT, or have you ever done so (e.g. in the local version, http://great.stanford.edu/help/display/GREAT/Download)?

    I also found this tool very useful, especially when working with cell types without available chromatin interaction datasets, but I am very concerned about the outdated ontologies. Do you also know of other tools that can predict the genes associated with ChIP peaks and do the subsequent ontology analysis using similar logic to GREAT?

    Thanks a lot!
    Kylie
