The Genomic Regions Enrichment of Annotations Tool (GREAT) finds enriched ontological terms in a set of genomic regions. This talk (running time ~1 hour) gives an overview of the tool. In brief, GREAT is an alternative to gene-centric enrichment tools such as DAVID and uses a binomial test to test for ontology enrichment. Figure 1b in the GREAT paper explains how GREAT models functional annotations in the genome. The advantage of the binomial model is that it takes into account the probability that a genomic region overlaps a region associated with a particular ontology term, so terms whose associated regions cover an unusually large or small fraction of the genome are accounted for. GREAT incorporates annotations from 20 ontologies and is available as a web application. As stated in the paper, the utility of GREAT is not limited to ChIP-seq data; if you are interested in the details, check out the paper.
For this post, I will test GREAT using genomic regions that have been validated as enhancers. For the first test, I will examine human heart enhancers, which I downloaded from the VISTA Enhancer Browser using the Advanced Search with "heart", "Positives", and "Human" selected. The output is a FASTA file.
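To make the binomial test a bit more concrete, here is a minimal sketch of the upper-tail probability P(X >= k), where n is the number of input regions, p is the fraction of the genome associated with a term, and k is the number of input regions that hit it. The function name binom_tail and the example values of k and p are hypothetical; this is only an illustration of the idea, not GREAT's actual implementation.

```shell
# Upper-tail binomial probability P(X >= k) for n regions, where each region
# hits a term's associated fraction p of the genome (assumes 0 < p < 1).
# binom_tail is a hypothetical helper, not part of GREAT.
binom_tail() {
    awk -v n="$1" -v k="$2" -v p="$3" 'BEGIN {
        total = 0
        for (i = k; i <= n; i++) {
            # log of C(n, i), accumulated term by term to avoid overflow
            logc = 0
            for (j = 1; j <= i; j++) logc += log(n - i + j) - log(j)
            total += exp(logc + i * log(p) + (n - i) * log(1 - p))
        }
        printf "%.6g\n", total
    }'
}

# Hypothetical numbers: 91 regions, 10 of them hit a term covering 2% of the genome
binom_tail 91 10 0.02
```

A term that covers a large fraction of the genome has a large p, so it needs many more hits before the tail probability becomes small.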
cat vista_heart_enhancer.fa | grep ">Human" | wc -l
#91
#parse the definition line
#and reformat into BED format
#(sed -E so that \s+ means one or more whitespace characters)
cat vista_heart_enhancer.fa | grep ">Human" | cut -f2 -d'|' | sed -E 's/\s+//g;s/[:-]/\t/g'
chr7	114295109	114296373
chr10	102546590	102548095
chr3	18169362	18170237
chr15	70391576	70392613
chr18	22864753	22866457
chr19	30767058	30768273
chr1	10781239	10781744
chr7	35505184	35506342
chr7	35458551	35459522
chr22	38394345	38395199
chr22	38429240	38430515
chr18	19773983	19774532
chr5	88548400	88550500
chr8	11604182	11604695
chr22	38381623	38382457
chr1	27049620	27050905
chr3	71034389	71035430
chr1	88108084	88109396
chr1	164668592	164669823
chr14	53833457	53836210
chr2	105300344	105301657
chr2	59178992	59180242
chr2	103538361	103539868
chr1	61917795	61920190
chr2	50840428	50844037
chr14	68634843	68636060
chr14	77384641	77387697
chr14	75724424	75726998
chr13	31364918	31367891
chr4	124775814	124779530
chr10	119725799	119727161
chr14	75703373	75705053
chr8	93030368	93033345
chr7	35412725	35416349
chr8	37683090	37685347
chr18	20049644	20052904
chr14	23906587	23908214
chr1	113540056	113542020
chr17	27994702	27996874
chr10	77793903	77797889
chr6	111871688	111876668
chr3	50636478	50640158
chr11	75264066	75266752
chr2	241197357	241200155
chr18	12277696	12281650
chr5	148801693	148805076
chr7	158888320	158891362
chr19	11251931	11254703
chr11	65254704	65258394
chr16	66937076	66939560
chr10	134442029	134446812
chr2	218801171	218803563
chr1	230984523	230987981
chr10	80980316	80984042
chr17	37831124	37834530
chr7	95236622	95240458
chr16	89057397	89061023
chr17	61751592	61755090
chr11	33964286	33967857
chr11	8226046	8230683
chr1	181121049	181123654
chr15	44171247	44173891
chr2	232527256	232532109
chr6	35457129	35460603
chr1	156630249	156635284
chr2	43444721	43447298
chr1	3262093	3266365
chr11	119759730	119763658
chr11	69308185	69312534
chr14	100039652	100044326
chr15	99257740	99262228
chr2	101726098	101729492
chr2	241307979	241312994
chr2	238522379	238526567
chr19	41936986	41938891
chr10	93348489	93351723
chr2	238221820	238225273
chr6	164382342	164386389
chr10	3087624	3090443
chr16	82767609	82770047
chr8	30135042	30138909
chr9	134110296	134115184
chr2	47295775	47299432
chr20	56014710	56019293
chr10	79972390	79977332
chr5	176076732	176078530
chr2	159885988	159889012
chr7	50564242	50565706
chr9	38038695	38040099
chr12	54412238	54415445
chr22	32019365	32023822
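For reference, the header-to-BED step above can also be sketched as a single awk filter that checks each line as it goes. The function name vista_to_bed is hypothetical, and the header layout it expects (">Human|chr:start-end | ...") is an assumption based on the commands above; treat it as a sketch rather than a tested converter for every VISTA export.

```shell
# Convert VISTA-style FASTA definition lines on stdin into 3-column BED.
# Assumed header layout: >Human|chr7:114295109-114296373 | element ...
# vista_to_bed is a hypothetical helper name.
vista_to_bed() {
    awk -F'|' '/^>Human/ {
        gsub(/[[:space:]]/, "", $2)          # drop padding around the coordinates
        n = split($2, part, /[:-]/)          # part[1]=chr, part[2]=start, part[3]=end
        if (n == 3 && part[2] + 0 < part[3] + 0)
            printf "%s\t%s\t%s\n", part[1], part[2], part[3]
        else
            print "skipping malformed header: " $0 > "/dev/stderr"
    }'
}

# Usage, assuming the FASTA file from the VISTA Enhancer Browser is present:
#   vista_to_bed < vista_heart_enhancer.fa > vista_heart_enhancer.bed
```

Headers that do not yield exactly three fields with start < end are reported on stderr instead of being silently passed through.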
I pasted these regions into GREAT (selecting hg19 and "BED data" and without a background) and got these results:
All the heart enhancers could be associated with one or more genes.
The heart enhancers are distributed symmetrically upstream and downstream of known TSSs.
The majority of heart enhancers are at least 5 kb away from known TSSs.
Several enriched GO Biological Processes are clearly associated with heart function.
Let's try with midbrain enhancers:
cat vista_midbrain_enhancer.fa | grep "^>" | wc -l
#255
#you can download this file at
#https://dl.dropboxusercontent.com/u/15251811/vista_midbrain_enhancer.bed
#(sed -E so that \s+ means one or more whitespace characters)
cat vista_midbrain_enhancer.fa | grep ">Human" | cut -f2 -d'|' | sed -E 's/\s+//g;s/[:-]/\t/g' > vista_midbrain_enhancer.bed
Most midbrain enhancers could be associated with genes.
Most midbrain enhancers are at least 5kb away from known TSSs.
The enriched GO Biological Process terms for central nervous system, forebrain, and brain development seem relevant.
Programming Interface
The PHP script on the GREAT website generates the HTML output, which is then rendered in the web browser. The cool thing about a Common Gateway Interface (CGI) script is that (usually) one can pass parameters to it and interact with it programmatically. The two required parameters of the GREAT PHP script are requestURL and requestSpecies (see more about the parameters here). So to obtain the results for the midbrain enhancers seen above, just paste in:
This programming interface allows other tools to interact with GREAT as long as the URL for the BED file and the genome assembly are provided. One such tool is the UCSC Table Browser, which can interact with GREAT via this interface. So you can send all your interesting table browser results to GREAT.
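As a rough sketch of how such a request could be assembled in the shell: requestURL and requestSpecies are the two required parameters mentioned above, but the endpoint stored in GREAT_ENDPOINT is my assumption and should be checked against the GREAT help pages. The helper name great_url is hypothetical, and hg19 is assumed here as in the heart example.

```shell
# Build a GREAT request URL from a BED file URL and a genome assembly.
# ASSUMPTION: the endpoint below may not match the current GREAT site.
GREAT_ENDPOINT="http://great.stanford.edu/public/cgi-bin/greatStart.php"

# great_url is a hypothetical helper: great_url <bed_url> <assembly>
great_url() {
    bed_url=$1
    assembly=$2
    # A BED URL containing '&' or '?' would need percent-encoding first;
    # plain http(s) paths like the one below can be passed as-is.
    printf '%s?requestSpecies=%s&requestURL=%s\n' \
        "$GREAT_ENDPOINT" "$assembly" "$bed_url"
}

great_url "https://dl.dropboxusercontent.com/u/15251811/vista_midbrain_enhancer.bed" hg19
```

The printed URL can then be pasted into a browser, since the results are rendered as an HTML page.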
Conclusions
I heard about the Genomic Regions Enrichment of Annotations Tool (GREAT) a couple of years ago but never got around to trying it out. Just yesterday I was reading a paper and saw how GREAT was used to associate ontology terms with genomic regions (in contrast to the approach of associating ontology terms with a gene list), which is something I've been trying to do. I work a lot with non-coding DNA regions that have transcriptional activity but are largely intergenic, and I've been looking for a way to associate some function with these regions. So far I have used chromosomal interaction data to check whether an intergenic region is in close proximity to a gene, or I have tried to find co-expression patterns between my intergenic regions and genes. I will definitely add GREAT to my analysis pipeline.
The GREAT results for the enhancers look promising and I will definitely test it with my own data. I went through their help page but couldn't find too much written about the appropriate background set to use. I checked out their demos and for the SRF and ultraconserved elements examples, they used the whole genome as their background. Something to look into in a future update of this post.
I also tried to compile the GREAT tool but ran into problems (it looked like I was missing some C libraries that were needed by the Kent source).
In conclusion, if you work with a lot of non-coding DNA regions, check out GREAT.

This work is licensed under a Creative Commons
Attribution 4.0 International License.

Hi, I am very interested in GREAT, but I have run into some problems. Could you give me a hand? I just don’t know how to get detailed information about the genes associated with my regions, such as their positions. Could you be so kind as to leave me some advice? Thanks a lot!
I would recommend that you read the paper and email the authors of GREAT for help. I’m just a user of the software, like yourself.
Hi DAVO,
Thanks for the post. I would like to ask about your experience in using GREAT. Do you know if there is a way to use updated ontology files with GREAT, or have you ever done so? (e.g. in the local version http://great.stanford.edu/help/display/GREAT/Download)
I also found this tool very useful, especially when working with cell types without available chromatin interaction datasets, but I am very concerned about the outdated ontology. Or do you know if there are other tools that can also predict ChIP peak associated genes and do subsequent ontology analysis using similar logic to GREAT?
Thanks a lot!
Kylie
Hi Kylie,
I have not used the local version of the tool so I can’t advise you on that.
I know of http://bioconductor.org/packages/release/bioc/html/ChIPpeakAnno.html but have not used it either (even though I’ve been meaning to test it out). Perhaps it’s an alternative to GREAT.
Good luck!
Dave