The goal of the NHLBI GO Exome Sequencing Project (ESP) is to discover novel genes and mechanisms contributing to heart, lung and blood disorders by pioneering the application of next-generation sequencing of the protein-coding regions of the human genome across diverse, richly phenotyped populations. The project also aims to share these datasets and findings with the scientific community to extend and enrich the diagnosis, management and treatment of heart, lung and blood disorders.[1]


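Using ANNOVAR, first convert the ESP6500 VCF to ANNOVAR input format, then run gene-based annotation against hg19:

convert2annovar.pl -includeinfo -allsample -withfreq -format vcf4 ESP6500SI-V2-SSA137.all.vcf.gz > esp6500.avinput
annotate_variation.pl -geneanno -buildver hg19 esp6500.avinput humandb/

The tallies below read the resulting variant_function and exonic_variant_function files after they have been gzip-compressed.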

zcat esp6500.avinput.variant_function.gz | cut -f1 | sort | uniq -c | sort -k1rn
1179691 exonic
 675059 intronic
  42532 UTR3
  29608 UTR5
  23553 ncRNA_intronic
  20359 intergenic
  11363 ncRNA_exonic
   8119 splicing
   4180 upstream
   2322 downstream
    527 upstream;downstream
    241 exonic;splicing
     91 UTR5;UTR3
     68 ncRNA_splicing
     40 ncRNA_exonic;splicing
      1 ncRNA_UTR5
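
The same tally is easier to compare across categories when each count is shown with its share of the total. The following is a minimal sketch, not part of the original workflow: tally_column is a hypothetical helper name, and it assumes tab-separated input on stdin, as ANNOVAR's variant_function output is.

```shell
# Hypothetical helper (not part of ANNOVAR): tally the values in one
# tab-separated column of stdin and print count, percentage of all rows,
# and label, largest count first.
tally_column() {
    col="${1:-1}"   # column to tally, defaults to the first
    awk -F'\t' -v col="$col" '
        { n[$col]++; total++ }
        END {
            for (k in n)
                printf "%d\t%.2f\t%s\n", n[k], 100 * n[k] / total, k
        }
    ' | sort -t"$(printf '\t')" -k1,1nr -k3,3
}

# Usage, assuming the gzipped file named above is in the current directory:
# zcat esp6500.avinput.variant_function.gz | tally_column 1
```

The output columns are count, percentage and category, separated by tabs; ties in count are ordered by category name.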

zcat esp6500.avinput.exonic_variant_function.gz | cut -f2 | sort | uniq -c | sort -k1rn
 695844 nonsynonymous SNV
 421924 synonymous SNV
  16741 stopgain
  15234 frameshift deletion
  10590 nonframeshift deletion
   9561 unknown
   7182 frameshift insertion
   2222 nonframeshift insertion
    634 stoploss
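
As a sanity check, the two tallies above agree: the exonic consequence counts sum to 1179932, which equals the 1179691 "exonic" plus the 241 "exonic;splicing" variants in the region tally. A minimal sketch of that check follows; sum_counts is a hypothetical helper name, and it assumes `uniq -c` style input with the count in the first field.

```shell
# Hypothetical helper: sum the leading count field of `uniq -c` output
# read from stdin and print the total as an integer.
sum_counts() {
    awk '{ s += $1 } END { printf "%d\n", s }'
}

# Usage, assuming the gzipped file named above is in the current directory:
# zcat esp6500.avinput.exonic_variant_function.gz | cut -f2 | sort | uniq -c | sum_counts
```

If the total ever differs from the exonic plus exonic;splicing count, the two output files probably come from different runs or inputs.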


