
The goal of the NHLBI GO Exome Sequencing Project (ESP) is to discover novel genes and mechanisms contributing to heart, lung and blood disorders by pioneering the application of next-generation sequencing of the protein coding regions of the human genome across diverse, richly-phenotyped populations and to share these datasets and findings with the scientific community to extend and enrich the diagnosis, management and treatment of heart, lung and blood disorders.[1]


Using ANNOVAR

convert2annovar.pl -includeinfo -allsample -withfreq -format vcf4 ESP6500SI-V2-SSA137.all.vcf.gz > esp6500.avinput
annotate_variation.pl -geneanno -buildver hg19 esp6500.avinput humandb/

zcat esp6500.avinput.variant_function.gz | cut -f1 | sort | uniq -c | sort -k1rn
1179691 exonic
 675059 intronic
  42532 UTR3
  29608 UTR5
  23553 ncRNA_intronic
  20359 intergenic
  11363 ncRNA_exonic
   8119 splicing
   4180 upstream
   2322 downstream
    527 upstream;downstream
    241 exonic;splicing
     91 UTR5;UTR3
     68 ncRNA_splicing
     40 ncRNA_exonic;splicing
      1 ncRNA_UTR5
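
The raw counts are easier to compare as shares of the total. The 16 categories above sum to 1,997,754 variants, so exonic variants make up roughly 59% of them. A minimal sketch of how the percentages could be added to any `uniq -c` style listing follows; `add_percent` is a hypothetical helper name, not part of ANNOVAR.

```shell
# Hypothetical helper: read "count label" lines (uniq -c style) on stdin
# and print each count with its percentage of the overall total.
add_percent() {
  awk '{
         count[NR] = $1                 # first field is the count
         $1 = ""; sub(/^ +/, "")        # what remains is the label (may contain spaces)
         label[NR] = $0
         total += count[NR]
       }
       END {
         for (i = 1; i <= NR; i++)
           printf "%8d %6.2f%% %s\n", count[i], 100 * count[i] / total, label[i]
       }'
}

# Intended use, assuming the gzipped ANNOVAR output is in the current directory:
#   zcat esp6500.avinput.variant_function.gz | cut -f1 | sort | uniq -c | sort -k1rn | add_percent
```

The helper buffers every line before printing because the total is only known once the input has been read in full.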

zcat esp6500.avinput.exonic_variant_function.gz | cut -f2 | sort | uniq -c | sort -k1rn
 695844 nonsynonymous SNV
 421924 synonymous SNV
  16741 stopgain
  15234 frameshift deletion
  10590 nonframeshift deletion
   9561 unknown
   7182 frameshift insertion
   2222 nonframeshift insertion
    634 stoploss
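
As a sanity check, the two listings can be reconciled: the nine classes in exonic_variant_function should add up to the "exonic" plus "exonic;splicing" rows of variant_function. The sketch below hard-codes the counts shown on this page (the variable names are only illustrative) and also prints the nonsynonymous to synonymous SNV ratio.

```shell
# Counts copied from the variant_function listing
exonic=1179691            # "exonic"
exonic_splicing=241       # "exonic;splicing"

# Sum of all nine classes in the exonic_variant_function listing
classified=$((695844 + 421924 + 16741 + 15234 + 10590 + 9561 + 7182 + 2222 + 634))

echo "exonic + exonic;splicing: $((exonic + exonic_splicing))"
echo "exonic_variant_function:  $classified"

# Ratio of nonsynonymous to synonymous SNVs, rounded to two decimals
awk 'BEGIN { printf "nonsynonymous/synonymous: %.2f\n", 695844 / 421924 }'
```

Both totals come to 1,179,932, so every exonic variant received exactly one functional class, and the ratio is about 1.65.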


  1. NHLBI GO Exome Sequencing Project