1000 Genomes Project

From Dave's wiki




Structural variation, also known as genomic structural variation, is variation in the structure of an organism's chromosomes.

Copy-number variation (CNV) is a large category of structural variation, which includes insertions, deletions and duplications.

Directional or positive selection refers to a situation in which one allele has greater fitness than the others, so its frequency in the population increases.

Stabilising or negative selection (also known as purifying selection) lowers the frequency of disadvantageous alleles in a population, or removes them entirely, because they confer lower fitness than other alleles.
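The effect of directional (positive) and purifying (negative) selection on allele frequency can be illustrated with the textbook one-locus, two-allele model of viability selection. This is a minimal sketch assuming random mating, an infinitely large population and no mutation or migration; the genotype fitnesses and the selection coefficient `s` are hypothetical values chosen for illustration, not estimates from the 1000 Genomes data.

```python
def next_freq(p, w_AA, w_Aa, w_aa):
    """Return the frequency of allele A after one generation of selection.

    Assumes random mating, an infinite population and no mutation or
    migration; w_AA, w_Aa and w_aa are relative genotype fitnesses.
    """
    q = 1.0 - p
    # Mean fitness of the population under Hardy-Weinberg genotype frequencies
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    # A-bearing gametes come from AA (both alleles) and Aa (one allele)
    return (p * p * w_AA + p * q * w_Aa) / w_bar


def trajectory(p0, fitnesses, generations):
    """Iterate next_freq and return the allele frequency in every generation."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], *fitnesses))
    return freqs


s = 0.05  # hypothetical selection coefficient per copy of allele A

# Positive selection: each copy of A adds s to fitness, so A rises from rare.
positive = trajectory(0.01, (1 + 2 * s, 1 + s, 1.0), 200)
# Purifying selection: each copy of A costs s, so A is driven towards loss.
purifying = trajectory(0.5, (1 - 2 * s, 1 - s, 1.0), 200)

print(f"positive:  {positive[0]:.3f} -> {positive[-1]:.3f}")
print(f"purifying: {purifying[0]:.3f} -> {purifying[-1]:.3f}")
```

The only difference between the two runs is the sign of the fitness effect of A, which alone decides whether the allele spreads or is purged.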

Finally, several forms of balancing selection exist; these maintain genetic variation within a species. In overdominance, heterozygous individuals are fitter than either homozygote (e.g. the sickle-cell allele of the HBB gene, which causes sickle cell anaemia in homozygotes but confers resistance to malaria in heterozygotes). In other cases selection varies spatially: a species that inhabits different niches may favour different alleles in each of them.

Some genomic differences may not affect fitness at all. Such neutral variation, including variation in regions once dismissed as “junk” DNA, is unaffected by natural selection, which results in higher genetic variation at these sites than at sites where variation does influence fitness.
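Overdominance, the form of balancing selection in which heterozygotes are fittest, can be made concrete with the same textbook one-locus model of viability selection. This sketch assumes random mating, an infinite population and hypothetical fitnesses AA = 1 - s, Aa = 1, aa = 1 - t; the values of `s` and `t` are illustrative, not measured for any real gene. Setting the marginal fitnesses of the two alleles equal gives the stable equilibrium frequency p = t / (s + t).

```python
def next_freq(p, w_AA, w_Aa, w_aa):
    """Frequency of allele A after one generation of viability selection
    (random mating, infinite population, no mutation or migration)."""
    q = 1.0 - p
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return (p * p * w_AA + p * q * w_Aa) / w_bar


def overdominant_equilibrium(s, t):
    """Stable frequency of A when fitnesses are AA = 1 - s, Aa = 1, aa = 1 - t.

    At equilibrium both alleles have equal marginal fitness:
    p(1 - s) + q = p + q(1 - t), which gives p = t / (s + t).
    """
    return t / (s + t)


def iterate(p, s, t, generations=2000):
    """Run the overdominance model forward from starting frequency p."""
    for _ in range(generations):
        p = next_freq(p, 1.0 - s, 1.0, 1.0 - t)
    return p


s, t = 0.2, 0.1  # hypothetical fitness costs of the AA and aa homozygotes

p_hat = overdominant_equilibrium(s, t)
# Starting from either a rare or a common A, selection pushes the
# frequency to the same interior equilibrium, so both alleles persist.
print(f"predicted equilibrium: {p_hat:.4f}")
print(f"from p0 = 0.01: {iterate(0.01, s, t):.4f}")
print(f"from p0 = 0.99: {iterate(0.99, s, t):.4f}")
```

Unlike directional selection, neither allele is lost: whichever allele becomes rare is carried mostly in heterozygotes, which are the fittest genotype, so it is pushed back up.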


The primary goal of this project is to create a complete and detailed catalogue of human genetic variation, which in turn can be used in association studies relating genetic variation to disease. To this end, the consortium aims to discover more than 95% of variants (e.g. SNPs, CNVs and indels) with minor allele frequencies as low as 1% across the genome and 0.1-0.5% in gene regions, and to estimate the population frequencies, haplotype backgrounds and linkage disequilibrium patterns of variant alleles. Secondary goals include supporting better SNP and probe selection for genotyping platforms in future studies and improving the human reference sequence. Furthermore, the completed database will be a useful tool for studying regions under selection and variation across multiple populations, and for understanding the underlying processes of mutation and recombination.
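Minor allele frequency and linkage disequilibrium, two of the quantities the project sets out to estimate, have simple standard definitions. The sketch below assumes biallelic sites, alternate-allele counts coded 0/1/2 per diploid individual, and phased haplotypes coded 0/1; the function names and toy data are hypothetical, and real analyses would work from the project's released genotype files rather than hand-built Python lists.

```python
def minor_allele_frequency(genotypes):
    """MAF at a biallelic site.

    genotypes: alternate-allele counts for diploid individuals (0, 1 or 2).
    """
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    # The minor allele is whichever of ref/alt is rarer in this sample
    return min(alt_freq, 1.0 - alt_freq)


def linkage_disequilibrium(haplotypes):
    """Return (D, r^2) for two biallelic sites.

    haplotypes: phased (allele_at_site_1, allele_at_site_2) pairs, coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # freq of allele 1 at site 1
    p_b = sum(b for _, b in haplotypes) / n  # freq of allele 1 at site 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # deviation from independence of the two sites
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined if either site is monomorphic; 0.0 is returned here by choice
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


# Hypothetical toy data, not drawn from the 1000 Genomes release
genotypes = [0, 0, 1, 0, 2, 0, 0, 1, 0, 0]
print(f"MAF = {minor_allele_frequency(genotypes):.2f}")

linked = [(1, 1), (0, 0), (1, 1), (0, 0)]
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
print("linked:  ", linkage_disequilibrium(linked))
print("unlinked:", linkage_disequilibrium(unlinked))
```

In the `linked` sample the two sites always carry the same allele, so r² is 1; in the `unlinked` sample every combination occurs equally often, so D and r² are 0.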


A map of human genome variation from population scale sequencing http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3042601/

The pilot phase consisted of three projects:

  • low-coverage whole-genome sequencing of 179 individuals from 4 populations
  • high-coverage sequencing of 2 trios (mother-father-child)
  • exon-targeted sequencing of 697 individuals from 7 populations

It was found that on average, each person carries around 250-300 loss-of-function variants in annotated genes and 50-100 variants previously implicated in inherited disorders. Based on the two trios, it is estimated that the rate of de novo germline mutation is approximately 10⁻⁸ per base per generation.
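A mutation rate of roughly 10⁻⁸ per base per generation can be turned into an expected number of new mutations per person with simple arithmetic. This sketch assumes a haploid genome of about 3.1 × 10⁹ bp (an approximate figure, not taken from the source) and ignores the fact that real estimates are restricted to the part of the genome that can be reliably called; the constant and function names are hypothetical.

```python
# Assumed approximate size of the haploid human genome, in base pairs
HAPLOID_GENOME_BP = 3.1e9
# De novo germline mutation rate per base per generation (order of magnitude
# estimated from the pilot trios)
MUTATION_RATE = 1e-8


def expected_de_novo(rate=MUTATION_RATE, haploid_bp=HAPLOID_GENOME_BP, ploidy=2):
    """Expected number of new point mutations in one child's genome.

    A child inherits one haploid genome from each parent, so every base is
    present in `ploidy` copies that can each mutate independently.
    """
    return rate * haploid_bp * ploidy


print(f"expected de novo mutations per child: ~{expected_de_novo():.0f}")
```

Under these assumptions each child carries on the order of 60 new point mutations that are present in neither parent.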

An integrated map of genetic variation from 1,092 human genomes http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3498066/