Genetics

From Dave's wiki

Our modern understanding of how traits may be inherited through generations comes from the principles proposed by Gregor Mendel in 1865.

Principles of inheritance

Mendel's first principle is the principle of uniformity. It states that all the progeny of a cross between two true-breeding parents that differ in a single trait will appear identical, because they all show the dominant trait.

His second principle of inheritance is the principle of segregation; this principle describes how pairs of gene variants are separated into reproductive cells. According to this principle, the "particles" (or alleles as we now know them) that determine traits are separated into gametes during meiosis, and meiosis produces equal numbers of egg or sperm cells that contain each allele.

Mendel's third principle of inheritance is the principle of independent assortment. According to this principle, alleles at one locus segregate into gametes independently of alleles at other loci, so gametes carrying each combination of alleles are formed in equal frequencies. This holds for loci that are not linked (see Linkage).
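As a minimal sketch of segregation and independent assortment, the Python below enumerates the gametes of a dihybrid cross (AaBb x AaBb) and counts the offspring phenotypes. The locus names A and B, the assumption of complete dominance, and the assumption that the two loci are unlinked are illustrative and not taken from the source.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """Return all equally likely gametes for a genotype given as one allele pair per locus.

    Each gamete receives one allele from every locus (segregation), and the
    choice at one locus does not affect the choice at another (independent
    assortment), so the gametes are the Cartesian product of the allele pairs.
    """
    return ["".join(combo) for combo in product(*genotype)]


def phenotype(gamete1, gamete2):
    """Phenotype under complete dominance: uppercase if either allele is dominant."""
    return "".join(
        a.upper() if (a.isupper() or b.isupper()) else a.lower()
        for a, b in zip(gamete1, gamete2)
    )


# Hypothetical dihybrid parent: heterozygous at two unlinked loci, A and B.
parent = ["Aa", "Bb"]
parent_gametes = gametes(parent)

# Every pairing of one gamete from each parent is equally likely.
offspring = Counter(
    phenotype(g1, g2) for g1, g2 in product(parent_gametes, repeat=2)
)
print(parent_gametes)
print(dict(offspring))
```

Under these assumptions the four gamete types appear once each, and the sixteen gamete pairings fall into the familiar 9:3:3:1 phenotype ratio.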

Source: http://www.nature.com/scitable/topicpage/Gregor-Mendel-and-the-Principles-of-Inheritance-593

See also: http://www.nature.com/scitable/topicpage/mendelian-genetics-patterns-of-inheritance-and-single-966

Allele

When I was first introduced to genetics, I learned that an allele was an alternate form of a gene. For example, there's a gene for seed shape in peas; one form will result in round peas and another will result in wrinkled peas. In addition, some alleles are dominant and others are recessive: in a heterozygote, the dominant allele determines the phenotype and masks the recessive one. The resulting phenotype depends on which alleles are present and how they interact with each other. Lastly, if both copies of a gene are the same allele, the individual is homozygous for that gene. If the copies are different alleles, the individual is heterozygous.

However, on the molecular level, an allele refers to a variant of the DNA sequence at a specific position (locus), which can be as small as a single base pair. In diploid organisms, such as Homo sapiens, somatic cells carry paired homologous chromosomes. Therefore, we have two alleles at each given position on a pair of homologous chromosomes. As alleles can refer to sequences that do not lie in genes, alleles are not necessarily associated with genes.

What makes an allele recessive or dominant? http://genetics.thetech.org/ask/ask227

Minor allele frequency

Minor allele frequency (MAF) refers to the frequency at which the second most common allele occurs in a given population. For a biallelic variant, this is simply the frequency of the less common of the two alleles.

See also http://en.wikipedia.org/wiki/Allele_frequency
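A minimal sketch of how MAF could be computed from genotype calls at one SNP. The function name, the two-character genotype encoding, and the sample data are assumptions made for illustration.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return (allele, frequency) for the second most common allele.

    `genotypes` is a list of two-character strings such as "AG", one per
    diploid individual; missing calls should be removed beforehand.
    At a biallelic site this is the less common of the two alleles.
    """
    allele_counts = Counter(allele for g in genotypes for allele in g)
    total = sum(allele_counts.values())
    ranked = allele_counts.most_common()
    if len(ranked) < 2:
        # Monomorphic site: no minor allele was observed.
        return None, 0.0
    allele, count = ranked[1]
    return allele, count / total


# Hypothetical genotypes for ten individuals at one biallelic SNP.
sample = ["AA", "AG", "AA", "GG", "AG", "AA", "AA", "AG", "AA", "AA"]
print(minor_allele_frequency(sample))
```

In this made-up sample there are 20 alleles in total, 15 A and 5 G, so G is the minor allele with a frequency of 0.25.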

Genotype frequency

https://en.wikipedia.org/wiki/Genotype_frequency

Heterogeneity

  • Allelic heterogeneity within a disease gene refers to a "disease allele" that could be any one of multiple inactivating mutations in the same gene
  • Genetic (locus) heterogeneity refers to mutations in different genes that lead to the same phenotype (many genes -> one trait)
  • Phenotypic heterogeneity (or variable expressivity) refers to the spectrum of phenotypes observed despite the same underlying mutation of a gene (one gene -> many traits)
    • Related: Pleiotropy, which occurs when one gene influences two or more seemingly unrelated phenotypic traits
  • Genes involved in various cellular roles in multiple organs can explain why mutations in one gene cause different phenotypes

SNP

http://en.wikipedia.org/wiki/Single-nucleotide_polymorphism

Some notes related to SNP profiling:

  • The call rate for a given SNP is defined as the proportion of individuals in the study for which the corresponding SNP information is not missing
  • A large degree of homogeneity at a given SNP across study participants generally results in inadequate power to infer a statistically significant relationship between the SNP and the trait under study
  • Excess heterozygosity across typed SNPs within an individual may be an indication of poor sample quality, while deficient heterozygosity can indicate inbreeding or other substructure in that person
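The quality measures in the notes above can be sketched as follows. This is only an illustration: the genotype matrix, the use of None for missing calls, and the function names are assumptions, and no thresholds are implied.

```python
def call_rate(calls):
    """Proportion of individuals with a non-missing genotype at one SNP."""
    return sum(c is not None for c in calls) / len(calls)


def heterozygosity(calls):
    """Proportion of an individual's non-missing genotypes that are heterozygous."""
    typed = [c for c in calls if c is not None]
    return sum(c[0] != c[1] for c in typed) / len(typed)


# Hypothetical genotype matrix: rows are individuals, columns are SNPs.
# None marks a missing call.
genotypes = {
    "ind1": ["AA", "AG", "CC", None],
    "ind2": ["AG", "AG", "CT", "GG"],
    "ind3": ["AA", None, "CC", "GG"],
    "ind4": ["AA", "GG", "CC", "GT"],
}

n_snps = 4
# Per-SNP call rate: look down each column of the matrix.
snp_call_rates = [
    call_rate([calls[i] for calls in genotypes.values()]) for i in range(n_snps)
]
# Per-individual heterozygosity: look along each row of the matrix.
sample_het = {name: heterozygosity(calls) for name, calls in genotypes.items()}
print(snp_call_rates)
print(sample_het)
```

In a real study, individuals whose heterozygosity sits far from the rest of the cohort, in either direction, would be the ones to inspect.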

https://www.nature.com/scitable/topicpage/using-snp-data-to-examine-human-phenotypic-706

Haplotype

A haplotype is a group of genes within an organism that was inherited together from a single parent. The word "haplotype" is derived from the word "haploid," which describes cells with only one set of chromosomes, and from the word "genotype," which refers to the genetic makeup of an organism. A haplotype can describe a pair of genes inherited together from one parent on one chromosome, or it can describe all of the genes on a chromosome that were inherited together from a single parent. This group of genes was inherited together because of genetic linkage, or the phenomenon by which genes that are close to each other on the same chromosome are often inherited together. In addition, the term "haplotype" can also refer to the inheritance of a cluster of single nucleotide polymorphisms (SNPs), which are variations at single positions in the DNA sequence among individuals.

By examining haplotypes, scientists can identify patterns of genetic variation that are associated with health and disease states. For instance, if a haplotype is associated with a certain disease, then scientists can examine stretches of DNA near the SNP cluster to try to identify the gene or genes responsible for causing the disease.

Source: http://www.nature.com/scitable/definition/haplotype-haplotypes-142

Linkage

Genetic linkage describes the way in which two genes that are located close to each other on a chromosome are often inherited together. In 1905, William Bateson, Edith Rebecca Saunders, and Reginald C. Punnett noted that the traits for flower colour and pollen shape in sweet pea plants appeared to be linked together. A few years later, in 1911, Thomas Hunt Morgan, who was studying heredity in fruit flies, noticed that the eye colour of a fly was associated with the fly's sex and hypothesised that the two traits were linked together. These observations led to the concept of genetic linkage, which describes how two genes that are closely associated on the same chromosome are frequently inherited together. In fact, the closer two genes are to one another on a chromosome, the greater their chances are of being inherited together or linked. In contrast, genes located farther away from each other on the same chromosome are more likely to be separated during recombination, the process that recombines DNA during meiosis. The strength of linkage between two genes, therefore, depends upon the distance between the genes on the chromosome.

Source: http://www.nature.com/scitable/definition/linkage-51

Recombination

Recombination is a process by which pieces of DNA are broken and recombined to produce new combinations of alleles. This recombination process creates genetic diversity at the level of genes that reflects differences in the DNA sequences of different organisms.

In eukaryotic cells, which are cells with a nucleus and organelles, recombination typically occurs during meiosis. Meiosis is a form of cell division that produces gametes, or egg and sperm cells. During the first phase of meiosis, the homologous pairs of maternal and paternal chromosomes align. During the alignment, the arms of the chromosomes can overlap and temporarily fuse, causing a crossover. Crossovers result in recombination and the exchange of genetic material between the maternal and paternal chromosomes. As a result, offspring can have different combinations of genes than their parents. Genes that are located farther apart on the same chromosome have a greater likelihood of undergoing recombination, which means they have a greater recombination frequency.
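Recombination frequency is usually estimated as the fraction of offspring that carry a recombinant combination of alleles. A minimal sketch, assuming a test cross (AB/ab x ab/ab) with made-up offspring counts; the function name and the data are hypothetical.

```python
def recombination_frequency(offspring_counts, parental_types):
    """Fraction of test-cross offspring carrying a recombinant allele combination."""
    total = sum(offspring_counts.values())
    recombinant = sum(
        n for kind, n in offspring_counts.items() if kind not in parental_types
    )
    return recombinant / total


# Hypothetical test cross AB/ab x ab/ab: offspring are classified by the
# gamete they received from the heterozygous parent.
counts = {"AB": 430, "ab": 420, "Ab": 70, "aB": 80}
rf = recombination_frequency(counts, parental_types={"AB", "ab"})
print(f"{rf:.2f}")
```

A value close to 0.5 is what independent assortment would produce, while smaller values suggest that the two loci are linked.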

Source: http://www.nature.com/scitable/definition/recombination-226

Further reading

Germline vs. somatic detection

In cancer, distinguishing somatic mutations from germline variants requires sequencing the tumour alongside a patient-matched normal sample; variants detected in the tumour tissue but not in the matched normal are candidate somatic mutations.
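At its simplest, the tumour versus normal comparison is a set difference. The sketch below assumes variants have already been called in both samples and are represented as (chromosome, position, ref, alt) tuples; the coordinates are placeholders. Real somatic callers also weigh coverage and allele fractions, which this ignores.

```python
def candidate_somatic(tumour_variants, normal_variants):
    """Return variants called in the tumour but absent from the matched normal."""
    return sorted(set(tumour_variants) - set(normal_variants))


# Hypothetical calls as (chromosome, position, ref, alt) tuples.
tumour = [
    ("chr1", 1000, "A", "G"),
    ("chr2", 2000, "T", "G"),
    ("chr3", 3000, "C", "T"),
]
normal = [
    ("chr1", 1000, "A", "G"),  # also in the normal, so treated as germline
]

print(candidate_somatic(tumour, normal))
```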

https://www.ncbi.nlm.nih.gov/books/NBK21894/

Genetics and statistical analyses

The first thing any scientist does before performing an experiment is to form a hypothesis about the experiment's outcome. This often takes the form of a null hypothesis, which is a statistical hypothesis that provides the expected values for an experiment. The null hypothesis is proposed by a scientist before completing an experiment, and it can be supported by data or rejected in favour of an alternative hypothesis.

Extrinsic hypothesis: Prediction of the number of observed individuals with specific characteristics based on calculations performed before the experiment is completed.

Intrinsic hypothesis: When expected proportions of individuals with the observed characteristics are calculated after the experiment is done using a specific piece of required data.

Goodness of fit: An expression in statistics referring to the measure of how closely aligned a function derived from actual cumulative data is to a predicted model function.

Pearson's chi-square test works well with genetic data as long as the expected count in each group is large enough. In the case of small samples (fewer than 10 in any category) that have 1 degree of freedom, the test is not reliable. (For a goodness-of-fit test against fixed expected ratios, the degrees of freedom, or df, equal the number of categories minus one.) However, in such cases, the test can be corrected by using the Yates correction for continuity, which reduces the absolute value of each difference between observed and expected frequencies by 0.5 before squaring. Additionally, it is important to remember that the chi-square test can only be applied to numbers of progeny, not to proportions or percentages.
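To make the Yates correction concrete, here is a minimal hand-rolled sketch of the goodness-of-fit statistic in Python. The function name and the small sample (14 tall, 2 short) are made up for illustration, and the p-value shortcut only applies to 1 degree of freedom.

```python
import math


def chi_square_gof(observed, expected_ratio, yates=False):
    """Pearson chi-square goodness-of-fit statistic, df and p-value.

    `observed` holds counts (not proportions); `expected_ratio` holds the
    hypothesised ratio, e.g. (3, 1). The p-value is only computed for
    1 degree of freedom (two categories); otherwise None is returned.
    """
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    statistic = 0.0
    for obs, part in zip(observed, expected_ratio):
        exp = total * part / ratio_sum
        diff = abs(obs - exp)
        if yates:
            # Yates correction: shrink each |O - E| by 0.5 before squaring.
            diff = max(diff - 0.5, 0.0)
        statistic += diff ** 2 / exp
    df = len(observed) - 1
    # For df = 1 the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2)) if df == 1 else None
    return statistic, df, p_value


# Hypothetical small sample: 14 tall and 2 short plants, tested against 3:1.
print(chi_square_gof([14, 2], (3, 1)))
print(chi_square_gof([14, 2], (3, 1), yates=True))
```

With expected counts of 12 and 4, the uncorrected statistic is about 1.33 and the Yates-corrected statistic is 0.75, so the correction makes the test more conservative.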

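Calculating Pearson's chi-square for Mendelian ratios in R, where the null hypothesis is that in a cross between two heterozygous (Tt) plants, the offspring should occur in a 3:1 ratio of tall plants to short plants. With 400 offspring, the expected counts are 300 tall and 100 short; the observed counts are 305 tall and 95 short. The expected ratio is passed as probabilities through the p argument, because entering the expected counts as a second row of a table would make chisq.test run a test of independence rather than a goodness-of-fit test.

# Observed numbers of progeny (counts, not proportions)
observed <- c(Tall=305, Short=95)
# Goodness-of-fit test against the 3:1 ratio expected under the null hypothesis
chisq.test(observed, p=c(3/4, 1/4))

	Chi-squared test for given probabilities

data:  observed
X-squared = 0.3333, df = 1, p-value = 0.5637

The p-value is well above 0.05, so the 3:1 null hypothesis is not rejected.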

Source http://www.nature.com/scitable/topicpage/genetics-and-statistical-analysis-34592

GWAS

Main page is at GWAS

In genetic epidemiology, a genome-wide association study (GWA study, or GWAS), also known as whole genome association study (WGA study, or WGAS) or common-variant association study (CVAS), is an examination of many common genetic variants in different individuals to see if any variant is associated with a trait. GWAS typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major diseases.
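A single-SNP association test can be sketched as a 2x2 chi-square test on allele counts in cases versus controls. This is a simplified illustration with made-up counts and hypothetical SNP names; real GWAS typically use regression models with covariates and must correct for the very large number of tests.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """2x2 chi-square test (1 df, no continuity correction) on allele counts.

    Each argument is (count of allele A, count of allele a).
    Returns (statistic, p_value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    statistic = n * (a * d - b * c) ** 2 / denominator
    # For df = 1 the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical allele counts per SNP: (cases, controls).
snps = {
    "snp_1": ((300, 700), (200, 800)),
    "snp_2": ((250, 750), (245, 755)),
}
results = {name: allelic_chi_square(*counts) for name, counts in snps.items()}
for name, (stat, p) in results.items():
    print(name, round(stat, 2), p)
```

In this toy data, snp_1 has a clearly different allele frequency in cases and controls, while snp_2 does not.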

Source http://en.wikipedia.org/wiki/Genome-wide_association_study

See also: http://www.nature.com/scitable/topicpage/genetic-variation-and-disease-gwas-682

Transposition

http://www.nature.com/scitable/topicpage/barbara-mcclintock-and-the-discovery-of-jumping-34083

Terminology

An autosomal recessive disorder means two copies of an abnormal gene must be present in order for the disease or trait to develop. (http://www.nlm.nih.gov/medlineplus/ency/article/002052.htm)

In an autosomal dominant disease, if you inherit the abnormal gene from only one parent, you can get the disease. Often, one of the parents may also have the disease. (http://www.nlm.nih.gov/medlineplus/ency/article/002049.htm)

Sex-linked diseases are passed down through families through one of the X or Y chromosomes. X and Y are sex chromosomes. (http://www.nlm.nih.gov/medlineplus/ency/article/002051.htm)

X-linked dominant inheritance, sometimes referred to as X-linked dominance, is a mode of genetic inheritance by which a dominant gene is carried on the X chromosome. (http://en.wikipedia.org/wiki/X-linked_dominant_inheritance)

X-linked recessive inheritance is a mode of inheritance in which a mutation in a gene on the X chromosome causes the phenotype to be expressed in males (who are necessarily hemizygous for the gene mutation because they have only one X chromosome) and in females who are homozygous for the gene mutation (i.e., they have a copy of the gene mutation on each of their two X chromosomes). (http://en.wikipedia.org/wiki/X-linked_recessive_inheritance)

Diseases

http://www.nature.com/scitable/topicpage/Gene-Mapping-and-Disease-34600

http://www.nature.com/scitable/topicpage/Huntington-Disease-The-Discovery-of-Huntington-Gene-851

Huntingtin (HTT) was the first disease-associated gene to be molecularly mapped to a human chromosome (Gusella et al., 1983).

Single gene disorders (disorders caused by defects in one particular gene, often with simple and predictable inheritance patterns) - http://genome.wellcome.ac.uk/doc_wtd020848.html

Dominant diseases (single gene disorders that occur when there is only one defective copy of the relevant gene) - http://genome.wellcome.ac.uk/doc_WTD020849.html

Recessive diseases (single gene disorders that require the presence of two disease-causing alleles) - http://genome.wellcome.ac.uk/doc_wtd020850.html

X-linked diseases (single gene disorders caused by defective genes on the X chromosome) - http://genome.wellcome.ac.uk/doc_wtd020851.html

Standard models for disease penetrance that imply a specific relationship between genotype and phenotype include multiplicative, additive, common recessive and common dominant models.
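One common way to parameterise these four models is by the relative risk for 0, 1 and 2 copies of the risk allele. The sketch below uses that parameterisation as an assumption; the source does not define the models, and the value r = 1.5 is purely illustrative.

```python
def genotype_relative_risks(model, r):
    """Relative risks for 0, 1 and 2 copies of the risk allele.

    `r` is a hypothetical risk increase; the baseline (0 copies) is 1.
    """
    models = {
        "multiplicative": (1.0, r, r * r),  # r-fold per additional copy
        "additive": (1.0, r, 2 * r),        # r-fold for one copy, 2r-fold for two
        "recessive": (1.0, 1.0, r),         # two copies needed for increased risk
        "dominant": (1.0, r, r),            # one or two copies give the same risk
    }
    return models[model]


for name in ("multiplicative", "additive", "recessive", "dominant"):
    print(name, genotype_relative_risks(name, r=1.5))
```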

Anticipation: increasing severity or earlier age of onset of a genetic trait in succeeding generations. For example, symptoms of a genetic disease may become more severe as the trait is passed from generation to generation.

Books

Exploring Personal Genomics

Notes from http://www.amazon.com/Exploring-Personal-Genomics-Joel-Dudley/dp/0199644497

Chapter 2

  • Genetics is not destiny: notions of “genetic determinism” should be avoided.
  • The scope of personal genome analysis is still limited by “missing heritability”, incomplete knowledge of disease biology, confounding environmental and population-specific effects, and technical limitations.

Chapter 3

  • Genetic information is typically stored in variant files, which represent differences between the personal genome and a common reference genome. Variants are typically indexed using standard identifiers or coordinate systems based on a reference genome.
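As an illustration of a variant record, the sketch below parses one data line in the widely used VCF layout. The book notes above do not name a format, so VCF is an assumption here; the record itself, including its identifier and coordinate, is made up.

```python
def parse_variant_line(line):
    """Parse the eight fixed VCF columns of one tab-separated data line."""
    chrom, pos, var_id, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),                 # 1-based coordinate on the reference
        "id": None if var_id == "." else var_id,
        "ref": ref,                      # allele in the reference genome
        "alt": alt.split(","),           # one or more alternate alleles
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info,
    }


# Made-up record; the identifier and coordinate are placeholders.
record = parse_variant_line("chr1\t1000\trs0000\tA\tG\t50\tPASS\tDP=30")
print(record["chrom"], record["pos"], record["ref"], record["alt"])
```

The chromosome and position index the variant by reference coordinates, and the ID column holds a standard identifier when one exists.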