Brief (and steadily growing) definitions of biological concepts surrounding my area of work.

General biology

Genetic Code

Defines how information is encoded in the genetic material. Three consecutive nucleotides define a codon, which is translated into its amino acid.
The genetic code is degenerate: several different codons can code for the same amino acid. Codon usage describes which codon an organism prefers for a given amino acid (among the few possible codons that code for that amino acid).
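The degeneracy described above can be made concrete with a small sketch. It assumes the standard genetic code (NCBI translation table 1) written for coding-strand DNA; the function names `translate`, `synonymous_codons` and `codon_usage` are made up for this illustration and do not come from any particular library.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate coding-strand DNA codon by codon ('*' marks a stop codon)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def synonymous_codons():
    """Group the 64 codons by the amino acid (or stop) they encode."""
    groups = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        groups[aa].append(codon)
    return dict(groups)


def codon_usage(dna):
    """Count how often each codon occurs in an in-frame coding sequence."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return Counter(dna[i:i + 3] for i in range(0, usable, 3))


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))             # MA*
    print(len(synonymous_codons()["L"]))      # 6 codons encode leucine
    print(codon_usage("CTGCTGTTA")["CTG"])    # 2
```

Comparing `codon_usage` counts across the synonymous codons of one amino acid shows which codon a given sequence prefers.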

Mendelian Genetics - Patterns of Inheritance and Single-Gene Disorders

From scitable http://www.nature.com/scitable/topicpage/mendelian-genetics-patterns-of-inheritance-and-single-966

Gene Models

To represent genes in a universal way we need models. But different databases use different gene models.

Gene model examples:


A role for pseudogenes: http://www.nature.com/nature/journal/v465/n7301/edsumm/e100624-02.html

"The number of pseudogenes present in the genomes of multicellular organisms is much higher than that present in the genomes of unicellular organisms. The number of human pseudogenes (~18,000) is close to that of the protein-coding genes (20,000 to 30,000). The noncoding genome, to which pseudogenes belong, may serve as a repository of at least a portion of the information underlying highly complex systems."

Pseudogene sequences are conserved, suggesting "pseudogene identity is preserved by selective pressure." According to the paper, "The existence of conserved processed pseudogenes that are transcribed irrespective of their position in the genome suggests that they are maintained to exert a specific role."

"Consistent with the notion that they exert biological functions, the expression of pseudogenes is a regulated process." Indeed, "various pseudogenes show a spatiotemporal expression pattern distinct from that of their coding counterparts," suggesting they are playing functional roles.

Source: "Pseudogenes: Newly Discovered Players in Human Cancer," Science Signaling, 5 (242) (September 18, 2012)


Ribonucleic acid (RNA) is the second part of the Central Dogma. Like DNA, it is built from a chain of nucleotides, but it differs chemically in its sugar (ribose instead of deoxyribose) and in one of its bases (uracil instead of thymine). RNA is mostly single stranded in eukaryotes but can adopt 3D structures; some viruses use RNA instead of DNA as the carrier of their genetic material.

List of RNAs: http://en.wikipedia.org/wiki/List_of_RNAs

Classical RNAs

  • rRNA - ribosomal RNA, builds ribosomes
  • mRNA - messenger RNA, transfers the blueprint for a protein from DNA to the ribosomes in the cytoplasm
  • tRNA - transfer RNA, transfers a specific amino acid to the ribosome to add to the growing polypeptide chain

More recently discovered classes of RNA

Technological advances in molecular biology, sequencing and bioinformatics are fuelling the discovery of more and more new classes of RNA, especially non-coding RNAs (ncRNAs). These classes eluded scientists for years, in part because of the idea that only coding RNAs were important. We now understand that ncRNAs are involved in many different regulatory processes.

ncRNAs have been shown to interact with DNA, RNA and protein molecules. They are involved with nuclear organisation and transcriptional, post-transcriptional and epigenetic processes. Below are some classes of ncRNAs that have been discovered.


MicroRNAs (miRNAs) are the most studied class of noncoding RNAs and range in size from 20 to 24 nt. miRNA genes are transcribed to form primary miRNAs (pri-miRNAs). Pri-miRNAs are processed by Drosha into precursor miRNAs (pre-miRNAs). A protein called exportin 5 exports the pre-miRNA from the nucleus into the cytoplasm, where it is cleaved by an endonuclease called Dicer. After unwinding, the resulting single-stranded miRNA guides the RNA-induced silencing complex (RISC) to a target mRNA for gene silencing.

See also this short video on Dicer and Drosha: http://www.scivee.tv/node/9346


Small interfering RNAs (siRNAs) are in the same size range as miRNAs, 20-24 nt. Unlike mammalian miRNAs, they tend to have perfect complementarity to their mRNA target. They are produced by Dicer processing of long double-stranded RNA.
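As a rough illustration of what perfect complementarity means, the sketch below searches an mRNA for sites that base-pair with a guide strand at every position. It considers Watson-Crick pairs only (no G-U wobble), and the names `reverse_complement` and `find_perfect_sites` as well as the example sequences are invented for this illustration.

```python
# Watson-Crick pairing for RNA; G-U wobble pairs are deliberately ignored.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def find_perfect_sites(guide, mrna):
    """Return 0-based start positions in mrna that pair perfectly with guide."""
    target = reverse_complement(guide)  # what the mRNA must read, 5'->3'
    mrna = mrna.upper()
    positions = []
    start = mrna.find(target)
    while start != -1:
        positions.append(start)
        start = mrna.find(target, start + 1)  # allow overlapping sites
    return positions


if __name__ == "__main__":
    mrna = "GGAUGCUAGCUAGGAUCCGAUCGAUUACGG"   # hypothetical transcript
    guide = reverse_complement(mrna[4:25])    # 21-nt guide built to match
    print(find_perfect_sites(guide, mrna))    # [4]
```

A single mismatch in the guide makes the site disappear from the result, which is the behaviour expected for a strict "perfect complementarity" test.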

See also: http://www.nobelprize.org/nobel_prizes/medicine/laureates/2006/advanced.html


Piwi-interacting RNAs (piRNAs) are around 26-31 nt long. They are specifically expressed in the germ line, but their biogenesis is unclear. piRNAs form RNA-protein complexes through interactions with piwi proteins. They are known to silence the activity of retrotransposons.
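The size ranges quoted in these notes can be collected into a small lookup. This is only a sketch: read length alone cannot separate miRNAs from siRNAs (their ranges are identical), real classification needs more evidence than length, and the name `candidate_classes` is hypothetical.

```python
# Length ranges in nt (inclusive), taken from the descriptions in these notes.
SIZE_RANGES = {
    "miRNA": (20, 24),
    "siRNA": (20, 24),
    "piRNA": (26, 31),
}


def candidate_classes(length):
    """Return the small RNA classes whose quoted size range includes length."""
    return sorted(
        name for name, (low, high) in SIZE_RANGES.items() if low <= length <= high
    )


if __name__ == "__main__":
    for read_length in (22, 25, 28):
        print(read_length, candidate_classes(read_length))
```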


Small nuclear RNAs (snRNAs) are a class of small RNA molecules found within the nucleus of eukaryotic cells.

http://en.wikipedia.org/wiki/Small_nuclear_RNA


http://en.wikipedia.org/wiki/Small_nucleolar_RNA


DNA damage RNAs (DDRNAs) were first identified in the article "Site-specific DICER and DROSHA RNA products control the DNA-damage response" (http://www.nature.com/nature/journal/vaop/ncurrent/full/nature11179.html). They have been shown to play a crucial role in mediating the DNA damage response: in the absence of DDRNAs, DNA repair is greatly impaired. DDRNAs are produced near the site of damage and, although their exact size is unknown, their lengths are enriched at 22-23 nt.

See also these awesome animations on DNA repair: http://web.mit.edu/engelward-lab/animations.htm and learn more about RNase H at http://en.wikipedia.org/wiki/RNase_H

Further reading

  • Kapranov P. et al.: "Genome-wide transcription and the implications for genomic organization", Nat Rev Genet, 2007 vol. 8 (6) pp. 413-23. PMID: 17486121
  • Willingham and Gingeras: "TUF love for "junk" DNA", Cell, 2006 vol. 125 (7) pp. 1215-20. PMID: 16814704


A genome-wide association study (GWAS) is an examination of many common genetic variants in different individuals to see if any variant is associated with a trait. GWAS typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major diseases.

When studying the population genetics of a single species, the recombination rate determines how likely it is that proximal sequence variants share the same coalescent tree. Lack of recombination leads to linkage disequilibrium, in which nearby segregating variants are correlated. This phenomenon is exploited in correlating specific segregating variants with phenotypic traits or diseases—for example, in genome-wide association studies conducted with microarrays or incomplete sequencing data. However, this same phenomenon limits the resolution of these approaches in finding the actual causal variant. Genome-wide association studies are also blind to the patterns of allele segregation in close relatives. Future genotype-phenotype studies using complete genomes will increasingly use genotypic context in related as well as unrelated cases and controls, combined with better prediction of the possible effects of genome variants, to identify causal variants.
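Linkage disequilibrium between two biallelic variants is commonly summarised by D and r². The sketch below uses the usual definitions D = p_AB - p_A * p_B and r² = D² / (p_A (1 - p_A) p_B (1 - p_B)). The input format (phased haplotypes given as pairs of 0/1 alleles) and the function name `linkage_disequilibrium` are assumptions made for this example.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for two biallelic sites.

    haplotypes: list of (a, b) pairs, where a and b are 0 or 1 and give the
    allele carried at site A and site B on one phased chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n     # frequency of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n     # frequency of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                        # deviation from independence
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d, d * d / denominator


if __name__ == "__main__":
    # Alleles always travel together: complete LD, so r^2 is 1.
    linked = [(1, 1)] * 5 + [(0, 0)] * 5
    print(linkage_disequilibrium(linked))
    # All four haplotypes equally common: no LD, so D and r^2 are 0.
    unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
    print(linkage_disequilibrium(unlinked))
```

A high r² between a genotyped marker and an untyped causal variant is what lets an association study detect the signal, and it is also why the study cannot tell the two variants apart.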





The genome contains the complete set of genes, which have been described as a blueprint for building a functional organism. However, genes are transcribed into RNA, and their regulation and expression have profound effects on how an organism is built. The transcriptome is the complete set of RNA transcripts, which may be an entirely different population depending on biological conditions. This makes the transcriptome extremely dynamic and more reflective of a biological condition than the genome, which is fixed. The study of the transcriptome is called transcriptomics and provides a global view of gene expression patterns.


Promoters contain specific DNA sequences to which RNA polymerase and transcription factors bind. Once RNA polymerase binds, transcription can take place; this region of DNA thus facilitates the transcription of a particular gene. There are three main promoter elements: the core, proximal and distal promoters.

The core promoter is the minimal portion of the promoter that is required to properly initiate transcription. It contains the binding sites for RNA polymerase and the general transcription factors and lies at approximately position -34 relative to the transcription start site.

The proximal promoter contains the proximal sequence upstream of the gene (~ -250) that tends to contain primary regulatory elements, which provide binding sites for specific transcription factors.

The distal promoter (or enhancer) contains the distal sequence upstream of the gene that may contain additional regulatory elements but has a weaker influence than the proximal promoter. It is usually further upstream than the proximal sequence and provides binding sites for specific transcription factors.

Core promoters can be broadly separated into two classes: sharp and broad promoters. Sharp promoters usually contain the TATA consensus sequence and have a well-defined position for RNA polymerase binding. Broad promoters are usually rich in GC nucleotides and are associated with CpG islands; transcription can begin at multiple possible sites, forming a broad distribution of start sites in contrast to the sharp distribution.
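The two sequence features mentioned above can be measured with a few lines of code. This is a toy sketch, not a promoter classifier: the TATA pattern TATA[AT]A[AT] is a commonly quoted consensus, and the CpG observed/expected ratio (number of CG dinucleotides × length / (number of C × number of G)) is a widely used CpG island statistic. Neither is taken from these notes, and the function names are made up.

```python
import re

# Commonly quoted TATA box consensus (assumption, not from these notes).
TATA_BOX = re.compile(r"TATA[AT]A[AT]")


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_observed_expected(seq):
    """Observed CG dinucleotides divided by the number expected from C and G counts."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


def describe_promoter(seq):
    """Summarise the features associated with sharp and broad promoters."""
    seq = seq.upper()
    return {
        "has_tata_box": TATA_BOX.search(seq) is not None,
        "gc_content": gc_content(seq),
        "cpg_obs_exp": cpg_observed_expected(seq),
    }


if __name__ == "__main__":
    print(describe_promoter("GGCTATAAAAGGC"))   # TATA-containing example
    print(describe_promoter("CGCGGCGCGCCGCG"))  # GC-rich, CpG-dense example
```

A TATA match hints at a sharp promoter, while high GC content together with a high CpG ratio hints at a broad, CpG-island promoter; real annotation relies on mapped transcription start sites rather than sequence alone.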

Transcription factor

Transcription factors (TFs) are proteins that bind to specific DNA sequences (just like RNA polymerase) and control the transcription of genomic DNA to RNA. A defining feature of TFs is that they contain one or more DNA-binding domains (DBDs), which attach to specific sequences of DNA adjacent to the genes that they regulate.

There are various mechanisms that TFs use to regulate transcription:

  • TFs can stabilize or block the binding of RNA polymerase to DNA, thus promoting or preventing transcription.
  • TFs can catalyse the acetylation or deacetylation of histone proteins. The acetylation of lysine residues of histone proteins weakens the association of DNA with histones, thus making DNA more accessible for transcription. Deacetylation does the reverse.
  • TFs may recruit coactivator or corepressor proteins to the transcription factor-DNA complex, which can increase or decrease transcription respectively.

DNA-binding domain

TFs contain DNA-binding domains (DBDs) that allow them to bind to a specific DNA sequence (a recognition sequence) or that have a general affinity for DNA. A DBD is an independently folded protein domain that contains at least one motif that recognizes double- or single-stranded DNA.

DBDs are often part of a larger protein containing additional domains with different functions. These additional domains often regulate the activity of the DBD. The function of DNA binding is either structural or involves transcriptional regulation, with the two roles sometimes overlapping.

Some examples of DBDs include the zinc finger, leucine zipper and winged helix.


PCR and RT

From scitable http://www.nature.com/scitable/topicpage/the-biotechnology-revolution-pcr-and-the-use-553

Gene cloning using plasmids

Gene cloning animation

Tiling array

Tiling arrays are a subtype of microarray chips where the probes are designed to cover a region of the genome, hence the term "tiling" array. They function on a similar principle to traditional microarrays where labelled target molecules are hybridised to unlabelled probes on a fixed solid surface.

While traditional DNA microarrays measure gene expression using a few probes for each known or predicted gene, tiling arrays may characterise previously unidentified genes and provide an unbiased look at gene expression similar to CAGE.

The number of features on a single array can range from 10,000 to greater than 6,000,000, with each feature containing millions of copies of one probe. Thus depending on the probe lengths and spacing, different degrees of resolution can be achieved.
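The relationship between probe length, spacing and coverage is simple arithmetic, sketched below. The function names and the parameter `step` (the distance between the start positions of consecutive probes) are assumptions made for this example.

```python
def probes_needed(region_length, probe_length, step):
    """Number of probes that fit in a region when a probe starts every `step` bases."""
    if probe_length <= 0 or step <= 0:
        raise ValueError("probe_length and step must be positive")
    if region_length < probe_length:
        return 0
    return (region_length - probe_length) // step + 1


def tiling_layout(probe_length, step):
    """Describe how consecutive probes relate to each other."""
    if step < probe_length:
        return "overlapping"   # each base is covered by more than one probe
    if step == probe_length:
        return "end-to-end"    # probes abut with no gap
    return "gapped"            # some bases are not covered by any probe


if __name__ == "__main__":
    # 25-mer probes over a 1,000 bp region at two different spacings.
    print(probes_needed(1000, 25, 25), tiling_layout(25, 25))  # 40 end-to-end
    print(probes_needed(1000, 25, 10), tiling_layout(25, 10))  # 98 overlapping
```

A smaller step gives higher resolution at the cost of more features per region, which is the trade-off behind the different array designs.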

While tiling arrays have been used for individual gene expression analyses, they have also been used for transcriptome mapping, ChIP-chip, MeDIP-chip and DNase Chip studies.


In ChIP-chip, the first "ChIP" refers to chromatin immunoprecipitation and the second "chip" to microarray technology; ChIP-chip is thus a technique that combines the two technologies. Like regular ChIP, ChIP-chip is used to deduce interactions between proteins and DNA in vivo.

A whole-genome ChIP-chip analysis can be performed to determine the locations of binding sites for proteins of interest; this allows the identification of the cistrome, the complete set of binding sites for DNA-binding proteins.

The goal of ChIP-chip is the localisation of protein binding sites in the genome, in the context of chromatin. Examples of proteins operating around chromatin include transcription factors, replication-related proteins, and histones. ChIP-chip may be used to determine transcription factor binding sites (TFBSs) in the genome; these TFBSs may be promoters, enhancers, repressors and silencing elements, insulators, boundary elements, and sequences that control DNA replication. Using ChIP-chip to study histones may allow the identification of modification sites, which may offer insights into the mechanisms of transcriptional regulation.

One powerful use of ChIP-chip is the cataloguing of all protein-DNA interactions under different conditions, which is one of the field's long-term goals. This knowledge will further our understanding of the machinery behind gene regulation, cell proliferation and disease progression. ChIP-chip is also commonly used in epigenetics studies.

Mass spec

http://www.chemguide.co.uk/analysis/masspec/howitworks.html


Epigenetics refers to the heritable modifications or mechanisms that govern transcriptional regulation that are not directly related to the DNA sequence.

Within the chromosome, DNA is packaged into chromatin; chromatin consists of DNA, structural histone proteins and non-histone proteins. Within chromatin the repeating unit is the nucleosome; a nucleosome is made up of 146 bp of DNA wrapped in roughly two super-helical turns around a core of eight histones. The histones are responsible for maintaining the chromatin's shape and structure. The accessibility of DNA, which is affected by chemical modifications to either the DNA or the histones, plays a role in regulating gene expression. The modification of histones is an example of an epigenetic modification.

Epigenetic modifications such as histone acetylation occur at the amino-terminal tail of the histone that protrudes from the nucleosome. Such epigenetic factors have a profound influence on gene expression by controlling the accessibility of DNA. Histone acetylation is controlled by the balance and activity of two classes of enzymes, histone acetyltransferases (HATs) and histone deacetylases (HDACs). Deacetylation of lysine residues of histones, by transcription factors and enzymes such as HDACs, results in tight coiling of the DNA, a closed chromatin structure and gene silencing. Inappropriate silencing abolishes gene expression and causes a number of human diseases. In some cancer cells, there is an overexpression of HDACs, an aberrant recruitment of HDACs or an underexpression of HATs, resulting in hypoacetylation of histones and therefore a condensed or closed chromatin structure.

Another form of epigenetic modification that causes gene silencing is DNA methylation, which chemically modifies the DNA molecule itself. DNA methylation is carried out by enzymes called DNA methyltransferases. Methylation can directly switch off gene expression by preventing transcription factors from binding to promoters. Methylation also attracts a number of methyl-binding domain (MBD) proteins.

On the other hand, for a gene to be transcribed it must become physically accessible to transcriptional machinery; histone acetylation by HAT causes uncoiling of DNA and results in an open chromatin structure. Genes are thus accessible to transcription factors and gene expression occurs, which consequently produces proteins.

http://www.youtube.com/watch?v=eYrQ0EhVCYA

Chromatin remodelling

Chromatin remodelling is the addition or removal of chemical groups to or from histone proteins, which ultimately alters gene expression patterns. The process of modifying chemical groups can lead to either open or closed chromatin and is of vital importance to the proper functioning of all eukaryotic cells.

In recent years, researchers have discovered a great deal about chromatin remodeling, including the roles that different protein complexes, histone variants, and biochemical modifications play in this process.

Various molecules called chromatin remodelers provide the mechanism for modifying chromatin and allowing transcription signals to reach their destinations on the DNA strand. Currently, investigators know that chromatin remodelers are large, multiprotein complexes that use the energy of ATP hydrolysis to mobilize and restructure nucleosomes. Recall that nucleosomes wrap 146 base pairs of DNA in approximately 1.7 turns around a histone-octamer disk, and the DNA inside each nucleosome is generally inaccessible to DNA-binding factors. Remodelers are thus necessary to provide access to the underlying DNA to enable transcription, chromatin assembly, DNA repair, and other processes. Just how remodelers convert the energy of ATP hydrolysis into mechanical force to mobilize the nucleosome, and how different remodeler complexes select which nucleosomes to move and restructure, remains unknown, however.

Remodelers are partitioned into five families, each with specialized biological roles. Nonetheless, all remodelers contain a subunit with a conserved ATPase domain. In addition to the conserved ATPase, each remodeler complex also possesses unique proteins that specialize it for its unique biological role. However, because all remodelers move nucleosomes and all such movement is ATP dependent, mobilization is most likely a property of the conserved ATPase subunit.

The ATPase domains of remodelers are similar in sequence and structure to known DNA-translocating proteins in viruses and bacteria. Recent evidence from the SWI/SNF and ISWI remodeler families has also revealed that remodeler ATPases are directional DNA translocases that are capable of the directional pumping of DNA. But how is this property applied to nucleosomes? It seems that the ATPase binds approximately 40 base pairs inside the nucleosome, from which location it pumps DNA around the histone-octamer surface. This enables the movement of the nucleosome along the DNA, thus permitting the exposure of the DNA to regulatory factors.

The additional domains and proteins that are attached to the ATPase are important for nucleosome selection, and they also help regulate ATPase activity. These attendant proteins bind to histones and nucleosomal DNA, and their binding to these molecules is affected by the histone modification state. The modification state helps determine whether the nucleosome is an appropriate substrate for a remodeler complex.

The composition of nucleosomes is not set in stone, however. Indeed, canonical histones can themselves be replaced by histone variants or modified by specific enzymes, thereby making the surrounding DNA more or less accessible to the transcriptional machinery.

So far, a number of histone variants have been found and localized to specific areas of chromatin. For instance, H2A.Z is a variant of H2A and is often enriched near relatively inactive gene promoters. Interestingly, H2A.Z does not take its place during replication when the chromatin structure is established. Instead, the chromatin remodeling complex SWR1 catalyzes an ATP-dependent exchange of H2A in the nucleosome for H2A.Z.

CENP-A is another known histone variant that has been found to be associated with centromeres. Originally localised to the centromere through immunofluorescence studies, CENP-A was believed to be involved in centromeric activity during cell division. But, once the CENP-A protein was isolated and sequenced, it was shown to have sequence homology to H3, suggesting that CENP-A actually replaces canonical H3 near the centromere. Some experiments suggest that these variant histones that occur in particular areas of the genome may assist in the specific regulation of chromatin behavior and gene transcription from these areas.

Specifically, histone modification involves covalent bonding of various functional groups to the free nitrogens in the R-groups of lysines in the N-terminal tail. Early research has linked differing levels of acetylation and methylation on the histones to altered rates of DNA transcription. While the most common additions are acetylation and methylation of lysine residues, many more types of modifications have also been observed, including phosphorylation, a common posttranslational modification. The different types of modifications, which have been called the "histone code," are put in place by a variety of different enzymes, many of which have yet to be fully characterized. Thus, the story of the remodeling machinery continues to be told through a variety of experiments, and much remains to be revealed.

Because eukaryotic DNA is tightly wrapped around nucleosomes and the positive charges of the histones tightly bind the negative charges of the DNA, nucleosomes essentially act as a physical barrier to transcription factors that need to bind to certain regions of DNA. However, specific acetylations can remove the positive charge on the lysine amino group that is acetylated, so the nucleosome "loses its grip" on the DNA. This modification results in a loosening of the coil.

Other remodeling enzyme complexes actually slide the nucleosomes along the DNA to clear them from the promoter regions. In this case, the remodeling enzymes use the energy from ATP to regulate nucleosome movement. For example, prior to transcription in yeast, one of the major types of chromatin remodeling machines, called the SWI/SNF and SAGA histone acetylase complex, is recruited to the yeast HO gene promoter by the SWI5 activator. Activator-dependent chromatin modification then moves the nucleosome out of the way so that RNA polymerase II can reach the promoter regions of the DNA.

Chromatin remodeling activity by SWI/SNF or other remodeling machines can also be required for recruiting additional chromatin remodeling activity to the site, as well as additional downstream sites. Modifications at a promoter can occur in multiple steps that are independently regulated, and additional modifications can occur stepwise stretching from the point of the first modification along the DNA strand in a downstream direction toward the promoter. These modifications open up an elongated region of active chromatin and allow for a wide range of intermediate, transcriptionally inactive states for the eukaryotic promoter. Promoters can also be poised with RNA polymerase bound but not elongating the mRNA; in yeast, up to 15% of sites have such stalled transcription. Changes in gene expression during the specific developmental stages of an organism or cell coincide with fluctuations in the levels of each of the specific protein complexes involved in chromatin remodeling.

Varying levels and types of histone modifications have been shown to correlate with levels of chromatin activation. For example, one group of researchers used antibody-based immunoprecipitation studies to determine that acetylation of histone H3 and methylation at lysine residue K4 appeared to coincide with each other. They also coincided during transcriptional activation in chicken embryos, while methylation at lysine residue K9 marked inactive chromatin.

Another means by which transcription is controlled is through methylation of the DNA strand itself. Not to be confused with histone methylation, methylation of the DNA strand involves cytosine bases of eukaryotic DNA being converted to 5-methylcytosine, resulting in the repression of transcription, particularly in vertebrates and plants. The altered cytosine residues are usually immediately adjacent to a guanine nucleotide, resulting in two methylated cytosine residues set diagonally to one another on opposing DNA strands.
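The "diagonal" arrangement follows from the fact that the dinucleotide CG reads CG on the opposite strand too. The sketch below lists CpG sites and, for each, the position of the cytosine on the top strand and of its partner cytosine on the bottom strand, both in top-strand coordinates. The function names are invented for this illustration.

```python
def cpg_sites(seq):
    """Return 0-based positions of the C in every CG dinucleotide (top strand)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def methylatable_cytosines(seq):
    """Pair each top-strand CpG cytosine with its bottom-strand partner.

    The top-strand C sits at position i. The G at i + 1 base-pairs with a C
    on the bottom strand, so the second cytosine lies at top-strand
    coordinate i + 1, diagonally opposite the first.
    """
    return [(i, i + 1) for i in cpg_sites(seq)]


if __name__ == "__main__":
    print(cpg_sites("ACGTCGA"))               # [1, 4]
    print(methylatable_cytosines("ACGTCGA"))  # [(1, 2), (4, 5)]
```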

Heavily methylated regions of DNA with elevated concentrations of these so-called CpG groups are often found near transcription start sites. In an interestingly coordinated process, proteins that bind to methylated DNA also form complexes with proteins involved in deacetylation of histones. Therefore, when the DNA is in a methylated state, nearby histones are deacetylated, resulting in compact, semipermanently silent chromatin. Likewise, demethylated DNA does not draw deacetylating enzymes to the histones, but it often attracts histone acetyltransferases, allowing histones to remain acetylated and promoting transcription.

Storage of eukaryotic DNA in small, compact nuclei requires that this DNA be tightly coiled and compacted in the form of chromatin. However, the structure of chromatin also appears to serve a second, possibly more important role, in that it gives eukaryotic cells the capability to exert complex levels of control over gene expression.

Chromatin and the DNA sequences it contains are constantly undergoing modifications, thereby periodically exposing different regions of DNA to transcription factors and RNA polymerases. The cumulative effects of these changes are various states of transcriptional control and the ability of eukaryotic cells to turn genes on and off as needed. This complexity provides eukaryotes with a means of making the most of a relatively small number of genes. However, much research remains to be performed before investigators precisely understand how the many mechanisms of chromatin remodeling operate, as well as how they work together to result in the complex patterns of gene expression characteristic of eukaryotic cells.

From scitable http://www.nature.com/scitable/topicpage/chromatin-remodeling-in-eukaryotes-1082

DNA methylation

From scitable http://www.nature.com/scitable/topicpage/The-Role-of-Methylation-in-Gene-Expression-1070

Polycomb-group proteins

Polycomb-group proteins are a family of proteins first discovered in fruit flies that can remodel chromatin such that epigenetic silencing of genes takes place. Polycomb-group proteins are best known for silencing Hox genes through modulation of chromatin structure during embryonic development in fruit flies (Drosophila melanogaster).

In Drosophila, the Trithorax-group (trxG) and Polycomb-group (PcG) proteins act antagonistically and interact with chromosomal elements termed Cellular Memory Modules (CMMs). Trithorax-group (trxG) proteins maintain the active state of gene expression, while the Polycomb-group (PcG) proteins counteract this activation with a repressive function that is stable over many cell generations and can only be overcome by germline differentiation processes. PcG silencing involves at least three kinds of multiprotein complexes (PRC1, PRC2 and PhoRC), which work together to carry out their repressive effect.

In humans, Polycomb Group gene expression is important in many aspects of development. Murine null mutants in PRC2 genes are embryonic lethals, while most PRC1 mutants are live-born homeotic mutants that die perinatally. In contrast, overexpression of PcG proteins correlates with the severity and invasiveness of several cancer types. The mammalian PRC1 core complexes are very similar to those of Drosophila. Polycomb is known to regulate the ink4 locus (P16, P19ARF).