Single cell

I believe very strongly that the fundamental unit, the correct level of abstraction, is the cell and not the genome. --Sydney Brenner (lecture at Columbia University 2003)

Single cell RNA-seq (scRNA-seq) has led to the identification of novel cell types and of stochastic gene expression patterns. The quote above is from a talk by Sydney Brenner given at Columbia University in 2003; I have hosted the transcript of the talk at

In standard genetic sequencing, DNA or RNA is extracted from a blend of many cells to produce an average read-out for the entire population. Regev compares this approach to a fruit smoothie. The colour and taste hint at what is in it, but a single blueberry, or even a dozen, can be easily masked by a carton of strawberries. By contrast, “single-cell-resolved data is like a fruit salad”, Regev says. “You can distinguish your blueberries from your blackberries from your raspberries from your pineapples and so on.” That promised to expose a range of overlooked cellular variation.[1]

Cells can also differ in their total amount of mRNA, although it was commonly assumed that the absolute amount of total mRNA per cell is similar across cell types and experimental perturbations. If similar amounts of RNA from different cells are assayed, this difference in the absolute amount of total mRNA cannot be observed[2]; it is therefore more appropriate to sequence all the mRNA in individual cells and compare these profiles.
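
The point about total mRNA can be illustrated with a small Python sketch. The two cells and their transcript counts are made up, and scaling to counts per million is only one assumed normalisation choice. The second cell holds exactly twice as many transcripts of every gene, yet after scaling to a fixed total the two profiles are indistinguishable.

```python
def counts_per_million(counts):
    """Scale a dict of gene -> count so that the values sum to one million."""
    total = sum(counts.values())
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}

# Hypothetical cells: cell_b holds twice as many transcripts of every gene.
cell_a = {"geneA": 100, "geneB": 300, "geneC": 600}
cell_b = {gene: 2 * count for gene, count in cell_a.items()}

cpm_a = counts_per_million(cell_a)
cpm_b = counts_per_million(cell_b)

# Absolute totals differ two-fold ...
print(sum(cell_a.values()), sum(cell_b.values()))
# ... but the normalised profiles are identical.
print(cpm_a == cpm_b)
```

Any comparison made on the normalised values alone would report no difference between these two cells.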

Why single cell?

Novel applications and analytical tools are now putting emphasis on inferring the functional roles of cells in tissues and developmental events, as well as the genetic programs that drive them.[3]

Single-cell analysis makes it possible to study new biological questions in which cell-specific changes in the transcriptome are important, e.g. cell type identification, heterogeneity of cell responses, stochasticity of gene expression, and inference of gene regulatory networks across cells.

  • Single-cell genomics will help uncover cell lineage relationships
  • Single-cell transcriptomics will help discover new cell types
  • Single-cell epigenomics and proteomics will allow the functional states of individual cells to be analysed

See "Single-cell sequencing-based technologies will revolutionize whole-organism science"

"Quantify intra-population heterogeneity and enable study of cell states and transitions, potentially revealing cell subtypes or gene expression dynamics that are masked in bulk, population-averaged measurements."

As cells move between states, various genes are silenced and activated. These transient states are often hard to characterise because purifying cells in between more stable endpoint states can be difficult or impossible. However, single cell RNA-seq can capture these states without the need for purification.

It has been shown that the average expression level of a population of cells can be strongly biased by a few cells with high expression, and is thus not reflective of a typical individual cell from that population.
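
A minimal Python sketch of this effect, using invented expression values for one gene in ten cells (eight low-expressing cells and two very high ones). The mean stands in for a population-averaged measurement and the median for a typical cell; both are illustrative choices rather than a prescribed analysis.

```python
from statistics import mean, median

# Hypothetical expression values for one gene in ten cells:
# eight cells are low, two cells are very high.
expression = [2, 3, 1, 2, 4, 3, 2, 1, 500, 480]

bulk_like_average = mean(expression)  # what a population-level assay reflects
typical_cell = median(expression)     # closer to what most cells look like

# Number of cells whose expression lies below the population average.
cells_below_average = sum(1 for value in expression if value < bulk_like_average)

print(bulk_like_average, typical_cell, cells_below_average)
```

Here eight of the ten cells sit far below the average, so the averaged value describes almost none of the individual cells.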

Measurements using FISH indicate that levels of specific transcripts can vary as much as 1,000-fold between presumably equivalent cells, further illustrating the value of profiling whole transcriptomes at the single-cell level.


Common applications of single-cell RNA sequencing:

  • De-convoluting heterogeneous cell populations
  • Trajectory analysis of cell state transitions
  • Dissecting transcription mechanics
  • Network inference, e.g. building gene regulatory networks
  • Mapping cell developmental trajectories

Studies indicate how strongly cells can show their individuality. Brain cells may express as few as 65% of the same genes as their neighbours, according to an unpublished analysis by Eberwine. In the immune system, cells placed in the same category on the basis of surface markers can express different sets of genes, and have different responses to vaccines. And as tumour cells evolve, their genomes quickly become twisted in unique ways.[4]

In experiments described in 2015, Walsh’s team sequenced the complete genomes of 36 cortical neurons from 3 healthy people who had died and donated their brains to research. Reconstructing the relationship between the brain cells in an individual revealed that closely related cells can be spread across the cortex, whereas local areas can contain multiple distinct lineages. Successive generations of cells seem to venture far from their ancestral homes. One cortical neuron, for instance, was more closely related to a heart cell from the same person than to three-quarters of the surrounding neurons. “We were not expecting to find that,” Walsh says.[5]


The methods can be categorised in different ways, but the two most important aspects are quantification and capture. The strategy used for capture determines the throughput, how the cells can be selected, and what kind of additional information besides the sequencing can be obtained. The three most widely used options are microwell-, microfluidic- and droplet-based.

  1. For well-based platforms, cells are isolated using, for example, a pipette or laser capture and placed in microfluidic wells.
  2. Microfluidic platforms, such as Fluidigm’s C1, provide a more integrated system for capturing cells and for carrying out the reactions necessary for the library preparations.
  3. The idea behind droplet based methods is to encapsulate each individual cell inside a nanoliter droplet together with a bead. The bead is loaded with the enzymes required to construct the library.

Two recent studies from the Enard group[6] and the Teichmann group[7] have compared several different protocols.

General experimental workflow

  1. Dissociate tissues to form a single-cell suspension
  2. Single cells are isolated (using cell surface markers or microdissection)
  3. Cells are lysed
  4. RNA is captured for reverse transcription into cDNA
  5. cDNA is pre-amplified
  6. Library preparation
  7. Sequencing
  8. Downstream analysis

Advancements in microwell and droplet-based cell-barcoding strategies have enabled the analysis of tens of thousands of cells in a single experiment.

General issues

It is estimated that approximately 10% of a cell's transcript complement is represented in the final sequenced library (Islam et al. 2014); single cell RNA-seq is therefore unable to reliably detect low-abundance transcripts.

  1. Amplification bias in single cell RNA-seq: it is possible to introduce spike-ins, but these use up sequencing real estate; unique molecular identifiers (UMIs) can be used instead
  2. The single-molecule capture efficiency of RNA is low
  3. The efficiency and uniformity of the RNA -> cDNA conversion vary
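
The use of UMIs (unique molecular identifiers) against amplification bias, mentioned in the list above, can be sketched in Python as follows. The reads, barcodes and gene names are invented, and the sketch assumes UMIs are read without errors; real pipelines typically also merge UMIs that differ by sequencing errors, which is not shown here.

```python
from collections import defaultdict

# Hypothetical aligned reads: (cell barcode, gene, UMI sequence).
reads = [
    ("cell1", "geneA", "AACG"),
    ("cell1", "geneA", "AACG"),  # amplification duplicate of the read above
    ("cell1", "geneA", "AACG"),  # another duplicate
    ("cell1", "geneA", "TTGC"),  # a second original molecule
    ("cell1", "geneB", "AACG"),
    ("cell2", "geneA", "GGAT"),
    ("cell2", "geneA", "GGAT"),  # duplicate
]

def count_reads(reads):
    """Count every read per (cell, gene); inflated by amplification."""
    counts = defaultdict(int)
    for cell, gene, _umi in reads:
        counts[(cell, gene)] += 1
    return dict(counts)

def count_umis(reads):
    """Count distinct UMIs per (cell, gene); one per original molecule."""
    umis = defaultdict(set)
    for cell, gene, umi in reads:
        umis[(cell, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}

read_counts = count_reads(reads)
umi_counts = count_umis(reads)
print(read_counts[("cell1", "geneA")], umi_counts[("cell1", "geneA")])
```

Counting distinct UMIs rather than reads means that a molecule amplified many times still contributes only once to the expression estimate.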

These issues can lead to zero-inflated expression data, i.e. many genes have an observed expression of zero in a given cell.
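
A rough Python sketch of why low capture efficiency produces zeros. It assumes each transcript molecule is captured independently with the same probability, which is a simplification, and uses the approximately 10% figure quoted above as the default. The function name is hypothetical.

```python
def dropout_probability(n_transcripts, capture_efficiency=0.1):
    """Probability that none of n_transcripts molecules is captured,
    assuming each molecule is captured independently."""
    return (1 - capture_efficiency) ** n_transcripts

# Chance of observing zero for genes present at different copy numbers.
for n in (1, 5, 10, 50):
    print(n, round(dropout_probability(n), 3))
```

Under these assumptions a gene present at five copies in a cell would be missed in more than half of the cells, whereas a gene present at fifty copies would almost always be detected.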

General analysis workflow

  1. The first step in any single-cell RNA-seq pipeline is removing poor-quality data, such as libraries made from dead cells or empty wells; it is also crucial to remove doublets, which are libraries that were accidentally made from two or more cells
  2. Examine the distribution of mRNA totals across the cells and remove cells with either very low mRNA recovery or far more mRNA than the typical cell (doublets or triplets)
  3. Classifying and finding novel cell types
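
The second step above can be sketched in Python as a simple filter on per-cell mRNA totals. The totals are invented, and the cut-offs (below a quarter of the median, or above two and a half times the median) are arbitrary assumptions for illustration; suitable thresholds depend on the data set and protocol.

```python
from statistics import median

# Hypothetical total mRNA (e.g. UMI) counts per cell barcode.
totals = {
    "cell1": 5200, "cell2": 4800, "cell3": 5100, "cell4": 300,
    "cell5": 4950, "cell6": 15500, "cell7": 5300,
}

def classify_cells(totals, low_fold=0.25, high_fold=2.5):
    """Label each cell relative to the median total of all cells."""
    typical = median(totals.values())
    labels = {}
    for cell, total in totals.items():
        if total < low_fold * typical:
            labels[cell] = "low recovery"
        elif total > high_fold * typical:
            labels[cell] = "possible multiplet"
        else:
            labels[cell] = "keep"
    return labels

labels = classify_cells(totals)
kept = [cell for cell, label in labels.items() if label == "keep"]
print(labels)
print(kept)
```

A total-count filter like this only flags candidates; a high total can also reflect a genuinely large or transcriptionally active cell, so the flagged cells are worth inspecting before removal.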

Chromium Single Cell 3' Solution

The Chromium system uses microfluidic partitioning to capture single cells and prepare barcoded libraries. Each Gel Bead in Emulsion (GEM) contains a single Gel Bead with barcoded oligonucleotides, together with RT reagents, and provides a separate reaction centre for a single cell.


Genetic screens

Analysis tools

  • Seurat is an R package designed for QC, analysis, and exploration of single cell RNA-seq data. It easily enables widely-used analytical techniques, including the identification of highly variable genes, dimensionality reduction (PCA, ICA, t-SNE), standard unsupervised clustering algorithms (density clustering, hierarchical clustering, k-means), and the discovery of differentially expressed genes and markers.
    • Rahul Satija of the New York Genome Center in New York City, who developed one such tool, Seurat, as a postdoc with Regev, says that the software uses these data to position cells as points in 3D space. “That’s why we named the package Seurat,” he explains, “because the dots reminded us of points on a pointillist painting.”[8]




SingleCellExperiment (SCE) is an S4 class for storing data from single-cell experiments. This includes specialised methods to store and retrieve spike-in information, dimensionality reduction coordinates and size factors for each cell, along with the usual metadata for genes and libraries. To create an SCE object, use the constructor SingleCellExperiment().


See ExperimentHub.


  • A single-cell RNA-seq study of differentiating human skeletal muscle myoblasts
  • Skeletal myoblasts undergo a well-characterised sequence of morphological and transcriptional changes during differentiation
  • Myocytes are long, tubular cells that develop from myoblasts to form muscles in a process known as myogenesis
  • In this experiment, primary human skeletal muscle myoblasts (HSMM) were expanded under high mitogen conditions (GM) and then differentiated by switching to low-mitogen media (DM)
  • RNA-seq libraries were sequenced from each of several hundred cells taken over a time-course (0, 24, 48, 72 hours) of serum-induced differentiation captured using the Fluidigm C1 microfluidic system
  • RNA from each cell was isolated and used to construct mRNA-Seq libraries, which were then sequenced to a depth of approximately 4 million reads per library, resulting in a complete gene expression profile for each cell
  • Libraries that contained fewer than 1 million reads or for which less than 80% of fragments mapped to non-mitochondrial protein coding genes were excluded
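library(HSMMSingleCell)
data(package = 'HSMMSingleCell')  # list the data sets in the package

# load the expression matrix; my_mat is easier to type
data(HSMM_expr_matrix)
my_mat <- HSMM_expr_matrix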
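
The library exclusion rule described in the list above (fewer than 1 million reads, or less than 80% of fragments mapped to non-mitochondrial protein coding genes) can be written as a short Python sketch. The library names and statistics are invented for illustration and are not taken from the HSMM data set.

```python
MIN_READS = 1_000_000
MIN_MAPPED_FRACTION = 0.80

# Hypothetical per-library statistics: (total reads, fraction of fragments
# mapped to non-mitochondrial protein-coding genes).
libraries = {
    "lib_A": (4_100_000, 0.91),
    "lib_B": (650_000, 0.93),    # too few reads
    "lib_C": (3_800_000, 0.62),  # too few fragments mapped to target genes
    "lib_D": (1_000_000, 0.80),  # exactly at both thresholds, so retained
}

def passes_filter(total_reads, mapped_fraction):
    """Keep a library only if it meets both thresholds."""
    return total_reads >= MIN_READS and mapped_fraction >= MIN_MAPPED_FRACTION

retained = [name for name, stats in libraries.items() if passes_filter(*stats)]
print(retained)
```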



Single cell and cancer

Single-cell sequencing (SCS) is a powerful new tool for investigating evolution and diversity in cancer and understanding the role of rare cells in tumor progression.[9] Cancer is where new lineage-tracing methods are likely to make waves first. “Cancer is a disease of lineage — it’s a disease of stem cells,” says Walsh. One question that researchers are starting to tackle is the origin of metastatic cells, which emerge from the primary tumour and invade sometimes distant organs. They tend to be the hardest tumour cells to vanquish and the ones most likely to kill patients.[10] Using single-cell genomics to sequence a tumour, biologists could determine which genes were being expressed by malignant cells, which by non-malignant cells and which by blood vessels or immune cells — potentially pointing to better ways to attack the cancer.[1]

  • Understanding lineage relationships provides a fundamental understanding of normal development and can provide insight into pathologies of development and cancer

Single cell in plants

Development is a process regulated by differential gene expression whereby cells acquire specific fates. A global map of gene expression within an organ can identify genes with coordinated expression in localised domains, thereby relating gene activity to cell fate and tissue specialisation.[11]

Single-cell transcriptomics has the potential to provide a new perspective on plant problems, such as the nature of the stem cells or initials, the plasticity of plant cells, and the extent of localised cellular responses to environmental inputs.[12]

The Human Cell Atlas

Papers / Reviews

Preprints / etc.



  1. How to build a human cell atlas
  2. Revisiting Global Gene Expression Analysis
  3. How single cells do it
  4. Single-cell analysis: The deepest differences
  5. The trickiest family tree in biology
  6. Comparative Analysis of Single-Cell RNA Sequencing Methods
  7. Power analysis of single-cell RNA-sequencing experiments
  8. Single-cell sequencing made simple
  9. The first five years of single-cell cancer genomics and beyond
  10. The trickiest family tree in biology
  11. A gene expression map of the Arabidopsis root
  12. The potential of single-cell profiling in plants