Workflow Overview



Objectives

  • List some applications of scRNA-Seq.
  • Compare and contrast the capabilities and challenges in scRNA-Seq vs bulk RNA-Seq approaches.
  • Introduce a specific experimental model to guide discussion and learning.
  • Outline the abstract approach to single-cell sequencing and consider the 10x Genomics platform in more detail.
  • Consider common challenges in executing these initial steps.



Overview of bulk RNA-seq and scRNA-Seq

Next Generation Sequencing (NGS) enables many powerful experimental designs and analysis approaches including variant identification, chromatin accessibility, gene expression, and more. Before we dive into the complexities of single-cell RNA-Seq analysis, it’s helpful to review the new perspectives afforded by a single-cell approach and to consider scRNA-Seq alongside a traditional bulk RNA-Seq approach.

Bulk RNA-Seq approach
Bulk RNA-Seq typically involves comparing the expression levels of genes between sets of tissues, e.g. untreated and treated mice. This enables researchers to characterize distinct expression patterns for a specific gene and also expression changes across functionally related genes or pathways. This is valuable because it provides an overall snapshot of the average expression program across the sample. However, treating the sample as a single homogeneous population of cells can obscure subtle changes or patterns in expression.


Single-cell RNA-Seq approach

Instead of looking at the whole forest (that is, the average gene expression in a tissue or biofluid), single-cell sequencing illuminates expression for a collection of individual cells (i.e. you can now consider individual trees). This enables similar perspectives to bulk (e.g. insight into biological mechanisms and distinct responses to interventions) but also new views informed by cellular heterogeneity:

  1. What kinds of cells are present in this sample?
  2. How does the cell population structure change between groups/conditions?
  3. What are the expression patterns between cell types or between groups/conditions?
  4. How do cells change over time and how might we affect that development?
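
The forest-versus-trees contrast can be illustrated with a toy calculation. The sketch below uses made-up expression values for a single gene in two hypothetical samples; the sample names and helper functions are assumptions for illustration only. Both samples have the same average (the bulk view), but only the per-cell values reveal that one sample contains two distinct groups of cells.

```python
from statistics import mean

# Made-up expression values for one gene, one value per cell (illustrative only).
sample_uniform = [5.0, 5.0, 5.0, 5.0]   # every cell expresses the gene moderately
sample_mixed = [0.0, 0.0, 10.0, 10.0]   # half the cells are off, half are high


def bulk_view(cells):
    """Bulk-style summary: a single average across all cells in the sample."""
    return mean(cells)


def single_cell_view(cells):
    """Single-cell-style summary: the distinct expression levels observed."""
    return sorted(set(cells))


for name, cells in [("uniform", sample_uniform), ("mixed", sample_mixed)]:
    print(name, "bulk average:", bulk_view(cells),
          "per-cell levels:", single_cell_view(cells))
```

In this toy case the bulk averages are identical, so a bulk comparison of the two samples would show no difference even though their cell populations are structured very differently.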



Consider a specific scRNA-Seq experiment

These concepts can be abstract. Acknowledging that the basic concepts are broadly applicable, it’s helpful to ground the conversation in a specific scRNA-Seq experiment.

Consider a specific experiment

In this workshop we will be focusing on an experiment conducted at UM on a mouse model where a soft tissue injury is followed by an aberrant injury response that generates bone tissue. For more details on this experiment, its original analysis, and the biology of heterotopic ossification, see the full paper [1].

A simplified version of the experiment:
  1. A sample of mouse tissue is extracted from a healthy mouse.
  2. The researchers induce a burn at the sample site.
  3. The researchers re-sample tissue from the site at several time points.
  4. Each sample undergoes scRNA-Seq prep and analysis. (This is replicated across four mice.)

The scRNA-Seq analysis of these samples can reveal the population of cell types present and also the gene expression patterns of each cell type over time.



How scRNA-Seq works

It’s also useful to orient on how scRNA-Seq works at an abstract level. There are many different platforms and protocols, but many have steps similar to those below.

Single cell protocol (from 30k feet)
  1. Sample tissue (or biofluid) is collected.
  2. Tissue is dissociated into a suspension of healthy, intact cells.
  3. Cells are physically isolated.
  4. Cell transcripts are converted to cDNA labeled with their cell of origin.
  5. cDNAs from all cells are pooled.
  6. cDNAs undergo library prep and are sequenced.
  7. The resulting transcript sequences can be partitioned into (putative) cells computationally.



10x Genomics 3’ gene expression

It’s useful to elaborate on how transcripts are labeled with their cell of origin because this will help us understand how downstream QC and analysis work. The specifics of the steps depend on the platform and the specific library prep protocol. We will focus on the 10x Genomics 3’ gene expression approach [2].

A 10x Genomics single cell protocol (from 10k feet)
  1. 10x Genomics uses microfluidics to combine an isolated cell with a manufactured oligo-bead in an aqueous droplet in an oil emulsion. The oil isolates each droplet, effectively creating a reaction vessel for each cell-bead dyad.
  2. Ideally, each droplet contains reaction enzymes (carried in the aqueous solution), a single bead and a single healthy, intact cell. The cell is lysed to release the mRNA transcripts into the droplet.
  3. The beads are covered with a lawn of millions of oligos. Each oligo is designed to interact with poly-A tailed mRNAs and the enzymes to produce a complementary DNA molecule (cDNA).
  4. Each cDNA contains:
  • the sequence of an individual mRNA transcript (from the cell)
  • flanking sequence added for downstream library prep
  • a 12 bp Unique Molecular Identifier (UMI): the UMIs are unique for each of the oligos on the bead, ensuring each UMI represents a single mRNA. (This enables reliable de-duplication following sequencing.)
  • a 16 bp barcode sequence: all barcodes are identical for a given bead, so the barcode sequence acts as a molecular label for each transcript; each barcode represents a distinct cell of origin for that mRNA.
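
To make the label structure concrete, here is a minimal sketch that splits the label portion of a sequenced molecule into its barcode and UMI. It uses the lengths given above (16 bp barcode, 12 bp UMI) and assumes the barcode comes first and the UMI follows it; that ordering, the function name `split_label`, and the example sequence are assumptions for illustration, not output from any 10x tool.

```python
# Lengths taken from the description above; the ordering (barcode first,
# then UMI) is an assumption made for this sketch.
BARCODE_LEN = 16
UMI_LEN = 12


def split_label(label_seq):
    """Split a label sequence into (cell_barcode, umi)."""
    if len(label_seq) < BARCODE_LEN + UMI_LEN:
        raise ValueError("sequence is shorter than barcode + UMI")
    barcode = label_seq[:BARCODE_LEN]
    umi = label_seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    return barcode, umi


# Made-up sequence for illustration only: 16 bases of barcode, 12 of UMI.
example_label = "AAACCCAAGGTTCCGG" + "TTTGGGCCCAAA"
barcode, umi = split_label(example_label)
print("barcode:", barcode)
print("UMI:    ", umi)
```

Any bases beyond the first 28 are ignored here, which is a simplification chosen for this sketch.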



Consider (just) two droplets
  • Each droplet converts mRNAs into cDNAs that contain the oligo sequence and the mRNA sequence.
  • Each mRNA sequence will get a distinct UMI, so one UMI = one mRNA.
  • For a single droplet, the cell barcodes will all match. Cell barcodes will be distinct across droplets.
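
The "one UMI = one mRNA" idea is what makes de-duplication possible: reads that share a cell barcode, a UMI, and a gene are treated as copies of the same original molecule. The sketch below shows that counting logic on made-up records; the placeholder names (`CELL_A`, `UMI_1`, `Gene1`) and the function `count_molecules` are hypothetical, and real pipelines also handle sequencing errors in barcodes and UMIs, which this sketch does not.

```python
from collections import defaultdict

# Hypothetical reads, already assigned to a gene: (cell_barcode, umi, gene).
reads = [
    ("CELL_A", "UMI_1", "Gene1"),
    ("CELL_A", "UMI_1", "Gene1"),  # duplicate of the read above (same molecule)
    ("CELL_A", "UMI_2", "Gene1"),  # different UMI: a second Gene1 molecule
    ("CELL_B", "UMI_1", "Gene1"),  # same UMI but another droplet: a distinct molecule
    ("CELL_B", "UMI_3", "Gene2"),
]


def count_molecules(reads):
    """Count unique (barcode, UMI, gene) combinations per (barcode, gene)."""
    unique_molecules = set(reads)  # collapses duplicate reads
    counts = defaultdict(int)
    for barcode, _umi, gene in unique_molecules:
        counts[(barcode, gene)] += 1
    return dict(counts)


molecule_counts = count_molecules(reads)
for key in sorted(molecule_counts):
    print(key, molecule_counts[key])
```

Five reads collapse to four molecules here: the two identical `CELL_A` reads count once, while the same UMI seen under a different barcode still counts separately.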



From droplets to matrix
  1. Once mRNA transcripts have been converted to barcode-labeled cDNA, the oil emulsion can be broken and the cDNA molecules are pooled together.
  2. cDNA molecules undergo several conventional library prep steps to enable sequencing.
  3. The sequencer calls bases for each sequence. For a large sequencing run, this might contain many samples, several experiments, and even multiple experiment types.
  4. The resulting 10x FASTQ files have a specific structure.
  • Read1 represents the barcode and UMI.
  • Read2 represents the mRNA sequence.
  5. Specialized software bins each distinct barcode into a putative cell and aligns the mRNA sequence against a genome build.
  6. Alignments for features (genes) are quantified across all barcodes to create a feature barcode matrix.
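
The final step above can be sketched as a small table-building exercise. The example below takes made-up per-cell, per-gene molecule counts and arranges them into a matrix with one row per feature and one column per barcode. The input values, the placeholder names, and the function `build_matrix` are assumptions for illustration; real matrices are far larger and are usually stored in a sparse format.

```python
# Hypothetical molecule counts keyed by (cell_barcode, gene).
molecule_counts = {
    ("CELL_A", "Gene1"): 2,
    ("CELL_B", "Gene1"): 1,
    ("CELL_B", "Gene2"): 1,
}


def build_matrix(molecule_counts):
    """Return (features, barcodes, matrix) with features as rows, barcodes as columns."""
    features = sorted({gene for _barcode, gene in molecule_counts})
    barcodes = sorted({barcode for barcode, _gene in molecule_counts})
    matrix = [
        [molecule_counts.get((barcode, gene), 0) for barcode in barcodes]
        for gene in features
    ]
    return features, barcodes, matrix


features, barcodes, matrix = build_matrix(molecule_counts)
print("columns:", barcodes)
for gene, row in zip(features, matrix):
    print(gene, row)
```

Combinations that were never observed (here, `Gene2` in `CELL_A`) are filled with zero, which is why these matrices tend to be mostly zeros.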



Common problems and challenges

Some droplets don’t work as intended
  1. Interpreting the sequencing outputs is simplest when each droplet contains a single bead and a single healthy cell. The system is optimized for this outcome and typically the majority of droplets will follow this pattern. However, in each run there are always a few complicating edge cases to consider.
  2. Sometimes a droplet contains a bead but no cell. This is actually common, but its impact is slight because in the absence of mRNA, the enzymes won’t produce cDNA. In effect, the droplet appears empty and is discarded.
  3. Sometimes two beads land in the same droplet with a cell. In theory, the cell’s mRNA would appear to come from two droplets (i.e. two cells), each with half the expected expression levels. In practice this rarely happens because the microfluidics are tuned to avoid this.
  4. Sometimes two cells join with a single bead in a droplet, creating a doublet. This can happen when some of the cells are not fully dissociated from each other. In this case, the mRNAs from two cells will receive the same label; the expression programs are merged and the overall expected expression is roughly doubled.
  5. Sometimes the stress of the protocol induces cells to start apoptosis. This confounds analysis because the expression programs reflect artifacts of the experimental platform rather than the model biology.
  6. Sometimes dissociated cells become so stressed they start to break down in the suspension. When that happens, the mRNAs from the popped cells combine in the aqueous flow to create a soup of ambient RNA. Ambient RNA captured in a droplet without a cell appears as a droplet with extremely low expression.
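
Two of the cases above leave a simple footprint in the total counts per barcode: ambient RNA shows up as extremely low totals, and doublets show up as roughly doubled totals. The sketch below flags barcodes on that basis alone. The totals are made up, and the cutoffs `LOW_FRACTION` and `HIGH_MULTIPLE` are arbitrary values chosen for illustration, not recommended settings; dedicated tools use considerably more sophisticated models than a comparison to the median.

```python
from statistics import median

# Made-up total UMI counts per barcode (illustrative only).
totals = {"BC_1": 5200, "BC_2": 4800, "BC_3": 90, "BC_4": 10100, "BC_5": 5000}

# Arbitrary cutoffs assumed for this sketch, relative to the median total.
LOW_FRACTION = 0.1
HIGH_MULTIPLE = 1.8


def flag_barcodes(totals, low_fraction=LOW_FRACTION, high_multiple=HIGH_MULTIPLE):
    """Label each barcode as 'low', 'high', or 'typical' relative to the median."""
    typical_total = median(totals.values())
    flags = {}
    for barcode, total in totals.items():
        if total < low_fraction * typical_total:
            flags[barcode] = "low"      # consistent with ambient RNA or a near-empty droplet
        elif total > high_multiple * typical_total:
            flags[barcode] = "high"     # consistent with a possible doublet
        else:
            flags[barcode] = "typical"
    return flags


flags = flag_barcodes(totals)
for barcode in sorted(flags):
    print(barcode, totals[barcode], flags[barcode])
```

A flag here is only a prompt for closer inspection: a large cell can legitimately have high counts and a small cell can have low counts, so total counts alone cannot settle whether a barcode is an artifact.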



Contrasting bulk RNA-seq with scRNA-Seq

Bulk and single-cell approaches are complementary: bulk RNA-Seq provides a “forest-level” view while scRNA-Seq shows the individual trees. Compared to bulk RNA-Seq, scRNA-Seq provides powerful new perspectives, but it isn’t without challenges or downsides.

Bulk vs. Single-Cell
  1. Single-cell is less mature than bulk.
  2. Single-cell sample prep is more complex than bulk.
  3. Single-cell typically sees only the subset of highly expressed genes.
  4. Single-cell analysis is typically more complex.
  5. Single-cell analysis costs more than bulk analysis.



Summary

  • scRNA-Seq offers a powerful and nuanced approach to studying gene expression at the cellular level. This technique can illuminate biological mechanisms of healthy tissue or disease as well as extend our understanding of cellular heterogeneity, responses to interventions, and cell state dynamics.
  • scRNA-Seq experiments are typically more complex and often more expensive than bulk RNA-Seq.

scRNA-Seq steps in summary
  1. A sample of tissue is extracted.
  2. Tissue is dissociated into a clean suspension of healthy cells.
  3. A reaction creates a cDNA molecule which combines a barcode label with mRNA sequence.
  4. cDNAs are pooled together, prepared as a library, and sequenced.
  5. Computationally connect all the mRNAs back to a distinct cell of origin.
  6. Align the mRNA sequences to create a count matrix across all the features and all the cells.
  7. Bioinformatically separate the healthy cells from the experimental artifacts.

The last three steps are complex and also computationally demanding. In 10x Genomics experiments, they are typically expedited by the tool Cell Ranger, which we cover in detail in the next lesson.



References

  1. Sorkin, Michael et al. “Regulation of heterotopic ossification by monocytes in a mouse model of aberrant wound healing.” Nature communications vol. 11,1 722. 5 Feb. 2020.
    https://pubmed.ncbi.nlm.nih.gov/32024825
  2. 10x Genomics 3’ gene expression


