Workflow Overview
Objectives
- List some applications of scRNA-Seq.
- Compare and contrast the capabilities and challenges in scRNA-Seq vs
bulk RNA-Seq approaches.
- Introduce a specific experimental model to guide discussion and
learning.
- Outline the abstract approach to single-cell sequencing and consider
the 10x Genomics platform in more detail.
- Consider common challenges in executing these initial steps.
Overview of bulk RNA-Seq and scRNA-Seq
Next Generation Sequencing (NGS) enables many powerful experimental
designs and analysis approaches including variant identification,
chromatin accessibility, gene expression, and more. Before we dive into
the complexities of single-cell RNA-Seq (scRNA-Seq) analysis, it’s
helpful to review the new perspectives that a single-cell approach
affords and to consider scRNA-Seq alongside a traditional bulk RNA-Seq
approach.
Bulk RNA-Seq approach
Bulk RNA-Seq typically involves comparing the expression levels of genes
between sets of tissues, e.g. untreated and treated mice. This enables
researchers to characterize distinct expression patterns for a specific
gene and also expression changes across functionally related genes or
pathways. This is valuable because it provides an overall snapshot of
the average expression program across the sample. However, treating
the sample as a single homogeneous population of cells can obscure
changes or patterns in expression that occur in only a subset of cells.
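To make the averaging effect concrete, here is a minimal Python sketch. The expression values are invented for illustration and do not come from any real sample. It shows two hypothetical samples with the same bulk-style average for one gene but very different per-cell patterns.

```python
from statistics import mean

# Invented per-cell expression values for a single gene (arbitrary units).
sample_a = [5, 5, 5, 5, 5, 5]      # every cell expresses the gene moderately
sample_b = [0, 0, 0, 10, 10, 10]   # half the cells are "off", half are "high"

# A bulk-style measurement only sees the average across all cells.
bulk_a = mean(sample_a)
bulk_b = mean(sample_b)


def fraction_expressing(cells):
    """Fraction of cells with non-zero expression of the gene."""
    return sum(1 for value in cells if value > 0) / len(cells)


print("bulk averages:", bulk_a, bulk_b)
print("fraction of cells expressing:",
      fraction_expressing(sample_a), fraction_expressing(sample_b))
```

The two averages are identical, so only the per-cell view reveals that the second sample contains two distinct groups of cells.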
Single-cell RNA-Seq approach
Instead of looking at the whole forest, that is to say, the average
gene expression in a tissue or a biofluid, single-cell RNA-Seq
illuminates expression for a collection of individual cells (i.e. you
can now consider individual trees). This enables perspectives similar
to bulk (e.g. insight into biological mechanisms and
distinct responses to interventions) but also new views informed by
cellular heterogeneity:
- What kinds of cells are present in this sample?
- How does the cell population structure change between
groups/conditions?
- What are the expression patterns between cell types or between
groups/conditions?
- How do cells change over time and how might we affect that
development?
Consider a specific scRNA-Seq experiment
These concepts can be abstract. Acknowledging that the basic concepts
are broadly applicable, it’s helpful to ground the conversation in a
specific scRNA-Seq experiment.
Consider a specific experiment
In this workshop we will be focusing on an experiment conducted at UM
on a mouse model where a soft tissue injury is followed by an aberrant
injury response that generates bone tissue. For more details on this
experiment, its original analysis, and the biology of heterotopic
ossification, see the full paper [1].
- A simplified version of the experiment:
  - A sample of tissue is extracted from a healthy mouse.
  - The researchers induce a burn at the sample site.
  - The researchers re-sample tissue from the site at several time
    points.
  - Each sample undergoes scRNA-Seq prep and analysis. (This is
    replicated across four mice.)
- The scRNA-Seq analysis of these samples can reveal the population of
cell types present and also the gene expression patterns of each cell
type over time.
How scRNA-Seq works
It’s also useful to orient on how scRNA-Seq works at an abstract
level. There are many different platforms and protocols, but most share
steps similar to those below.
Single cell protocol (from 30k feet)
- Sample tissue (or biofluid) is collected.
- Tissue is dissociated into a suspension of healthy,
intact cells.
- Cells are physically isolated.
- Cell transcripts are converted to cDNA labeled with their cell of
origin.
- cDNAs from all cells are pooled.
- cDNAs undergo library prep and are sequenced.
- The resulting transcript sequences can be partitioned into
(putative) cells computationally.
10x Genomics 3’ gene expression
It’s useful to elaborate how transcripts are labeled with their cell
of origin because this will help us understand how downstream QC and
analysis work. The specifics of the steps depend on the platform and
the specific library prep protocol. We will focus on the 10x
Genomics 3’ gene expression approach [2].
A 10x Genomics single cell protocol (from 10k feet)
- 10x Genomics uses microfluidics to combine an isolated cell with a
manufactured oligo-bead in an aqueous droplet in an oil emulsion. The
oil isolates each droplet, effectively creating a reaction vessel for
each cell-bead dyad.
- Ideally, each droplet contains reaction enzymes (carried in the
aqueous solution), a single bead and a single healthy, intact cell. The
cell is lysed to release the mRNA transcripts into the droplet.
- The beads are covered with a lawn of millions of oligos. Each oligo
is designed to interact with poly-A tailed mRNAs and the enzymes to
produce a complementary DNA molecule (cDNA).
- Each cDNA contains:
  - the sequence of an individual mRNA transcript (from the cell)
  - flanking sequence added for downstream library prep
  - a 12 bp Unique Molecular Identifier (UMI): the UMIs are unique for
    each of the oligos on the bead, ensuring each UMI represents a
    single mRNA. (This enables reliable de-duplication following
    sequencing.)
  - a 16 bp barcode sequence: all barcodes are identical for a given bead,
    so the barcode sequence acts as a molecular label for each transcript;
    each barcode represents a distinct cell of origin for that
    mRNA.
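The label structure described above can be sketched in a few lines of Python. This is a minimal illustration, not part of any 10x Genomics software. It assumes the 16 bp barcode comes first and the 12 bp UMI immediately follows it in the stretch of sequence that carries the label. The function name `split_label` and the example sequence are hypothetical.

```python
from typing import Tuple

BARCODE_LEN = 16  # barcode length given in the text
UMI_LEN = 12      # UMI length given in the text


def split_label(label_seq: str) -> Tuple[str, str]:
    """Split a label sequence into (cell barcode, UMI).

    Assumption: the barcode occupies the first 16 bases and the UMI
    the next 12 bases.
    """
    if len(label_seq) < BARCODE_LEN + UMI_LEN:
        raise ValueError("label sequence is shorter than barcode + UMI")
    barcode = label_seq[:BARCODE_LEN]
    umi = label_seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    return barcode, umi


# Made-up 28-base label: a 16-base barcode followed by a 12-base UMI.
example_label = "AAACCCAAGAAACACT" + "TTGCATGCAAGT"
barcode, umi = split_label(example_label)
print("barcode:", barcode)
print("UMI:    ", umi)
```

Every cDNA from the same bead would share the barcode portion, while the UMI portion would differ from molecule to molecule.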
Consider (just) two droplets
- Each droplet converts mRNAs into cDNAs that contain the oligo
sequence and the mRNA sequence.
- Each mRNA molecule will get a distinct UMI, so one UMI = one
mRNA.
- For a single droplet, the cell barcodes will all match. Cell
barcodes will be distinct across droplets.
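The two-droplet picture can be mimicked with a small Python sketch. The barcode and UMI sequences are invented. For simplicity the sketch counts molecules per barcode only; real pipelines also take the gene into account. A repeated (barcode, UMI) pair is treated as a duplicate of the same original mRNA, which is how "one UMI = one mRNA" supports de-duplication.

```python
from collections import defaultdict

# Invented (cell barcode, UMI) pairs as they might be observed after sequencing.
observed = [
    ("AAACCCAAGAAACACT", "TTGCATGCAAGT"),
    ("AAACCCAAGAAACACT", "TTGCATGCAAGT"),  # same barcode + UMI: a duplicate read
    ("AAACCCAAGAAACACT", "GGATCCTAGCTA"),
    ("TTTGGGTTCTTTGTGA", "TTGCATGCAAGT"),  # same UMI but a different droplet
    ("TTTGGGTTCTTTGTGA", "CCGTAAGCTTAG"),
    ("TTTGGGTTCTTTGTGA", "ACGTTGCAAGCT"),
]

# Collect the distinct UMIs seen for each barcode (i.e. each droplet).
umis_per_barcode = defaultdict(set)
for barcode, umi in observed:
    umis_per_barcode[barcode].add(umi)

# The number of distinct UMIs estimates the number of captured mRNA molecules.
molecule_counts = {bc: len(umis) for bc, umis in umis_per_barcode.items()}

for bc, n in molecule_counts.items():
    print(bc, "->", n, "molecules")
```

Six reads collapse to five molecules here: the duplicate read in the first droplet is counted once, while the same UMI in a different droplet still counts as a separate molecule.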
From droplets to matrix
- Once mRNA transcripts have been converted to barcode-labeled cDNA,
the oil emulsion can be broken and the cDNA molecules are pooled
together.
- cDNA molecules undergo several conventional library prep steps to
enable sequencing.
- The sequencer calls bases for each sequence. For a large sequencing
run, the output might contain many samples, several experiments, and even
multiple experiment types.
- The resulting 10x FASTQ files have a specific structure.
  - Read2 represents the mRNA sequence.
  - Read1 represents the barcode and UMI.
- Specialized software bins each distinct barcode into a putative cell
and aligns the mRNA sequence against a genome build.
- Alignments for features (genes) are quantified across all barcodes
to create a feature-barcode matrix.
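The final quantification step can be illustrated with a toy Python sketch. It assumes alignment has already assigned each read to a gene. The records, gene names, and sequences are illustrative only, and the code is a simplification rather than a description of how any particular tool works. Duplicate reads are collapsed by UMI, then molecules are counted per gene and barcode.

```python
from collections import Counter

# Illustrative aligned reads: (gene, cell barcode, UMI).
aligned_reads = [
    ("Col1a1", "AAACCCAAGAAACACT", "TTGCATGCAAGT"),
    ("Col1a1", "AAACCCAAGAAACACT", "TTGCATGCAAGT"),  # duplicate read
    ("Col1a1", "AAACCCAAGAAACACT", "GGATCCTAGCTA"),
    ("Ptprc", "TTTGGGTTCTTTGTGA", "CCGTAAGCTTAG"),
    ("Col1a1", "TTTGGGTTCTTTGTGA", "ACGTTGCAAGCT"),
]

# Collapse duplicates: each distinct (gene, barcode, UMI) is one molecule.
molecules = set(aligned_reads)

# Count molecules for every (gene, barcode) combination.
counts = Counter((gene, barcode) for gene, barcode, _umi in molecules)

# Rows are features (genes); columns are barcodes (putative cells).
features = sorted({gene for gene, _bc, _umi in molecules})
barcodes = sorted({bc for _gene, bc, _umi in molecules})
matrix = [[counts[(gene, bc)] for bc in barcodes] for gene in features]

print("columns:", barcodes)
for gene, row in zip(features, matrix):
    print(gene, row)
```

Most entries in a real matrix are zero, because any given cell yields transcripts from only a fraction of all genes.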
Common problems and challenges
Some droplets don’t work as intended
- Interpreting the sequencing outputs is simplest when each droplet
contains a single bead and a single healthy cell. The system is
optimized for this outcome and typically the majority of droplets will
follow this pattern. However, in each run there are always a few
complicating edge cases to consider.
- Sometimes a droplet contains a bead but no cell. This is actually
common but its impact is slight because in the absence of mRNA, the
enzymes won’t produce cDNA. In effect, the droplet appears empty and is
discarded.
- Sometimes two beads land in the same droplet with a cell. In theory,
the cell’s mRNA would appear to come from two droplets (i.e. two cells),
each with half the expected expression levels. In practice this rarely
happens because the microfluidics are tuned to avoid it.
- Sometimes two cells join with a single bead in a droplet, creating a
doublet. This can happen when some of the cells are not
fully dissociated from each other. In this case, the mRNAs from two
cells will receive the same label; the expression programs are merged
and the overall expression is roughly double what is expected for a
single cell.
- Sometimes the stress of the protocol induces cells to start
apoptosis. This confounds analysis because the resulting expression
programs reflect artifacts of the experimental platform rather than the
model biology.
- Sometimes dissociated cells become so stressed they start to break
down in the suspension. When that happens, the mRNAs from the popped
cells combine in the aqueous flow to create a soup of ambient
RNA. When ambient RNA is captured in a droplet without a cell, it
appears as a droplet with extremely low expression.
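Several of the edge cases above leave a trace in the total number of molecules detected per barcode. The following Python sketch is a deliberately naive illustration of that idea. The counts, the barcode names, and the cutoffs (`low_frac`, `high_frac`) are all hypothetical, and dedicated QC tools use considerably more careful approaches than fixed thresholds.

```python
from statistics import median

# Invented total UMI counts per barcode.
total_umis = {
    "BC_A": 4200,
    "BC_B": 3900,
    "BC_C": 35,     # very low: could be ambient RNA in a cell-free droplet
    "BC_D": 8600,   # roughly double: could be a doublet
    "BC_E": 4500,
}


def flag_barcode(total, typical, low_frac=0.1, high_frac=1.8):
    """Label a barcode by comparing its total to a typical value.

    The cutoffs are arbitrary illustration values, not recommendations.
    """
    if total < low_frac * typical:
        return "low"
    if total > high_frac * typical:
        return "high"
    return "typical"


# Use the median as a simple stand-in for "typical" expression.
typical = median(total_umis.values())
flags = {bc: flag_barcode(n, typical) for bc, n in total_umis.items()}

for bc, label in flags.items():
    print(bc, total_umis[bc], label)
```

A flag of this kind is only a prompt for closer inspection: a large cell can legitimately have a high total, and a small or quiescent cell a low one.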
Contrasting bulk RNA-Seq with scRNA-Seq
Bulk and single-cell approaches are fundamentally complementary:
bulk RNA-Seq provides a “forest-level” view while
scRNA-Seq shows the individual trees. Compared to bulk RNA-Seq,
scRNA-Seq provides powerful new perspectives, but it isn’t without
challenges or downsides.
Bulk vs. Single-Cell
- Single-cell is less mature than bulk.
- Single-cell sample prep is more complex than bulk.
- Single-cell typically sees only the subset of highly expressed
genes.
- Single-cell analysis is typically more complex.
- Single-cell analysis costs more than bulk analysis.
Summary
- scRNA-Seq offers a powerful and nuanced approach to studying gene
expression at the cellular level. This technique can illuminate
biological mechanisms of healthy tissue or disease as well as extend our
understanding of cellular heterogeneity, responses to interventions, and
cell state dynamics.
- scRNA-Seq experiments are typically more complex and often more
expensive than bulk RNA-Seq.
scRNA-Seq steps in summary
- A sample of tissue is extracted.
- Tissue is dissociated into a clean suspension of healthy cells.
- A reaction creates cDNA molecules, each combining a barcode label
with an mRNA sequence.
- cDNAs are pooled together, prepared as a library, and sequenced.
- The mRNAs are computationally connected back to a distinct cell of
origin.
- The mRNA sequences are aligned to create a count matrix across all the
features and all the cells.
- The healthy cells are bioinformatically separated from the experimental
artifacts.
The last three steps are complex and computationally demanding.
In 10x Genomics experiments, they are typically expedited by the tool
Cell Ranger, which we cover in detail in the next
lesson.
