Wrapping up

We hope you are now more familiar with the key concepts, data types, and tools, and with how they all connect to enable single-cell gene expression analysis from RNA-Seq data.


Example methods

Single-cell RNA sequencing using 10× Genomics

The example dataset used in the workshop was inspired by Sorkin et al., and the experimental methods below were excerpted directly from that paper:

  • Sorkin, Michael et al. “Regulation of heterotopic ossification by monocytes in a mouse model of aberrant wound healing.” Nature communications vol. 11,1 722. 5 Feb. 2020.
    https://pubmed.ncbi.nlm.nih.gov/32024825/

Tissues harvested from the extremity injury site were digested for 45 min in 0.3% Type 1 Collagenase and 0.4% Dispase II (Gibco) in Roswell Park Memorial Institute (RPMI) medium at 37 °C under constant agitation at 120 rpm. Digestions were subsequently quenched with 10% FBS RPMI and filtered through 40μm sterile strainers. Cells were then washed in PBS with 0.04% BSA, counted and resuspended at a concentration of ~1000 cells/μl. Cell viability was assessed with Trypan blue exclusion on a Countess II (Thermo Fisher Scientific) automated counter and only samples with >85% viability were processed for further sequencing.

University of Michigan Biomedical Research Core Facilities Advanced Genomics Core generated single-cell 3’ libraries on the 10× Genomics Chromium Controller following the manufacturer’s protocol for the v2 reagent kit (10× Genomics, Pleasanton, CA, USA). Cell suspensions were loaded onto a Chromium Single-Cell A chip along with reverse transcription (RT) master mix and single cell 3’ gel beads, aiming for 2000–6000 cells per channel. In this experiment, 8700 cells were encapsulated into emulsion droplets at a concentration of 700–1200 cells/μl, which targets 5000 single cells with an expected multiplet rate of 3.9%. Following generation of single-cell gel bead-in-emulsions (GEMs), reverse transcription was performed and the resulting Post GEM-RT product was cleaned up using DynaBeads MyOne Silane beads (Thermo Fisher Scientific, Waltham, MA, USA). The cDNA was amplified, SPRIselect (Beckman Coulter, Brea, CA, USA) cleaned and quantified, then enzymatically fragmented and size selected using SPRIselect beads to optimize the cDNA amplicon size prior to library construction. An additional round of double-sided SPRI bead cleanup was performed after end repair and A-tailing. Another single-sided cleanup was done after adapter ligation. Indexes were added during PCR amplification and a final double-sided SPRI cleanup was performed. Libraries were quantified by Kapa qPCR for Illumina Adapters (Roche) and size was determined by Agilent tapestation D1000 tapes. Read 1 primer sequences are added to the molecules during GEM incubation. P5, P7, sample index, and read 2 primer sequences are added during library construction via end repair, A-tailing, adaptor ligation and PCR. Libraries were generated with unique sample indices (SI) for each sample.

Libraries were sequenced on a HiSeq 4000 (Illumina, San Diego, CA, USA) using a HiSeq 4000 PE Cluster Kit (PN PE-410-1001) with HiSeq 4000 SBS Kit (100 cycles, PN FC-410-1002) reagents, loaded at 200 pM following Illumina’s denaturing and dilution recommendations. The run configuration was 26 × 8 × 98 cycles for Read 1, Index and Read 2, respectively.

Data analysis

University of Michigan Biomedical Research Core Facilities Advanced Genomics Core executed 10x Genomics Cell Ranger (v7.2.0) to perform sample de-multiplexing, barcode processing, and single cell gene counting (Alignment, Barcoding and UMI Count). The Cell Ranger filtered feature-barcode matrix was used as input to downstream analysis.

All analysis and graphics were generated in R (v4.4.1) [1]. Analysis was performed primarily using the Seurat package (v5.1.0) [2]. Cells with extreme values (which can indicate low complexity, doublets, or apoptotic cells) were excluded by filtering to include only cells where Genes/cell > 300 and % mitochondrial < 15%, resulting in approximately 600-5,600 cells per sample after filtering. Counts were then normalized using the SCTransform method with default parameters [3].
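
As a rough illustration of the filtering rule above, the sketch below applies the two thresholds to a handful of made-up cells in plain Python. The barcodes, field names, and the `passes_qc` helper are hypothetical and exist only for this example. The workshop analysis itself was done with Seurat in R, not with this code.

```python
# Hypothetical per-cell QC metrics; barcodes and field names are illustrative only.
cells = [
    {"barcode": "AAAC-1", "n_genes": 2500, "pct_mito": 3.2},
    {"barcode": "AAAG-1", "n_genes": 150, "pct_mito": 1.0},    # too few genes
    {"barcode": "AACT-1", "n_genes": 1800, "pct_mito": 22.5},  # high mitochondrial %
    {"barcode": "AAGA-1", "n_genes": 300, "pct_mito": 5.0},    # exactly 300 fails "> 300"
]

MIN_GENES = 300       # keep cells with Genes/cell > 300
MAX_PCT_MITO = 15.0   # keep cells with % mitochondrial < 15


def passes_qc(cell, min_genes=MIN_GENES, max_pct_mito=MAX_PCT_MITO):
    """Return True when a cell satisfies both thresholds (strict inequalities)."""
    return cell["n_genes"] > min_genes and cell["pct_mito"] < max_pct_mito


kept = [cell["barcode"] for cell in cells if passes_qc(cell)]
print(kept)  # ['AAAC-1']
```

Both comparisons are strict, so a cell sitting exactly on a threshold is dropped. It is worth stating that explicitly in a methods section.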

Normalized data were integrated using the RPCA method (IntegrateLayers function with “SCT” as normalization method). Principal Component Analysis (PCA) was then performed and the first significant components were used for finding nearest neighbors followed by graph-based, semi-unsupervised Louvain clustering into distinct populations (resolution = 0.4). All uniform manifold approximation and projection (UMAP) plots were generated using default settings [4]. To identify marker genes, the clusters in the integrated data were compared pairwise for differential gene expression using Wilcoxon rank-sum test for single-cell gene expression (FindAllMarkers function; default parameters) [5]. Additional marker genes and cell-type predictions were generated with scCATCH (v3.2.2) [6].
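
To give a feel for what a Wilcoxon rank-sum test does with one gene's expression values, here is a minimal sketch in plain Python. It uses a normal approximation with no tie or continuity correction, and the function names and toy values are assumptions made for illustration. It is not Seurat's implementation, and its p-values will not match Seurat's exactly.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank-sum test; returns (U statistic for x, approximate p-value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u, p_value


# Toy expression values for one gene: cells in a cluster vs. all other cells.
in_cluster = [5, 7, 8, 9, 6]
other_cells = [1, 2, 0, 3, 4]
u_stat, p_val = rank_sum_test(in_cluster, other_cells)
print(u_stat)  # 25.0, the maximum possible U for 5 vs. 5 cells
```

Because the test only looks at ranks, it makes no assumption about the shape of the expression distribution. That is one reason it is a common default for marker detection.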

To identify differentially expressed (DE) genes within the total population, Case and Control samples were compared pairwise for differential expression (FindAllMarkers function; log2FC = 1.5; test.use = “Wilcoxon”). For each cluster, the results were further limited to significantly different genes (Benjamini-Hochberg adjusted p-value <= 0.05). Intra-cluster case-vs-control pseudobulk comparisons were analyzed using DESeq2 (v1.44.0) [7].
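
The Benjamini-Hochberg step mentioned above can be sketched in a few lines of plain Python. The gene names and p-values below are invented for illustration, and `benjamini_hochberg` is a hypothetical helper rather than a function from any package used in the workshop.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented raw p-values for four hypothetical genes.
genes = ["GeneA", "GeneB", "GeneC", "GeneD"]
raw_p = [0.001, 0.02, 0.03, 0.5]

adjusted_p = benjamini_hochberg(raw_p)
significant = [g for g, q in zip(genes, adjusted_p) if q <= 0.05]
print(significant)  # ['GeneA', 'GeneB', 'GeneC']
```

In R, `p.adjust(p, method = "BH")` performs the same adjustment.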

References

  1. R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
  2. Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, III WMM, Stoeckius M, Smibert P, Satija R (2018). “Comprehensive integration of single cell data.” bioRxiv. doi: 10.1101/460147, https://www.biorxiv.org/content/10.1101/460147v1
  3. Hafemeister, C., Satija, R (2019). Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol 20, 296. https://doi.org/10.1186/s13059-019-1874-1
  4. Becht, E. et al (2018). Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44.
  5. Myles Hollander and Douglas A. Wolfe (1973). Nonparametric Statistical Methods. New York: John Wiley & Sons. Pages 68–75.
  6. Shao et al (2020). scCATCH: Automatic Annotation on Cell Types of Clusters from Single-Cell RNA Sequencing Data. iScience, Volume 23, Issue 3. doi: 10.1016/j.isci.2020.100882.
  7. Love MI, Huber W, Anders S (2014). “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Genome Biology, 15, 550. doi:10.1186/s13059-014-0550-8.

Session Info

Session info lists the relevant versions of R and all the libraries that were loaded in the analysis. This info is generally not included in the main body of a paper, but it earns serious reproducibility street cred if it shows up in the supplemental materials.
Hear me now, believe me later.

devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.1 (2024-06-14)
 os       Ubuntu 22.04.5 LTS
 system   x86_64, linux-gnu
 ui       RStudio
 language (EN)
 collate  C.UTF-8
 ctype    C.UTF-8
 tz       America/Detroit
 date     2024-10-21
 rstudio  2023.12.0+369 Ocean Storm (server)
 pandoc   3.1.1 @ /usr/lib/rstudio-server/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────────────────────────
 package          * version    date (UTC) lib source
 abind              1.4-8      2024-09-12 [2] CRAN (R 4.4.1)
 assertthat         0.2.1      2019-03-21 [2] CRAN (R 4.4.1)
 BiocGenerics       0.50.0     2024-04-30 [2] Bioconductor 3.19 (R 4.4.1)
 BPCells          * 0.2.0      2024-07-18 [2] Github (bnprks/BPCells@5677cf1)
 cachem             1.1.0      2024-05-16 [2] CRAN (R 4.4.1)
 cli                3.6.3      2024-06-21 [2] CRAN (R 4.4.1)
 cluster            2.1.6      2023-12-01 [2] CRAN (R 4.4.1)
 codetools          0.2-20     2024-03-31 [2] CRAN (R 4.4.1)
 colorspace         2.1-1      2024-07-26 [2] CRAN (R 4.4.1)
 cowplot            1.1.3      2024-01-22 [2] CRAN (R 4.4.1)
 crayon             1.5.3      2024-06-20 [2] CRAN (R 4.4.1)
 data.table         1.16.0     2024-08-27 [2] CRAN (R 4.4.1)
 deldir             2.0-4      2024-02-28 [2] CRAN (R 4.4.1)
 devtools           2.4.5      2022-10-11 [2] CRAN (R 4.4.1)
 digest             0.6.37     2024-08-19 [2] CRAN (R 4.4.1)
 dotCall64          1.1-1      2023-11-28 [2] CRAN (R 4.4.1)
 dplyr            * 1.1.4      2023-11-17 [2] CRAN (R 4.4.1)
 ellipsis           0.3.2      2021-04-29 [2] CRAN (R 4.4.1)
 evaluate           0.24.0     2024-06-10 [2] CRAN (R 4.4.1)
 fansi              1.0.6      2023-12-08 [2] CRAN (R 4.4.1)
 farver             2.1.2      2024-05-13 [2] CRAN (R 4.4.1)
 fastDummies        1.7.4      2024-08-16 [2] CRAN (R 4.4.1)
 fastmap            1.2.0      2024-05-15 [2] CRAN (R 4.4.1)
 fitdistrplus       1.2-1      2024-07-12 [2] CRAN (R 4.4.1)
 forcats          * 1.0.0      2023-01-29 [2] CRAN (R 4.4.1)
 fs                 1.6.4      2024-04-25 [2] CRAN (R 4.4.1)
 future             1.34.0     2024-07-29 [2] CRAN (R 4.4.1)
 future.apply       1.11.2     2024-03-28 [2] CRAN (R 4.4.1)
 generics           0.1.3      2022-07-05 [2] CRAN (R 4.4.1)
 GenomeInfoDb       1.40.1     2024-05-24 [2] Bioconductor 3.19 (R 4.4.1)
 GenomeInfoDbData   1.2.12     2024-07-17 [2] Bioconductor
 GenomicRanges      1.56.1     2024-06-12 [2] Bioconductor 3.19 (R 4.4.1)
 ggplot2          * 3.5.1      2024-04-23 [2] CRAN (R 4.4.1)
 ggrepel            0.9.6      2024-09-07 [2] CRAN (R 4.4.1)
 ggridges           0.5.6      2024-01-23 [2] CRAN (R 4.4.1)
 globals            0.16.3     2024-03-08 [2] CRAN (R 4.4.1)
 glue               1.7.0      2024-01-09 [2] CRAN (R 4.4.1)
 goftest            1.2-3      2021-10-07 [2] CRAN (R 4.4.1)
 gridExtra          2.3        2017-09-09 [2] CRAN (R 4.4.1)
 gtable             0.3.5      2024-04-22 [2] CRAN (R 4.4.1)
 hms                1.1.3      2023-03-21 [2] CRAN (R 4.4.1)
 htmltools          0.5.8.1    2024-04-04 [2] CRAN (R 4.4.1)
 htmlwidgets        1.6.4      2023-12-06 [2] CRAN (R 4.4.1)
 httpuv             1.6.15     2024-03-26 [2] CRAN (R 4.4.1)
 httr               1.4.7      2023-08-15 [2] CRAN (R 4.4.1)
 ica                1.0-3      2022-07-08 [2] CRAN (R 4.4.1)
 igraph             2.0.3      2024-03-13 [2] CRAN (R 4.4.1)
 IRanges            2.38.1     2024-07-03 [2] Bioconductor 3.19 (R 4.4.1)
 irlba              2.3.5.1    2022-10-03 [2] CRAN (R 4.4.1)
 jsonlite           1.8.8      2023-12-04 [2] CRAN (R 4.4.1)
 KernSmooth         2.23-24    2024-05-17 [2] CRAN (R 4.4.1)
 klippy             0.0.0.9500 2024-10-02 [1] Github (umich-brcf-bioinf/workshop-klippy@a1be090)
 knitr              1.48       2024-07-07 [2] CRAN (R 4.4.1)
 later              1.3.2      2023-12-06 [2] CRAN (R 4.4.1)
 lattice            0.22-6     2024-03-20 [2] CRAN (R 4.4.1)
 lazyeval           0.2.2      2019-03-15 [2] CRAN (R 4.4.1)
 leiden             0.4.3.1    2023-11-17 [2] CRAN (R 4.4.1)
 lifecycle          1.0.4      2023-11-07 [2] CRAN (R 4.4.1)
 listenv            0.9.1      2024-01-29 [2] CRAN (R 4.4.1)
 lmtest             0.9-40     2022-03-21 [2] CRAN (R 4.4.1)
 lubridate        * 1.9.3      2023-09-27 [2] CRAN (R 4.4.1)
 magrittr           2.0.3      2022-03-30 [2] CRAN (R 4.4.1)
 MASS               7.3-61     2024-06-13 [2] CRAN (R 4.4.1)
 Matrix             1.7-0      2024-04-26 [2] CRAN (R 4.4.1)
 MatrixGenerics     1.16.0     2024-04-30 [2] Bioconductor 3.19 (R 4.4.1)
 matrixStats        1.4.1      2024-09-08 [2] CRAN (R 4.4.1)
 memoise            2.0.1      2021-11-26 [2] CRAN (R 4.4.1)
 mime               0.12       2021-09-28 [2] CRAN (R 4.4.1)
 miniUI             0.1.1.1    2018-05-18 [2] CRAN (R 4.4.1)
 munsell            0.5.1      2024-04-01 [2] CRAN (R 4.4.1)
 nlme               3.1-166    2024-08-14 [2] CRAN (R 4.4.1)
 parallelly         1.38.0     2024-07-27 [2] CRAN (R 4.4.1)
 patchwork          1.3.0      2024-09-16 [2] CRAN (R 4.4.1)
 pbapply            1.7-2      2023-06-27 [2] CRAN (R 4.4.1)
 pillar             1.9.0      2023-03-22 [2] CRAN (R 4.4.1)
 pkgbuild           1.4.4      2024-03-17 [2] CRAN (R 4.4.1)
 pkgconfig          2.0.3      2019-09-22 [2] CRAN (R 4.4.1)
 pkgload            1.4.0      2024-06-28 [2] CRAN (R 4.4.1)
 plotly             4.10.4     2024-01-13 [2] CRAN (R 4.4.1)
 plyr               1.8.9      2023-10-02 [2] CRAN (R 4.4.1)
 png                0.1-8      2022-11-29 [2] CRAN (R 4.4.1)
 polyclip           1.10-7     2024-07-23 [2] CRAN (R 4.4.1)
 prettyunits        1.2.0      2023-09-24 [2] CRAN (R 4.4.1)
 profvis            0.3.8      2023-05-02 [2] CRAN (R 4.4.1)
 progress           1.2.3      2023-12-06 [2] CRAN (R 4.4.1)
 progressr          0.14.0     2023-08-10 [2] CRAN (R 4.4.1)
 promises           1.3.0      2024-04-05 [2] CRAN (R 4.4.1)
 purrr            * 1.0.2      2023-08-10 [2] CRAN (R 4.4.1)
 R6                 2.5.1      2021-08-19 [2] CRAN (R 4.4.1)
 RANN               2.6.2      2024-08-25 [2] CRAN (R 4.4.1)
 RColorBrewer       1.1-3      2022-04-03 [2] CRAN (R 4.4.1)
 Rcpp               1.0.13     2024-07-17 [2] CRAN (R 4.4.1)
 RcppAnnoy          0.0.22     2024-01-23 [2] CRAN (R 4.4.1)
 RcppHNSW           0.6.0      2024-02-04 [2] CRAN (R 4.4.1)
 readr            * 2.1.5      2024-01-10 [2] CRAN (R 4.4.1)
 remotes            2.5.0      2024-03-17 [2] CRAN (R 4.4.1)
 reshape2           1.4.4      2020-04-09 [2] CRAN (R 4.4.1)
 reticulate         1.39.0     2024-09-05 [2] CRAN (R 4.4.1)
 rlang              1.1.4      2024-06-04 [2] CRAN (R 4.4.1)
 rmarkdown          2.28       2024-08-17 [2] CRAN (R 4.4.1)
 ROCR               1.0-11     2020-05-02 [2] CRAN (R 4.4.1)
 RSpectra           0.16-2     2024-07-18 [2] CRAN (R 4.4.1)
 rstudioapi         0.16.0     2024-03-24 [2] CRAN (R 4.4.1)
 Rtsne              0.17       2023-12-07 [2] CRAN (R 4.4.1)
 S4Vectors          0.42.1     2024-07-03 [2] Bioconductor 3.19 (R 4.4.1)
 scales             1.3.0      2023-11-28 [2] CRAN (R 4.4.1)
 scattermore        1.2        2023-06-12 [2] CRAN (R 4.4.1)
 scCATCH          * 3.2.2      2023-04-23 [2] CRAN (R 4.4.1)
 sctransform        0.4.1      2023-10-19 [2] CRAN (R 4.4.1)
 sessioninfo        1.2.2      2021-12-06 [2] CRAN (R 4.4.1)
 Seurat           * 5.1.0      2024-05-10 [2] CRAN (R 4.4.1)
 SeuratObject     * 5.0.2      2024-05-08 [2] CRAN (R 4.4.1)
 shiny              1.9.1      2024-08-01 [2] CRAN (R 4.4.1)
 sp               * 2.1-4      2024-04-30 [2] CRAN (R 4.4.1)
 spam               2.10-0     2023-10-23 [2] CRAN (R 4.4.1)
 spatstat.data      3.1-2      2024-06-21 [2] CRAN (R 4.4.1)
 spatstat.explore   3.3-2      2024-08-21 [2] CRAN (R 4.4.1)
 spatstat.geom      3.3-2      2024-07-15 [2] CRAN (R 4.4.1)
 spatstat.random    3.3-1      2024-07-15 [2] CRAN (R 4.4.1)
 spatstat.sparse    3.1-0      2024-06-21 [2] CRAN (R 4.4.1)
 spatstat.univar    3.0-1      2024-09-05 [2] CRAN (R 4.4.1)
 spatstat.utils     3.1-0      2024-08-17 [2] CRAN (R 4.4.1)
 stringi            1.8.4      2024-05-06 [2] CRAN (R 4.4.1)
 stringr          * 1.5.1      2023-11-14 [2] CRAN (R 4.4.1)
 survival           3.7-0      2024-06-05 [2] CRAN (R 4.4.1)
 tensor             1.5        2012-05-05 [2] CRAN (R 4.4.1)
 tibble           * 3.2.1      2023-03-20 [2] CRAN (R 4.4.1)
 tidyr            * 1.3.1      2024-01-24 [2] CRAN (R 4.4.1)
 tidyselect         1.2.1      2024-03-11 [2] CRAN (R 4.4.1)
 tidyverse        * 2.0.0      2023-02-22 [2] CRAN (R 4.4.1)
 timechange         0.3.0      2024-01-18 [2] CRAN (R 4.4.1)
 tzdb               0.4.0      2023-05-12 [2] CRAN (R 4.4.1)
 UCSC.utils         1.0.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.1)
 urlchecker         1.0.1      2021-11-30 [2] CRAN (R 4.4.1)
 usethis            3.0.0      2024-07-29 [2] CRAN (R 4.4.1)
 utf8               1.2.4      2023-10-22 [2] CRAN (R 4.4.1)
 uwot               0.2.2      2024-04-21 [2] CRAN (R 4.4.1)
 vctrs              0.6.5      2023-12-01 [2] CRAN (R 4.4.1)
 viridisLite        0.4.2      2023-05-02 [2] CRAN (R 4.4.1)
 withr              3.0.1      2024-07-31 [2] CRAN (R 4.4.1)
 xfun               0.47       2024-08-17 [2] CRAN (R 4.4.1)
 xtable             1.8-4      2019-04-21 [2] CRAN (R 4.4.1)
 XVector            0.44.0     2024-04-30 [2] Bioconductor 3.19 (R 4.4.1)
 yaml               2.3.10     2024-07-26 [2] CRAN (R 4.4.1)
 zlibbioc           1.50.0     2024-04-30 [2] Bioconductor 3.19 (R 4.4.1)
 zoo                1.8-12     2023-04-13 [2] CRAN (R 4.4.1)

Housekeeping


Looking ahead

Workshop environment

  • The RStudio workshop compute environment will be available until 10/22/2024.

    • Please save all your R scripts now so that we can “right-size” the compute environment immediately following today’s workshop session.
  • You can download files from the workshop environment from your terminal/command line window as below. (You will need to substitute your actual workshop username and type your workshop password when prompted.)

    # download workshop files -------------------------------------------------
    mkdir intro_scrnaseq-workshop
    cd intro_scrnaseq-workshop
    scp -r YOUR_USERNAME@bfx-workshop01.med.umich.edu:"ISC_R*" .

    • Note that the full download of the R data is about 8 GB, so depending on your internet speed it could take a while. (We do not recommend you download the full set of Cell Ranger outputs.)

Installing software locally

  • You can install the necessary software to run these analyses locally. Note that for typical data, the Cell Ranger steps (reviewed Day 1) assume your computer has powerful compute (many CPUs and lots of RAM) and sizable storage capacity. (i.e. it’s impractical to run these on your laptop.)
  • Installing bioinformatics software is non-trivial and comprehensive instructions to setup a complete compute environment are outside the scope of this workshop. (For University of Michigan learners, we are planning to host a Computational Reproducibility workshop later this year that would cover this installation and other related tasks in more detail.) For the intrepid, see relevant links below:

University of Michigan Resources

  • UM CoderSpaces “office hours” and UM CoderSpaces Slack workspace. (See “Useful Resources” section of the CoderSpaces page for instructions on how to join the CoderSpaces Slack workspace.)
  • Upcoming UM Advanced Research Computing workshops.
  • Advanced Research Computing (ARC) at University of Michigan hosts a high-performance computing (HPC) platform called Great Lakes, which combines high-end computers, fast/resilient storage, and pre-installed software. Great Lakes may be a good resource for folks who need to run the more compute-intensive steps. A substantial block of compute and storage is subsidized by ARC, making it essentially free to many UM researchers.

Resources for continued learning


Thank you to our sponsors


Thank you to/from the workshop team

Chris Marci Raymond Dana
Travis Olivia Ram
Matt Joe Nick


Thank you for participating in our workshop. We welcome your questions and feedback now and in the future.

Bioinformatics Workshop Team

bioinformatics-workshops@umich.edu
UM BRCF Bioinformatics Core