      Is Open Access

      A data-efficient deep learning tool for scRNA-Seq label transfer in neuroscience

      research-article


          Abstract

          Large single-cell RNA datasets have contributed to unprecedented biological insight. Often, these take the form of cell atlases and serve as a reference for automating cell labeling of newly sequenced samples. Yet, classification algorithms have lacked the capacity to accurately annotate cells, particularly in complex datasets. Here we present SIMS (Scalable, Interpretable Machine Learning for Single-Cell), an end-to-end data-efficient machine learning pipeline for discrete classification of single-cell data that can be applied to new datasets with minimal coding. We benchmarked SIMS against common single-cell label transfer tools and demonstrated that it performs as well as or better than state-of-the-art algorithms. We then use SIMS to classify cells in one of the most complex tissues: the brain. We show that SIMS classifies cells of the adult cerebral cortex and hippocampus at a remarkably higher accuracy than state-of-the-art single-cell classifiers. This accuracy is maintained in trans-sample label transfers of the adult human cerebral cortex. We then apply SIMS to classify cells in the developing brain and demonstrate a high level of accuracy at predicting neuronal subtypes, even in periods of fate refinement. Finally, we apply SIMS to single-cell datasets of cortical organoids to predict cell identities in previously unclassified cells and to uncover genetic variations in the developmental trajectories of organoids derived from different pluripotent stem cell lines. Altogether, we show that SIMS is a versatile and robust tool for cell-type classification from single-cell datasets.



                Author and article information

                Journal
                bioRxiv
                Cold Spring Harbor Laboratory
                01 March 2023
                : 2023.02.28.529615
                Affiliations
                [1 ]Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, 95060, USA.
                [2 ]Live Cell Biotechnology Discovery Lab, University of California Santa Cruz, Santa Cruz, CA, 95060, USA.
                [3 ]Department of Applied Mathematics, University of California Santa Cruz, Santa Cruz, CA, 95060, USA.
                [4 ]Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, 95060, USA.
                [5 ]Department of Electrical and Computer Engineering, University of California Santa Cruz, Santa Cruz, CA, 95060, USA.
                Author notes
                [*] These authors contributed equally to this work.

                [+] Co-senior authors.

                5.1 Author Contribution Statement

                D.H., M.T., V.D.J., and M.A.M.-R. conceived the project. J.L. and J.G.-F. performed the experiments. J.L., J.G.-F., and M.A.M.-R. wrote the paper with contributions from all authors.

                [-] Correspondence to vjonsson@ucsc.edu (V.D.J.) and mmostajo@ucsc.edu (M.A.M.-R.)
                Article
                10.1101/2023.02.28.529615
                10002667
                36909548

                This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.

