Integrating single-cell transcriptomic data across different conditions, technologies, and species


Abstract

Computational single-cell RNA-seq (scRNA-seq) methods have been successfully applied to experiments representing a single condition, technology, or species to discover and define cellular phenotypes. However, identifying subpopulations of cells that are present across multiple datasets remains challenging. Here, we introduce an analytical strategy for integrating scRNA-seq datasets based on common sources of variation, enabling the identification of shared populations across datasets and downstream comparative analysis. We apply this approach, implemented in our R toolkit Seurat (http://satijalab.org/seurat/), to align scRNA-seq datasets of peripheral blood mononuclear cells (PBMCs) under resting and stimulated conditions, hematopoietic progenitors sequenced using two profiling technologies, and pancreatic cell ‘atlases’ generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across datasets, while boosting statistical power through integrated analysis. Our approach facilitates general comparisons of scRNA-seq datasets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution.


Most cited references 26


Comparative Analysis of Single-Cell RNA Sequencing Methods.

Christoph Ziegenhain, Beate Vieth, Swati Parekh … (2017)

Single-cell RNA sequencing (scRNA-seq) offers new possibilities to address biological and medical questions. However, systematic comparisons of the performance of diverse scRNA-seq protocols are lacking. We generated data from 583 mouse embryonic stem cells to evaluate six prominent scRNA-seq methods: CEL-seq2, Drop-seq, MARS-seq, SCRB-seq, Smart-seq, and Smart-seq2. While Smart-seq2 detected the most genes per cell and across cells, CEL-seq2, Drop-seq, MARS-seq, and SCRB-seq quantified mRNA levels with less amplification noise due to the use of unique molecular identifiers (UMIs). Power simulations at different sequencing depths showed that Drop-seq is more cost-efficient for transcriptome quantification of large numbers of cells, while MARS-seq, SCRB-seq, and Smart-seq2 are more efficient when analyzing fewer cells. Our quantitative comparison offers the basis for an informed choice among six prominent scRNA-seq methods, and it provides a framework for benchmarking further improvements of scRNA-seq protocols.


A Single-Cell Transcriptomic Map of the Human and Mouse Pancreas Reveals Inter- and Intra-cell Population Structure.

Maayan Baron, Adrian Veres, Samuel Wolock … (2016)

Although the function of the mammalian pancreas hinges on complex interactions of distinct cell types, gene expression profiles have primarily been described with bulk mixtures. Here we implemented a droplet-based, single-cell RNA-seq method to determine the transcriptomes of over 12,000 individual pancreatic cells from four human donors and two mouse strains. Cells could be divided into 15 clusters that matched previously characterized cell types: all endocrine cell types, including rare epsilon-cells; exocrine cell types; vascular cells; Schwann cells; quiescent and activated stellate cells; and four types of immune cells. We detected subpopulations of ductal cells with distinct expression profiles and validated their existence with immunohistochemistry stains. Moreover, among human beta-cells, we detected heterogeneity in the regulation of genes relating to functional maturation and levels of ER stress. Finally, we deconvolved bulk gene expression samples using the single-cell data to detect disease-associated differential expression. Our dataset provides a resource for the discovery of novel cell type-specific transcription factors, signaling receptors, and medically relevant genes.


A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis.

Daniela M. Witten, Robert Tibshirani, Trevor J. Hastie (2009)

We present a penalized matrix decomposition (PMD), a new framework for computing a rank-$K$ approximation for a matrix. We approximate the matrix $X$ as $\hat{X} = \sum_{k=1}^{K} d_k u_k v_k^T$, where $d_k$, $u_k$, and $v_k$ minimize the squared Frobenius norm of $X - \hat{X}$, subject to penalties on $u_k$ and $v_k$. This results in a regularized version of the singular value decomposition. Of particular interest is the use of $L_1$-penalties on $u_k$ and $v_k$, which yields a decomposition of $X$ using sparse vectors. We show that when the PMD is applied using an $L_1$-penalty on $v_k$ but not on $u_k$, a method for sparse principal components results. In fact, this yields an efficient algorithm for the "SCoTLASS" proposal (Jolliffe and others 2003) for obtaining sparse principal components. This method is demonstrated on a publicly available gene expression data set. We also establish connections between the SCoTLASS method for sparse principal component analysis and the method of Zou and others (2006). In addition, we show that when the PMD is applied to a cross-products matrix, it results in a method for penalized canonical correlation analysis (CCA). We apply this penalized CCA method to simulated data and to a genomic data set consisting of gene expression and DNA copy number measurements on the same set of samples.
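
The rank-1 case of the decomposition described above can be illustrated with a short sketch. The following is not the authors' implementation: it is a minimal pure-Python illustration that alternates between updating u and a soft-thresholded v, which corresponds to the sparse principal component variant (penalty on v only). The function names and the fixed threshold `delta` are assumptions made for illustration; the published method instead chooses the threshold so that the L1 norm of v satisfies a bound.

```python
import math


def soft_threshold(values, delta):
    """Shrink each entry toward zero by delta (the L1 shrinkage step)."""
    return [math.copysign(max(abs(a) - delta, 0.0), a) for a in values]


def l2_normalize(values):
    """Scale a vector to unit Euclidean length; leave an all-zero vector as is."""
    norm = math.sqrt(sum(a * a for a in values))
    return [a / norm for a in values] if norm > 0 else list(values)


def rank1_pmd(X, delta, n_iter=100):
    """Rank-1 PMD sketch with shrinkage applied to v only.

    X is a list of equal-length rows. delta is a hypothetical fixed
    threshold (an assumption of this sketch, not the published tuning rule).
    Returns (d, u, v) so that d * u * v^T approximates X.
    """
    n, p = len(X), len(X[0])
    v = l2_normalize([1.0] * p)
    u = [0.0] * n
    for _ in range(n_iter):
        # u <- X v / ||X v||  (no penalty on u)
        u = l2_normalize(
            [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]
        )
        # v <- S(X^T u, delta) / ||S(X^T u, delta)||  (sparse v)
        xtu = [sum(X[i][j] * u[i] for i in range(n)) for j in range(p)]
        v = l2_normalize(soft_threshold(xtu, delta))
    # d <- u^T X v
    d = sum(u[i] * X[i][j] * v[j] for i in range(n) for j in range(p))
    return d, u, v


if __name__ == "__main__":
    # Toy matrix: two strong columns and one weak, noisy column.
    X = [[2.0, 2.0, 0.1], [2.0, 2.0, -0.1], [1.0, 1.0, 0.0]]
    d, u, v = rank1_pmd(X, delta=0.5)
    print(d, u, v)
```

In this toy example the weak third column falls below the threshold, so its loading in v is driven to zero, which is the sparsity behaviour the abstract attributes to the L1 penalty.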


Author and article information

Journal

Journal ID (nlm-journal-id): 9604648

Journal ID (pubmed-jr-id): 20305

Journal ID (nlm-ta): Nat Biotechnol

Journal ID (iso-abbrev): Nat. Biotechnol.

Title: Nature biotechnology

ISSN (Print): 1087-0156

ISSN (Electronic): 1546-1696

Publication date Nihms-submitted: 27 September 2018

Publication date (Electronic): 02 April 2018

Publication date (Print): June 2018

Publication date PMC-release: 20 August 2019

Volume: 36

Issue: 5

Pages: 411-420

Affiliations

[1 ]New York Genome Center, New York, NY 10013, USA

[2 ]Center for Genomics and Systems Biology, New York University, New York, NY 10003-6688, USA

Author notes

AUTHOR CONTRIBUTIONS

AB and RS conceived the research. AB, PH, and RS implemented the alignment procedure, performed all data analysis, and wrote the manuscript. EP performed the PBMC validation experiments, and PS performed the ddSeq experiments.

[# ]To whom correspondence should be addressed: rsatija@nygenome.org

Article

Accession ID: PMC6700744

Pmcid ID: PMC6700744

Pmc-uid ID: 6700744

Manuscript ID: nihpa990262

DOI: 10.1038/nbt.4096

PMC ID: 6700744

PubMed ID: 29608179

SO-VID: 00018093-161a-45d3-854a-38a5594ccd1f


Cited by 4,458

