Fast, sensitive, and accurate integration of single cell data with Harmony

Korsunsky, Ilya; Millard, Nghia; Fan, Jean; Slowikowski, Kamil; Zhang, Fan; Wei, Kevin Y; Baglaenko, Yuriy; Brenner, Michael P.; Loh, Po-ru; Raychaudhuri, Soumya

doi:10.1038/s41592-019-0619-0


research-article

Author(s): Ilya Korsunsky [1,2,3,4], Nghia Millard [1,2,3,4], Jean Fan [5], Kamil Slowikowski [1,2,3,4], Fan Zhang [1,2,3,4], Kevin Wei [2], Yuriy Baglaenko [1,2,3,4], Michael Brenner [2], Po-ru Loh [1,3,4], Soumya Raychaudhuri [1,2,3,4,6]

Publication date (Electronic): 18 November 2019

Journal: Nature methods


Abstract

The emerging diversity of single cell RNAseq datasets allows for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. However, it is challenging to analyze them together, particularly when datasets are assayed with different technologies. Here, real biological differences are interspersed with technical differences. We present Harmony, an algorithm that projects cells into a shared embedding in which cells group by cell type rather than dataset-specific conditions. Harmony simultaneously accounts for multiple experimental and biological factors. In six analyses, we demonstrate the superior performance of Harmony to previously published algorithms. We show that Harmony requires dramatically fewer computational resources. It is the only currently available algorithm that makes the integration of ~10⁶ cells feasible on a personal computer. We apply Harmony to PBMCs from datasets with large experimental differences, 5 studies of pancreatic islet cells, mouse embryogenesis datasets, and cross-modality spatial integration.
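To make the goal stated in the abstract concrete, the toy sketch below shows what "grouping by cell type rather than by dataset" can look like in a two-dimensional embedding. It is not Harmony's algorithm, which the abstract does not detail; plain per-dataset mean centering is swapped in as a stand-in. The function name `center_by_dataset`, the coordinates, and the labels are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def center_by_dataset(embedding, dataset_labels):
    """Subtract each dataset's centroid from its cells.

    Toy stand-in for batch correction; NOT the Harmony algorithm.
    """
    dims = len(embedding[0])
    # Collect the row indices that belong to each dataset.
    groups = {}
    for i, label in enumerate(dataset_labels):
        groups.setdefault(label, []).append(i)

    corrected = [None] * len(embedding)
    for idx in groups.values():
        # Centroid of this dataset in every embedding dimension.
        centroid = [mean(embedding[i][d] for i in idx) for d in range(dims)]
        for i in idx:
            corrected[i] = [embedding[i][d] - centroid[d] for d in range(dims)]
    return corrected


# Hypothetical data: two datasets, each with one cell of type A and one of type B.
# Dataset "y" carries a technical offset of +5 on both axes.
embedding = [[-1.0, 0.0], [1.0, 0.0], [4.0, 5.0], [6.0, 5.0]]
datasets = ["x", "x", "y", "y"]
cell_types = ["A", "B", "A", "B"]

corrected = center_by_dataset(embedding, datasets)
for cell_type, dataset, point in zip(cell_types, datasets, corrected):
    print(cell_type, dataset, point)
```

After centering, the two type A cells share one position and the two type B cells share another, regardless of which dataset they came from. Centering of this kind only works when every dataset has the same cell-type composition, so it should be read as an illustration of the objective, not as a substitute for a dedicated integration method.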


Author and article information

Journal

Journal ID (nlm-journal-id): 101215604

Journal ID (pubmed-jr-id): 32338

Journal ID (nlm-ta): Nat Methods

Journal ID (iso-abbrev): Nat. Methods

Title: Nature methods

ISSN (Print): 1548-7091

ISSN (Electronic): 1548-7105

Publication date Nihms-submitted: 7 September 2019

Publication date (Electronic): 18 November 2019

Publication date (Print): December 2019

Publication date PMC-release: 18 May 2020

Volume: 16

Issue: 12

Pages: 1289-1296

Affiliations

[1] Center for Data Sciences, Brigham and Women's Hospital, Massachusetts, USA.

[2] Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston.

[3] Department of Biomedical Informatics, Harvard Medical School, Massachusetts, USA.

[4] Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

[5] Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, USA.

[6] Arthritis Research UK Centre for Genetics and Genomics, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK.

Author notes

[*] Correspondence to: Soumya Raychaudhuri, 77 Avenue Louis Pasteur, Harvard New Research Building, Suite 250D, Boston, MA 02446, USA. soumya@broadinstitute.org; 617-525-4484 (tel); 617-525-4488 (fax)

Author Contributions

SR and IK conceived the research. IK led computational work under the guidance of SR, assisted by NM, PL, JF, and KS. All authors participated in interpretation and writing the manuscript.

Article

Manuscript ID: NIHMS1539299

DOI: 10.1038/s41592-019-0619-0

PMC ID: 6884693

PubMed ID: 31740819

SO-VID: e722cc28-b251-4a76-8112-343bc413c85c

License:

Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
