Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications

Abstract

Dropout events in single-cell RNA sequencing (scRNA-seq) cause many transcripts to go undetected and induce an excess of zero read counts, leading to power issues in differential expression (DE) analysis. This has triggered the development of bespoke scRNA-seq DE methods to cope with zero inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq.
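As an illustration of what a gene- and cell-specific weight could look like, the sketch below computes the posterior probability that a count came from the negative binomial component of a zero-inflated negative binomial (ZINB) mixture rather than from the excess-zero component. The abstract does not spell out the weight formula, so this posterior-probability form is an assumption. The function names `nb_pmf` and `zinb_weight`, the mean/dispersion parameterization (`mu`, `theta`), and the mixing probability `pi` are hypothetical choices made for this example.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative binomial probability of count y (mean mu > 0, dispersion/size theta > 0)."""
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_weight(y, mu, theta, pi):
    """Assumed weight: posterior probability that y came from the NB component.

    pi is the (hypothetical) probability of an excess zero for this gene and cell.
    """
    nb_part = (1.0 - pi) * nb_pmf(y, mu, theta)
    # The excess-zero component only puts mass on y == 0.
    zero_part = pi if y == 0 else 0.0
    return nb_part / (zero_part + nb_part)


# A positive count cannot be an excess zero, so it keeps full weight.
w_positive = zinb_weight(5, mu=2.0, theta=1.0, pi=0.3)
# A zero count is down-weighted according to how plausible an excess zero is.
w_zero = zinb_weight(0, mu=2.0, theta=1.0, pi=0.3)
```

Under this assumption, positive counts always receive weight 1 and only zeros are down-weighted, so a weight matrix of this kind could in principle be passed to any bulk RNA-seq method that accepts observation-level weights.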

Electronic supplementary material

The online version of this article (10.1186/s13059-018-1406-4) contains supplementary material, which is available to authorized users.

Author and article information

Contributors

Charlotte Soneson: charlotte.soneson@uzh.ch

Michael I. Love: milove@email.unc.edu

Davide Risso: dar2062@med.cornell.edu

Jean-Philippe Vert: jean-philippe.vert@curie.fr

Mark D. Robinson: mark.robinson@imls.uzh.ch

Journal

Journal ID (nlm-ta): Genome Biol

Journal ID (iso-abbrev): Genome Biol

Title: Genome Biology

Publisher: BioMed Central (London)

ISSN (Print): 1474-7596

ISSN (Electronic): 1474-760X

Publication date (Electronic): 26 February 2018

Publication date PMC-release: 26 February 2018

Publication date Collection: 2018

Volume: 19

Electronic Location Identifier: 24

Affiliations

[1] ISNI 0000 0001 2069 7798, GRID grid.5342.0, Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Krijgslaan 281, S9, Ghent, 9000 Belgium

[2] ISNI 0000 0001 2069 7798, GRID grid.5342.0, Bioinformatics Institute Ghent, Ghent University, Ghent, 9000 Belgium

[3] ISNI 0000 0001 2181 7878, GRID grid.47840.3f, Division of Biostatistics, School of Public Health, University of California, Berkeley, USA

[4] ISNI 0000 0004 1937 0650, GRID grid.7400.3, Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057 Switzerland

[5] ISNI 0000 0004 1937 0650, GRID grid.7400.3, SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057 Switzerland

[6] ISNI 0000000122483208, GRID grid.10698.36, Department of Biostatistics and Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, NC USA

[7] ISNI 000000041936877X, GRID grid.5386.8, Division of Biostatistics and Epidemiology, Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, USA

[8] ISNI 0000 0001 2097 6957, GRID grid.58140.38, MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France

[9] ISNI 0000 0004 0639 6384, GRID grid.418596.7, Institut Curie, Paris, France

[10] INSERM U900, Paris, France

[11] ISNI 0000000121105547, GRID grid.5607.4, Ecole Normale Supérieure, Department of Mathematics and Applications, Paris, France

[12] ISNI 0000 0001 2181 7878, GRID grid.47840.3f, Department of Statistics, University of California, Berkeley, USA

Author information

Lieven Clement http://orcid.org/0000-0002-1833-8478

Article

Publisher ID: 1406

DOI: 10.1186/s13059-018-1406-4

PMC ID: 6251479

PubMed ID: 29478411

SO-VID: 4030f4fb-d426-4024-877d-22c94432722d

License:

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received: 23 December 2017

Date accepted: 7 February 2018

Funding

Funded by: IAP StUDyS grant

Award ID: P7/06

Funded by: MRP N2N

Funded by: FundRef http://dx.doi.org/10.13039/501100003130, Fonds Wetenschappelijk Onderzoek;

Award ID: 1S 418 16N

Funded by: Forschungskredit

Award ID: FK-16-107

Funded by: FundRef http://dx.doi.org/10.13039/100000002, National Institutes of Health;

Award ID: CA142538-08

Funded by: FundRef http://dx.doi.org/10.13039/100000002, National Institutes of Health;

Award ID: U01 MH105979

Funded by: FundRef http://dx.doi.org/10.13039/501100001665, Agence Nationale de la Recherche;

Award ID: ABS4NGS ANR-11-BINF-0001

Funded by: FundRef http://dx.doi.org/10.13039/501100000781, European Research Council;

Award ID: ERC-SMAC-290032

Funded by: FundRef http://dx.doi.org/10.13039/100007247, Adolph C. and Mary Sprague Miller Institute for Basic Research in Science, University of California Berkeley;

Funded by: Fulbright Foundation

Custom metadata

ScienceOpen disciplines: Genetics

Keywords: single-cell RNA sequencing, differential expression, zero-inflated negative binomial, weights
