
      Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications

      research-article


          Abstract

          Dropout events in single-cell RNA sequencing (scRNA-seq) cause many transcripts to go undetected and induce an excess of zero read counts, leading to power issues in differential expression (DE) analysis. This has triggered the development of bespoke scRNA-seq DE methods to cope with zero inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq.
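The weighting idea in the abstract can be sketched as follows. Under a zero-inflated negative binomial (ZINB) model, a zero count can come either from the excess-zero component (probability pi) or from the negative binomial (NB) count component. A natural weight is the posterior probability that an observation came from the NB component. Non-zero counts then keep weight 1, and zeros that the NB component explains poorly are down-weighted. This is a minimal Python sketch, not the authors' implementation. It assumes that the ZINB parameters (NB mean mu, NB size = 1/dispersion, and zero-inflation probability pi) have already been estimated; the model fitting itself is not reproduced. The helper names are hypothetical. The final function illustrates, as an assumption, how a bulk tool that accepts observation weights would use them: each observation's log-likelihood contribution is multiplied by its weight.

```python
import math


def nb_pmf_zero(mu: float, size: float) -> float:
    """P(Y = 0) for a negative binomial with mean mu and size (1 / dispersion)."""
    return (size / (size + mu)) ** size


def nb_logpmf(y: int, mu: float, size: float) -> float:
    """Log-probability of count y under NB(mean=mu, size=size); requires mu > 0."""
    return (
        math.lgamma(y + size)
        - math.lgamma(size)
        - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )


def zinb_weight(y: int, mu: float, size: float, pi: float) -> float:
    """Hypothetical helper: posterior probability that count y comes from the
    NB count component rather than the excess-zero component of a ZINB model."""
    if y > 0:
        # A positive count cannot be an excess zero, so it keeps full weight.
        return 1.0
    # Probability of a zero arising from the NB component.
    nb_zero = (1.0 - pi) * nb_pmf_zero(mu, size)
    # Normalise by the total probability of observing a zero.
    return nb_zero / (pi + nb_zero)


def weighted_nb_loglik(counts, mus, size, weights) -> float:
    """Sum of NB log-likelihood contributions, each scaled by its weight."""
    return sum(w * nb_logpmf(y, mu, size) for y, mu, w in zip(counts, mus, weights))


if __name__ == "__main__":
    # Toy values for one gene across four cells (illustrative only).
    counts = [0, 3, 0, 7]
    mus = [2.0, 2.5, 0.1, 5.0]
    size, pi = 1.0, 0.3
    weights = [zinb_weight(y, mu, size, pi) for y, mu in zip(counts, mus)]
    print([round(w, 4) for w in weights])
    print(weighted_nb_loglik(counts, mus, size, weights))
```

With these toy values, the zero observed where the fitted mean is 2.0 gets weight 7/16 = 0.4375. The zero observed where the fitted mean is 0.1 gets about 0.68, because a zero is already likely under the NB component there. For simplicity, pi is held fixed across cells in this sketch. In the gene- and cell-specific setting described in the abstract, mu and pi would vary per gene and per cell.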

          Electronic supplementary material

          The online version of this article (10.1186/s13059-018-1406-4) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                charlotte.soneson@uzh.ch
                milove@email.unc.edu
                dar2062@med.cornell.edu
                jean-philippe.vert@curie.fr
                mark.robinson@imls.uzh.ch
                Journal
                Genome Biology (Genome Biol)
                BioMed Central, London
                ISSN 1474-7596, 1474-760X
                Published 26 February 2018
                2018; 19
                Affiliations
                [1] ISNI 0000 0001 2069 7798, GRID grid.5342.0, Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Krijgslaan 281, S9, Ghent, 9000 Belgium
                [2] ISNI 0000 0001 2069 7798, GRID grid.5342.0, Bioinformatics Institute Ghent, Ghent University, Ghent, 9000 Belgium
                [3] ISNI 0000 0001 2181 7878, GRID grid.47840.3f, Division of Biostatistics, School of Public Health, University of California, Berkeley, USA
                [4] ISNI 0000 0004 1937 0650, GRID grid.7400.3, Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057 Switzerland
                [5] ISNI 0000 0004 1937 0650, GRID grid.7400.3, SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057 Switzerland
                [6] ISNI 0000000122483208, GRID grid.10698.36, Department of Biostatistics and Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, NC USA
                [7] ISNI 000000041936877X, GRID grid.5386.8, Division of Biostatistics and Epidemiology, Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, USA
                [8] ISNI 0000 0001 2097 6957, GRID grid.58140.38, MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
                [9] ISNI 0000 0004 0639 6384, GRID grid.418596.7, Institut Curie, Paris, France
                [10] INSERM U900, Paris, France
                [11] ISNI 0000000121105547, GRID grid.5607.4, Ecole Normale Supérieure, Department of Mathematics and Applications, Paris, France
                [12] ISNI 0000 0001 2181 7878, GRID grid.47840.3f, Department of Statistics, University of California, Berkeley, USA
                Article
                1406
                10.1186/s13059-018-1406-4
                6251479
                29478411
                4030f4fb-d426-4024-877d-22c94432722d
                © The Author(s) 2018

                Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                Funding
                Funded by: IAP StUDyS grant
                Award ID: P7/06
                Funded by: MRP N2N
                Funded by: FundRef http://dx.doi.org/10.13039/501100003130, Fonds Wetenschappelijk Onderzoek;
                Award ID: 1S 418 16N
                Funded by: Forschungskredit
                Award ID: FK-16-107
                Funded by: FundRef http://dx.doi.org/10.13039/100000002, National Institutes of Health;
                Award ID: CA142538-08
                Funded by: FundRef http://dx.doi.org/10.13039/100000002, National Institutes of Health;
                Award ID: U01 MH105979
                Funded by: FundRef http://dx.doi.org/10.13039/501100001665, Agence Nationale de la Recherche;
                Award ID: ABS4NGS ANR-11-BINF-0001
                Funded by: FundRef http://dx.doi.org/10.13039/501100000781, European Research Council;
                Award ID: ERC-SMAC-290032
                Funded by: FundRef http://dx.doi.org/10.13039/100007247, Adolph C. and Mary Sprague Miller Institute for Basic Research in Science, University of California Berkeley;
                Funded by: Fulbright Foundation
                Categories
                Method
