      bayNorm: Bayesian gene expression recovery, imputation and normalization for single-cell RNA-sequencing data

      research-article

          Abstract

          Motivation

          Normalization of single-cell RNA-sequencing (scRNA-seq) data is a prerequisite to their interpretation. The marked technical variability, high amounts of missing observations and batch effect typical of scRNA-seq datasets make this task particularly challenging. There is a need for an efficient and unified approach for normalization, imputation and batch effect correction.

          Results

          Here, we introduce bayNorm, a novel Bayesian approach for scaling and inference of scRNA-seq counts. The method’s likelihood function follows a binomial model of mRNA capture, while priors are estimated from expression values across cells using an empirical Bayes approach. We first validate our assumptions by showing this model can reproduce different statistics observed in real scRNA-seq data. We demonstrate using publicly available scRNA-seq datasets and simulated expression data that bayNorm allows robust imputation of missing values generating realistic transcript distributions that match single molecule fluorescence in situ hybridization measurements. Moreover, by using priors informed by dataset structures, bayNorm improves accuracy and sensitivity of differential expression analysis and reduces batch effect compared with other existing methods. Altogether, bayNorm provides an efficient, integrated solution for global scaling normalization, imputation and true count recovery of gene expression measurements from scRNA-seq data.
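The Bayesian step described above can be illustrated with a small sketch. This is not the bayNorm implementation (the published package is written in R); it is a minimal Python illustration under stated assumptions. The binomial likelihood of mRNA capture comes from the abstract. The negative binomial prior, the function names and the parameters `beta` (capture efficiency), `mu` (prior mean) and `size` (prior dispersion) are assumptions chosen for illustration. In an empirical Bayes setting `mu` and `size` would be estimated from expression values across cells; here they are simply given.

```python
import math


def log_binom_pmf(x, n, beta):
    """Log-probability of observing x captured transcripts out of n true ones."""
    if x > n:
        return float("-inf")
    return (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
            + x * math.log(beta) + (n - x) * math.log1p(-beta))


def log_nb_pmf(n, mu, size):
    """Log-probability of n under a negative binomial prior (assumed prior family)."""
    p = size / (size + mu)
    return (math.lgamma(n + size) - math.lgamma(size) - math.lgamma(n + 1)
            + size * math.log(p) + n * math.log1p(-p))


def posterior_true_count(x, beta, mu, size, n_max=2000):
    """Posterior over the true count n given the observed count x.

    Evaluates likelihood times prior on the grid n = x..n_max and normalizes.
    Requires 0 < beta < 1, mu > 0 and size > 0; n_max truncates the support.
    """
    support = range(x, n_max + 1)
    log_w = [log_binom_pmf(x, n, beta) + log_nb_pmf(n, mu, size) for n in support]
    shift = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(v - shift) for v in log_w]
    total = sum(weights)
    return {n: w / total for n, w in zip(support, weights)}


def posterior_mean(post):
    """Expected true count under the posterior."""
    return sum(n * p for n, p in post.items())


# Hypothetical values: 10% capture efficiency, prior mean 20, prior size 2.
observed = posterior_true_count(x=3, beta=0.1, mu=20.0, size=2.0)
dropout = posterior_true_count(x=0, beta=0.1, mu=20.0, size=2.0)
print(round(posterior_mean(observed), 3))  # 25.5
print(round(posterior_mean(dropout), 3))   # 9.0
```

With these hypothetical values, an observed zero still yields a positive posterior mean. This shows how a prior shared across cells can support imputation of missing observations in a model of this kind.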

          Availability and implementation

          The R package ‘bayNorm’ is published on Bioconductor at https://bioconductor.org/packages/release/bioc/html/bayNorm.html. The code for analyzing data in this article is available at https://github.com/WT215/bayNorm_papercode.

          Supplementary information

          Supplementary data are available at Bioinformatics online.

                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Bioinformatics
                Oxford University Press
                1367-4803
                1367-4811
                15 February 2020
                04 October 2019
                Volume: 36
                Issue: 4
                Pages: 1174-1181
                Affiliations
                [1 ] Department of Mathematics, Faculty of Natural Sciences, Imperial College , London SW7 2AZ, UK
                [2 ] MRC London Institute of Medical Sciences (LMS) , London W12 0NN, UK
                [3 ] Faculty of Medicine, Institute of Clinical Sciences (ICS), Imperial College London , London W12 0NN, UK
                Author notes
                Present address: Institut Pasteur, USR 3756 IP CNRS, 28 rue du Docteur-Roux, 75015 Paris, France
                Author information
                http://orcid.org/0000-0002-2402-3165
                http://orcid.org/0000-0002-4013-5458
                Article
                btz726
                DOI: 10.1093/bioinformatics/btz726
                PMCID: PMC7703772
                PMID: 31584606
                3aa23252-74c0-4c8d-b426-0960fda04165
                © The Author(s) 2019. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 09 September 2018
                Revised: 14 April 2019
                Accepted: 27 September 2019
                Page count
                Pages: 8
                Funding
                Funded by: UK Medical Research Council, a Leverhulme Research Project
                Award ID: RPG-2014-408
                Funded by: EPSRC Centre for Mathematics of Precision Health
                Funded by: Roth Scholarship from the Department of Mathematics at Imperial College
                Funded by: UK Medical Research Council
                Award ID: MR/L01632X/1
                Funded by: Imperial College Research Computing Service
                Categories
                Original Papers
                Gene Expression

                Bioinformatics & Computational biology