      Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression

      research-article
      Genome Biology
      BioMed Central
      Single-cell RNA-seq, Normalization

          Abstract

          Single-cell RNA-seq (scRNA-seq) data exhibit significant cell-to-cell variation due to technical factors, including the number of molecules detected in each cell, which can confound biological heterogeneity with technical effects. To address this, we present a modeling framework for the normalization and variance stabilization of molecular count data from scRNA-seq experiments. We propose that the Pearson residuals from “regularized negative binomial regression,” where cellular sequencing depth is used as a covariate in a generalized linear model, successfully remove the influence of technical characteristics from downstream analyses while preserving biological heterogeneity. Importantly, we show that an unconstrained negative binomial model may overfit scRNA-seq data, and we overcome this by pooling information across genes with similar abundances to obtain stable parameter estimates. Our procedure removes the need for heuristic steps including pseudocount addition or log-transformation and improves common downstream analytical tasks such as variable gene selection, dimensional reduction, and differential expression. Our approach can be applied to any UMI-based scRNA-seq dataset and is freely available as part of the R package sctransform, with a direct interface to our single-cell toolkit Seurat.
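The three steps named in the abstract can be illustrated with a short NumPy sketch. These steps are per-gene regression on sequencing depth, pooling of parameters across genes of similar abundance, and Pearson residuals. This is a minimal illustration under stated assumptions, not the sctransform implementation. The per-gene intercept and slope come from a Poisson GLM fitted by iteratively reweighted least squares. The dispersion θ comes from a simple method-of-moments estimator, swapped in for brevity in place of a likelihood-based fit. The pooling step is Gaussian kernel smoothing of (intercept, slope, log10 θ) against log10 gene mean, with a hand-picked bandwidth `bw=0.3`. Clipping residuals at ±√(number of cells) is also an assumed default. All function names and the simulated data are hypothetical.

```python
import numpy as np

def fit_poisson_glm(y, X, n_iter=50, tol=1e-8):
    """Poisson GLM with log link, fitted by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())          # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response
        XtW = X.T * mu                   # Poisson working weights are mu
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta

def moment_theta(y, mu, max_theta=1e4):
    """Method-of-moments NB dispersion (an assumption), from Var(y) = mu + mu^2 / theta."""
    excess = np.sum((y - mu) ** 2 - mu)
    if excess <= 0:                      # no overdispersion detected: treat as near-Poisson
        return max_theta
    return min(np.sum(mu ** 2) / excess, max_theta)

def kernel_smooth(x, values, bw):
    """Nadaraya-Watson regression with a Gaussian kernel, evaluated at x itself."""
    d = (x[:, None] - x[None, :]) / bw
    w = np.exp(-0.5 * d ** 2)
    return (w @ values) / w.sum(axis=1, keepdims=True)

def regularized_nb_residuals(counts, bw=0.3):
    """counts: genes x cells UMI matrix; every gene and cell needs a nonzero total."""
    counts = np.asarray(counts, dtype=float)
    n_genes, n_cells = counts.shape
    log_umi = np.log10(counts.sum(axis=0))          # sequencing-depth covariate
    X = np.column_stack([np.ones(n_cells), log_umi])
    # Step 1: unconstrained per-gene fits (the estimates that tend to overfit).
    params = np.empty((n_genes, 3))
    for g in range(n_genes):
        beta = fit_poisson_glm(counts[g], X)
        theta = moment_theta(counts[g], np.exp(X @ beta))
        params[g] = [beta[0], beta[1], np.log10(theta)]
    # Step 2: pool information across genes with similar abundance.
    log_mean = np.log10(counts.mean(axis=1))
    reg = kernel_smooth(log_mean, params, bw)
    # Step 3: Pearson residuals under the regularized NB model, then clip.
    mu = np.exp(reg[:, [0]] + reg[:, [1]] * log_umi[None, :])
    theta = 10.0 ** reg[:, [2]]
    resid = (counts - mu) / np.sqrt(mu + mu ** 2 / theta)
    limit = np.sqrt(n_cells)
    return np.clip(resid, -limit, limit), reg, log_umi

# Demo on simulated NB counts whose means scale with a per-cell depth factor.
rng = np.random.default_rng(0)
n_genes, n_cells, theta_true = 60, 300, 10.0
depth = rng.lognormal(mean=0.0, sigma=0.5, size=n_cells)
gene_means = 10 ** rng.uniform(-0.5, 1.3, size=n_genes)
mu_true = gene_means[:, None] * depth[None, :]
counts = rng.negative_binomial(theta_true, theta_true / (theta_true + mu_true))
counts = counts[counts.sum(axis=1) > 0]             # drop genes never detected
resid, reg, log_umi = regularized_nb_residuals(counts)
```

One quick check is to compare how strongly rows of `resid` and rows of raw `counts` correlate with `log_umi`. The residuals should carry much less of the depth signal. Smoothing the parameters over log10 gene mean is what keeps sparse, low-abundance genes from getting erratic individual fits. For real analyses, the R package sctransform (or its Seurat interface) is the tool the abstract refers to, not this sketch.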

                Author and article information

                Contributors
                christoph.hafemeister@nyu.edu
                rsatija@nygenome.org
                Journal
                Genome Biology (Genome Biol)
                BioMed Central (London)
                ISSN: 1474-7596, 1474-760X
                Published: 23 December 2019
                Volume 20, Article 296
                Affiliations
                [1] New York Genome Center, 101 6th Ave, New York, NY 10013, USA (GRID grid.429884.b)
                [2] Center for Genomics and Systems Biology, New York University, 12 Waverly Pl, New York, NY 10003, USA (ISNI 0000 0004 1936 8753; GRID grid.137628.9)
                Author information
                http://orcid.org/0000-0001-9448-8833
                Article
                Article ID: 1874
                DOI: 10.1186/s13059-019-1874-1
                PMCID: PMC6927181
                PMID: 31870423
                © The Author(s) 2019

                Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 17 March 2019
                Accepted: 30 October 2019
                Categories
                Method
                Genetics