      A systematic evaluation of single cell RNA-seq analysis pipelines


          Abstract

          The recent rapid spread of single cell RNA sequencing (scRNA-seq) methods has created a large variety of experimental and computational pipelines for which best practices have not yet been established. Here, we use simulations based on five scRNA-seq library protocols in combination with nine realistic differential expression (DE) setups to systematically evaluate three mapping, four imputation, seven normalisation and four differential expression testing approaches resulting in ~3000 pipelines, allowing us to also assess interactions among pipeline steps. We find that choices of normalisation and library preparation protocols have the biggest impact on scRNA-seq analyses. Specifically, we find that library preparation determines the ability to detect symmetric expression differences, while normalisation dominates pipeline performance in asymmetric DE-setups. Finally, we illustrate the importance of informed choices by showing that a good scRNA-seq pipeline can have the same impact on detecting a biological signal as quadrupling the sample size.
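The combinatorics behind the pipeline count can be illustrated with a minimal sketch. It uses only the per-step counts stated in the abstract (three mapping, four imputation, seven normalisation and four DE testing approaches). The option labels such as `mapping_1` and the helper `enumerate_pipelines` are hypothetical placeholders, not the tools or code used by the authors.

```python
from itertools import product

# Number of alternatives per pipeline step, as stated in the abstract.
# The generated labels are placeholders, not the actual tools evaluated.
STEP_COUNTS = {
    "mapping": 3,
    "imputation": 4,
    "normalisation": 7,
    "de_testing": 4,
}


def enumerate_pipelines(step_counts):
    """Return every combination of one option per step as a tuple of labels."""
    options = [
        [f"{step}_{i + 1}" for i in range(n)]
        for step, n in step_counts.items()
    ]
    return list(product(*options))


pipelines = enumerate_pipelines(STEP_COUNTS)
print(len(pipelines))  # 3 * 4 * 7 * 4 = 336
print(pipelines[0])
```

These four steps alone give 336 combinations. The abstract does not spell out how the ~3000 figure is reached; crossing the 336 combinations with the nine DE setups would give 3024, which is of that order, but that accounting is an assumption here rather than something the abstract states.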

          Abstract

          There has been a rapid rise in single cell RNA-seq methods and associated pipelines. Here the authors use simulated data to systematically evaluate the performance of 3000 possible pipelines to derive recommendations for data processing and analysis of different types of scRNA-seq experiments.


                Author and article information

                Contributors
                hellmann@bio.lmu.de
                Journal
                Nature Communications (Nat Commun)
                Nature Publishing Group UK (London)
                ISSN: 2041-1723
                Published: 11 October 2019
                Volume 10, Article 4667
                Affiliations
                [1] Anthropology and Human Genomics, Department of Biology II, Ludwig-Maximilians University, Munich, Germany (ISNI 0000 0004 1936 973X, GRID grid.5252.0)
                [2] Max Planck Institute for Biology of Ageing, Cologne, Germany (ISNI 0000 0004 0373 6590, GRID grid.419502.b)
                [3] Department of Cell and Molecular Biology, Karolinska Institutet, SE-171 65, Stockholm, Sweden (ISNI 0000 0004 1937 0626, GRID grid.4714.6)
                Author information
                http://orcid.org/0000-0002-8415-1695
                http://orcid.org/0000-0002-4826-1651
                http://orcid.org/0000-0002-4056-0550
                http://orcid.org/0000-0003-0588-1313
                Article
                Publisher ID: 12266
                DOI: 10.1038/s41467-019-12266-7
                PMCID: PMC6789098
                PMID: 31604912
                f61b5e71-b747-4dce-99e7-9ab05a1638f8
                © The Author(s) 2019

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 27 March 2019
                Accepted: 28 August 2019
                Funding
                Funded by: Deutsche Forschungsgemeinschaft (DFG), LMUExcellent
                Funded by: Deutsche Forschungsgemeinschaft (DFG), SFB1243 (Subproject A14/A15)
                Funded by: Deutsche Forschungsgemeinschaft (DFG), HE 7669/1.1
                Categories
                Article

                Keywords: data processing, transcriptomics
