
      Benchmarking integration of single-cell differential expression

      research-article


          Abstract

          Integration of single-cell RNA sequencing data between different samples has been a major challenge for analyzing cell populations. However, strategies to integrate differential expression analysis of single-cell data remain underinvestigated. Here, we benchmark 46 workflows for differential expression analysis of single-cell data with multiple batches. We show that batch effects, sequencing depth and data sparsity substantially affect their performance. Notably, we find that the use of batch-corrected data rarely improves the analysis for sparse data, whereas batch covariate modeling improves the analysis when batch effects are substantial. We show that for low-depth data, single-cell techniques based on zero-inflation models degrade performance, whereas analyses of uncorrected data using limmatrend, the Wilcoxon test and a fixed effects model perform well. We suggest several high-performance methods under different conditions based on various simulation and real data analyses. Additionally, we demonstrate that differential expression analysis for a specific cell type outperforms that of large-scale bulk sample data in prioritizing disease-related genes.

          Abstract

          Integration of single-cell RNA sequencing data between different samples has been a major challenge for analyzing cell populations. Here the authors benchmark 46 workflows for differential expression analysis of single-cell data with multiple batches and suggest several high-performance methods under different conditions based on simulation and real data analyses.
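
The abstract contrasts analyzing batch-corrected data with modeling batch as a covariate. As a rough illustration of the second idea only, the sketch below fits a per-gene linear model with condition and batch indicator columns and reads off the condition coefficient. It is not one of the 46 benchmarked workflows: the function name `condition_effect` and the toy data are hypothetical, and the plain least-squares fit stands in for the more elaborate models used in practice.

```python
import numpy as np

def condition_effect(y, condition, batch):
    """Least-squares estimate of the condition effect on y, adjusting for batch.

    Hypothetical helper for illustration; y is one gene's (log) expression
    across cells, condition is 0/1, batch holds a batch label per cell.
    """
    y = np.asarray(y, dtype=float)
    condition = np.asarray(condition, dtype=float)
    batch = np.asarray(batch)
    levels = np.unique(batch)
    # One indicator column per batch, skipping the first (reference) batch.
    batch_dummies = (batch[:, None] == levels[None, 1:]).astype(float)
    design = np.column_stack([np.ones_like(y), condition, batch_dummies])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]  # coefficient of the condition column

# Toy, noise-free data (assumed): condition is unbalanced across two batches.
condition = np.array([0, 0, 0, 1, 0, 1, 1, 1])
batch = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = 1.0 * condition + 2.0 * batch  # true condition effect 1.0, batch shift 2.0

naive = y[condition == 1].mean() - y[condition == 0].mean()
adjusted = condition_effect(y, condition, batch)
print(f"naive difference in means: {naive:.2f}")
print(f"batch-adjusted estimate:   {adjusted:.2f}")
```

In this toy design the unadjusted difference in means absorbs part of the batch shift, while the model with a batch covariate recovers the condition effect that was built into the data.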


                Author and article information

                Contributors
                dougnam@unist.ac.kr
                Journal
                Nature Communications (Nat Commun)
                Nature Publishing Group UK (London)
                ISSN: 2041-1723
                Published: 21 March 2023
                Volume 14, article 1570 (2023)
                Affiliations
                [1] Department of Biological Sciences, Ulsan National Institute of Science and Technology, Ulsan, 44919 Republic of Korea (GRID grid.42687.3f, ISNI 0000 0004 0381 814X)
                [2] Department of Statistics, Seoul National University, Seoul, 08826 Republic of Korea (GRID grid.31501.36, ISNI 0000 0004 0470 5905)
                [3] Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826 Republic of Korea (GRID grid.31501.36, ISNI 0000 0004 0470 5905)
                [4] Department of Mathematical Sciences, Ulsan National Institute of Science and Technology, Ulsan, 44919 Republic of Korea (GRID grid.42687.3f, ISNI 0000 0004 0381 814X)
                [5] Present Address: Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104 USA (GRID grid.25879.31, ISNI 0000 0004 1936 8972)
                Author information
                http://orcid.org/0000-0002-8385-1075
                http://orcid.org/0000-0003-0239-2899
                Article
                37126
                DOI: 10.1038/s41467-023-37126-3
                PMCID: PMC10030080
                PMID: 36944632
                bd1af359-04bc-4b42-950d-8b931035e2ba
                © The Author(s) 2023

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 3 June 2022
                Accepted: 3 March 2023
                Funding
                Funded by: FundRef https://doi.org/10.13039/501100003725, National Research Foundation of Korea (NRF);
                Award ID: 2020R1A2C2102268
                Award ID: 2020M3C9A5086069
                Award ID: 2016M3C9A3945893
                Categories
                Article
                Uncategorized
                data integration, statistical methods, computational science, bioinformatics
