
      How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?



          Abstract

          RNA-seq is now the technology of choice for genome-wide differential gene expression experiments, but it is not clear how many biological replicates are needed to ensure valid biological interpretation of the results or which statistical tools are best for analyzing the data. An RNA-seq experiment with 48 biological replicates in each of two conditions was performed to answer these questions and provide guidelines for experimental design. With three biological replicates, nine of the 11 tools evaluated found only 20%–40% of the significantly differentially expressed (SDE) genes identified with the full set of 42 clean replicates. This rises to >85% for the subset of SDE genes changing in expression by more than fourfold. To achieve >85% for all SDE genes regardless of fold change requires more than 20 biological replicates. The same nine tools successfully control their false discovery rate at ≲5% for all numbers of replicates, while the remaining two tools fail to control their FDR adequately, particularly for low numbers of replicates. For future RNA-seq experiments, these results suggest that at least six biological replicates should be used, rising to at least 12 when it is important to identify SDE genes for all fold changes. If fewer than 12 replicates are used, a superior combination of true positive and false positive performances makes edgeR and DESeq2 the leading tools. For higher replicate numbers, minimizing false positives is more important and DESeq marginally outperforms the other tools.
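The abstract's true positive figures (20%–40% of SDE genes with three replicates, >85% for changes above fourfold) and its false discovery rates are all measured against a gold standard: the SDE genes called from the full set of clean replicates. A minimal sketch of that bookkeeping follows, assuming that a tool's raw p-values are Benjamini-Hochberg adjusted and thresholded at 0.05 and that calls are plain sets of gene IDs. The names `bh_reject` and `score_calls` and their parameters are hypothetical, not code from the study.

```python
"""Toy scoring of a reduced-replicate DE call set against a gold standard.

Hypothetical helpers for illustration only; not code from the study.
"""
import math


def bh_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up rule.

    `pvalues` maps gene ID -> raw p-value. Returns the set of gene IDs whose
    null hypothesis is rejected at false discovery rate `alpha`.
    """
    ranked = sorted(pvalues, key=pvalues.get)  # gene IDs, smallest p first
    m = len(ranked)
    k = 0  # largest rank i with p_(i) <= i * alpha / m
    for i, gene in enumerate(ranked, start=1):
        if pvalues[gene] <= i * alpha / m:
            k = i
    return set(ranked[:k])


def score_calls(calls, gold, all_genes, log2fc=None, min_fold=1.0):
    """Compare SDE calls from few replicates with the full-replicate gold standard.

    If `log2fc` (gene ID -> log2 fold change) is given, only gold-standard
    genes with |fold change| > `min_fold` count as positives for the TPR.
    """
    calls, gold, all_genes = set(calls), set(gold), set(all_genes)
    positives = gold
    if log2fc is not None:
        cutoff = math.log2(min_fold)  # e.g. fourfold -> |log2 FC| > 2
        positives = {g for g in gold if abs(log2fc[g]) > cutoff}
    negatives = all_genes - gold  # genes not SDE with all replicates
    false_calls = calls - gold    # calls absent from the gold standard
    nan = float("nan")
    return {
        "tpr": len(calls & positives) / len(positives) if positives else nan,
        "fpr": len(false_calls) / len(negatives) if negatives else nan,
        "fdr": len(false_calls) / len(calls) if calls else 0.0,
    }


if __name__ == "__main__":
    pvals = {"g1": 0.001, "g2": 0.004, "g3": 0.2, "g4": 0.025, "g5": 0.9}
    calls = bh_reject(pvals)  # {"g1", "g2", "g4"}
    gold = {"g1", "g2", "g3"}
    log2fc = {"g1": 3.0, "g2": 0.5, "g3": -2.5}
    print(score_calls(calls, gold, pvals))
    print(score_calls(calls, gold, pvals, log2fc=log2fc, min_fold=4))
```

Restricting positives by fold change (`min_fold=4`, i.e. |log2 FC| > 2) mirrors the abstract's observation that large changes are recovered with few replicates while small ones need many more.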


          Most cited references (77)


          Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing


            Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

            In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html.

              edgeR: a Bioconductor package for differential expression analysis of digital gene expression data

              Summary: It is expected that emerging digital gene expression (DGE) technologies will overtake microarray technologies in the near future for many functional genomics applications. One of the fundamental data analysis tasks, especially for gene expression studies, involves determining whether there is evidence that counts for a transcript or exon are significantly different across experimental conditions. edgeR is a Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data. Availability: The package is freely available under the LGPL licence from the Bioconductor web site (http://bioconductor.org). Contact: mrobinson@wehi.edu.au
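The edgeR and DESeq2 entries above both rest on modelling read counts with more-than-Poisson variance and on stabilizing each gene's dispersion estimate by borrowing information across genes. In edgeR the overdispersed Poisson model is the negative binomial, with variance mu + phi * mu^2 for mean mu and dispersion phi. A minimal sketch under stated simplifications: counts are drawn as a gamma-Poisson mixture, and shrinkage is a normal-normal precision-weighted average of log dispersions toward a trend value. That shrinkage step is an illustrative stand-in, not DESeq2's maximum a posteriori estimator or edgeR's weighted-likelihood empirical Bayes; `sample_nb`, `shrink_dispersion`, `sampling_var`, and `prior_var` are hypothetical names.

```python
"""Toy illustration of a negative binomial count model and dispersion shrinkage.

Hypothetical helpers; a simplification, not the edgeR or DESeq2 code.
"""
import math
import random


def nb_variance(mu, phi):
    """NB variance: Poisson term `mu` plus overdispersion term phi * mu**2."""
    return mu + phi * mu ** 2


def sample_nb(mu, phi, rng):
    """Draw one NB(mu, phi) count as a gamma-Poisson mixture (phi > 0)."""
    shape = 1.0 / phi
    lam = rng.gammavariate(shape, mu * phi)  # mean mu, variance phi * mu**2
    # Knuth's multiplication method for Poisson(lam); adequate for small lam.
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def shrink_dispersion(gene_phi, trend_phi, sampling_var, prior_var):
    """Precision-weighted average of log dispersions (normal-normal model).

    `sampling_var` is the variance of the gene-wise log estimate (large when
    replicates are few); `prior_var` is the spread of true log dispersions
    around the trend. Returns the shrunken dispersion on the original scale.
    """
    w = prior_var / (prior_var + sampling_var)  # weight on the gene's own data
    log_phi = w * math.log(gene_phi) + (1 - w) * math.log(trend_phi)
    return math.exp(log_phi)


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [sample_nb(20, 0.1, rng) for _ in range(20000)]
    mean = sum(draws) / len(draws)
    var = sum((x - mean) ** 2 for x in draws) / len(draws)
    print(mean, var, nb_variance(20, 0.1))  # sample variance should be near 60
    # The same noisy gene-wise estimate is pulled harder toward the trend as
    # its sampling variance (few replicates) grows:
    for s2 in (0.1, 1.0, 10.0):
        print(s2, shrink_dispersion(0.4, 0.1, s2, 1.0))
```

The design choice to shrink on the log scale keeps dispersions positive; the key behaviour for replicate planning is that with few replicates each gene's own estimate carries little weight, so results lean on the cross-gene trend.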

                Author and article information

                Journal: RNA
                Cold Spring Harbor Laboratory Press
                ISSN: 1355-8382, 1469-9001
                June 2016; volume 22, issue 6, pages 839–851
                Affiliations
                [1] Division of Computational Biology, College of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
                [2] Division of Gene Regulation and Expression, College of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
                [3] Edinburgh Genomics, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom
                [4] Division of Plant Sciences, College of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
                [5] Division of Biological Chemistry and Drug Discovery, College of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
                Author notes
                [6] These authors contributed equally to this work.

                Abbreviations: DE, differentially expressed; SDE, significantly differentially expressed; DGE, differential gene expression; TP(R), true positive (rate); FP(R), false positive (rate); TN(R), true negative (rate); FN(R), false negative (rate); FD(R), false discovery rate (see FPR); WT, wild type

                Author information
                http://orcid.org/0000-0001-9068-9654
                http://orcid.org/0000-0002-6398-2537
                http://orcid.org/0000-0002-2560-2484
                http://orcid.org/0000-0002-9014-5355
                Article identifiers
                9509184 RA
                DOI: 10.1261/rna.053959.115
                PMCID: PMC4878611
                PMID: 27022035
                © 2016 Schurch et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society

                This article, published in RNA, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 13 August 2015
                Accepted: 17 February 2016
                Categories
                Article

                Keywords: RNA-seq, benchmarking, differential expression, replication, yeast, experimental design, statistical power
