      Benchmarking differential expression analysis tools for RNA-Seq: normalization-based vs. log-ratio transformation-based methods


          Abstract

          Background

          Count data generated by next-generation sequencing assays do not measure absolute transcript abundances. Instead, the data are constrained to an arbitrary “library size” set by the sequencing depth of the assay, and typically must be normalized prior to statistical analysis. Because the data are constrained in this way, one could alternatively use a log-ratio transformation in place of normalization, as is often done when testing for differential abundance (DA) of operational taxonomic units (OTUs) in 16S rRNA data. We therefore benchmark how well the ALDEx2 package, a transformation-based DA tool, detects differential expression in high-throughput RNA-sequencing (RNA-Seq) data, compared with conventional RNA-Seq methods such as edgeR and DESeq2.
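          The log-ratio idea can be made concrete with the centered log-ratio (CLR) transform, a standard log-ratio transformation: each count is log-transformed and the log of the sample's geometric mean is subtracted. The Python sketch below is illustrative only and is not ALDEx2's implementation; the function name `clr` and the fixed pseudocount of 0.5 used to handle zero counts are assumptions made for this example. It shows why no separate library-size normalization is needed: scaling every count by the same factor (a deeper library) cancels out when no pseudocount is added.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio (CLR) transform of one sample's counts.

    Assumption: a fixed pseudocount (0.5 by default, chosen for illustration)
    is added so that zero counts have a finite logarithm.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    # Each value becomes log(count / geometric mean), so the output sums to 0.
    return [v - mean_log for v in logs]


# Same relative composition sequenced at 10x the depth (hypothetical counts).
shallow = clr([10, 20, 40], pseudocount=0.0)
deep = clr([100, 200, 400], pseudocount=0.0)

# The CLR values agree: the library size cancels out of the ratios.
assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(shallow, deep))
print([round(v, 3) for v in shallow])
```

          With a nonzero pseudocount the invariance is only approximate, which is one reason the handling of zeros matters in practice.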

          Results

          To evaluate the performance of log-ratio transformation-based tools, we apply the ALDEx2 package to two simulated and two real RNA-Seq data sets. One of the real data sets was previously used to benchmark dozens of conventional RNA-Seq differential expression methods, enabling us to compare transformation-based approaches directly against them. We show that ALDEx2, widely used in metagenomics research, identifies differentially expressed genes (and transcripts) from RNA-Seq data with high precision and, given sufficient sample sizes, high recall too, regardless of the alignment and quantification procedure used. Although we show that the choice of log-ratio transformation can affect performance, ALDEx2 has high precision (i.e., few false positives) across all transformations. Finally, we present a novel iterative log-ratio transformation (now implemented in ALDEx2) that further improves performance in simulations.
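          Precision and recall, the two metrics the results are reported in, can be computed from the set of genes a tool calls differentially expressed and the set of genes known to be truly differentially expressed (as in a simulation). The sketch below is a generic illustration; the function name `precision_recall` and the gene identifiers are hypothetical.

```python
def precision_recall(called, truly_de):
    """Return (precision, recall) for a set of genes called differentially expressed.

    precision = TP / (TP + FP): fraction of calls that are real (few false positives -> high).
    recall    = TP / (TP + FN): fraction of truly DE genes that were called.
    Returns float("nan") for a ratio whose denominator is zero.
    """
    called, truly_de = set(called), set(truly_de)
    tp = len(called & truly_de)   # correctly called DE genes
    fp = len(called - truly_de)   # false positives
    fn = len(truly_de - called)   # missed DE genes
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Hypothetical example: a conservative tool calls 2 genes, both truly DE,
# but misses 3 of the 5 truly DE genes.
p, r = precision_recall({"g1", "g2"}, {"g1", "g2", "g3", "g4", "g5"})
print(p, r)  # 1.0 0.4
```

          A conservative tool can therefore have perfect precision while its recall depends on how many of the truly differentially expressed genes it has the power to detect.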

          Conclusions

          Our results suggest that log-ratio transformation-based methods can measure differential expression in RNA-Seq data, provided that certain assumptions are met. Moreover, these methods have very high precision (i.e., few false positives) in simulations and also perform well on real data. With previously demonstrated applicability to 16S rRNA data, ALDEx2 can thus serve as a single tool for data from multiple sequencing modalities.

          Electronic supplementary material

          The online version of this article (10.1186/s12859-018-2261-8) contains supplementary material, which is available to authorized users.

          Author and article information

          Contributors
          contacttomquinn@gmail.com
          tamsyn.crowley@deakin.edu.au
          m.richardson@deakin.edu.au
          Journal
          BMC Bioinformatics
          BioMed Central (London)
          ISSN 1471-2105
          Published 18 July 2018
          Volume 19, article 274
          Affiliations
          [1] ISNI 0000 0001 0526 7079, GRID grid.1021.2, Centre for Molecular and Medical Research, School of Medicine, Deakin University, Geelong, 3220 Australia
          [2] ISNI 0000 0001 0526 7079, GRID grid.1021.2, Bioinformatics Core Research Group, Deakin University, Geelong, 3220 Australia
          [3] ISNI 0000 0004 1936 7371, GRID grid.1020.3, Poultry Hub Australia, University of New England, Armidale, 2351 Australia
          [4] ISNI 0000 0001 0526 7079, GRID grid.1021.2, Centre for Integrative Ecology, School of Life and Environmental Science, Deakin University, Geelong, 3220 Australia
          Author information
          http://orcid.org/0000-0003-0286-6329
          Article
          Publisher ID: 2261
          DOI: 10.1186/s12859-018-2261-8
          PMCID: PMC6052553
          PMID: 30021534
          © The Author(s) 2018

          Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

          History
          Received: 11 December 2017
          Accepted: 25 June 2018
          Categories
          Research Article

          Bioinformatics & Computational biology
          Keywords: high-throughput sequencing analysis, RNA-Seq, compositional data, compositional analysis, CoDA
