      Gene length corrected trimmed mean of M-values (GeTMM) processing of RNA-seq data performs similarly in intersample analyses while improving intrasample comparisons

      research-article


          Abstract

          Background

          Current normalization methods for RNA-sequencing data allow either for intersample comparison to identify differentially expressed (DE) genes or for intrasample comparison for the discovery and validation of gene signatures. Most studies on optimization of normalization methods typically use simulated data to validate methodologies. We describe a new method, GeTMM, which allows for both inter- and intrasample analyses with the same normalized data set. We used actual (i.e. not simulated) RNA-seq data from 263 colon cancers (no biological replicates) and used the same read count data to compare GeTMM with the most commonly used normalization methods (i.e. TMM (used by edgeR), RLE (used by DESeq2) and TPM) with respect to distributions, effect of RNA quality, subtype-classification, recurrence score, recall of DE genes and correlation to RT-qPCR data.
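          As a rough illustration of the gene length correction that TPM and (going by its name) GeTMM rely on, the sketch below computes reads per kilobase (RPK) and TPM for a single sample in plain Python. The function names, counts, and gene lengths are hypothetical, and the TMM scaling that edgeR applies across samples is not reproduced here.

```python
def rpk(counts, lengths_bp):
    """Reads per kilobase: raw read count divided by gene length in kb."""
    return [c / (l / 1000.0) for c, l in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: RPK rescaled so one sample sums to 1e6."""
    r = rpk(counts, lengths_bp)
    total = sum(r)
    return [x / total * 1e6 for x in r]


# Hypothetical toy sample (made-up numbers, not data from the article).
counts = [100, 200, 300]          # raw read counts for three genes
lengths_bp = [1000, 2000, 1000]   # gene lengths in base pairs

sample_rpk = rpk(counts, lengths_bp)
sample_tpm = tpm(counts, lengths_bp)
```

          In this toy sample the second gene has twice the reads of the first but is also twice as long, so both receive the same RPK and TPM. That is the property that makes length-corrected values usable for comparing genes within one sample.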

          Results

          We observed a clear benefit for GeTMM and TPM with regard to intrasample comparison, while GeTMM performed similarly to TMM- and RLE-normalized data in intersample comparisons. Regarding DE genes, recall was comparable among the normalization methods, while GeTMM showed the lowest number of false-positive DE genes. Remarkably, we observed limited detrimental effects in samples with low RNA quality.
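          The abstract does not say how recall and false positives were computed. Purely as an illustration, and assuming a reference set of DE genes is available to compare against, the two quantities could be derived from gene sets as follows. The gene names and the helper function are hypothetical.

```python
def recall_and_false_positives(called, reference):
    """Return (recall, number of false positives) for a set of called DE genes.

    recall = called genes that are in the reference / size of the reference.
    False positives are called genes that are absent from the reference.
    """
    called, reference = set(called), set(reference)
    true_positives = called & reference
    false_positives = called - reference
    recall = len(true_positives) / len(reference) if reference else 0.0
    return recall, len(false_positives)


# Hypothetical gene sets, not results from the article.
reference = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
called = {"GENE_A", "GENE_B", "GENE_X"}

recall, n_fp = recall_and_false_positives(called, reference)
```

          Two methods can reach the same recall while differing in false positives, which is why the two numbers are reported separately.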

          Conclusions

          We show that GeTMM outperforms established methods with regard to intrasample comparison while performing equivalently with regard to intersample normalization, using the same normalized data. These combined properties enhance not only the general usefulness of RNA-seq but also its comparability to the many array-based gene expression data sets in the public domain.

          Electronic supplementary material

          The online version of this article (10.1186/s12859-018-2246-7) contains supplementary material, which is available to authorized users.

                Author and article information

                Contributors
                m.smid@erasmusmc.nl
                r.coeberghvdbraak@erasmusmc.nl
                h.vandewerken@erasmusmc.nl
                j.vanriet@erasmusmc.nl
                annevangalen@hotmail.com
                v.deweerd@erasmusmc.nl
                m.daane.1@erasmusmc.nl
                S.I.Bril@umcutrecht.nl
                z.lalmahomed@erasmusmc.nl
                W.Kloosterman@umcutrecht.nl
                s.wilting@erasmusmc.nl
                j.foekens@erasmusmc.nl
                j.ijzermans@erasmusmc.nl
                j.martens@erasmusmc.nl
                a.sieuwerts@erasmusmc.nl
                Journal
                BMC Bioinformatics
                BioMed Central (London)
                ISSN 1471-2105
                22 June 2018
                Volume 19, Article 236
                Affiliations
                [1] ISNI 000000040459992X, GRID grid.5645.2, Department of Medical Oncology, Erasmus MC Cancer Institute, Erasmus MC University Medical Center, 3015 CE Rotterdam, The Netherlands
                [2] ISNI 000000040459992X, GRID grid.5645.2, Department of Surgery, Erasmus MC University Medical Center, 3015 CE Rotterdam, The Netherlands
                [3] ISNI 000000040459992X, GRID grid.5645.2, Cancer Computational Biology Center, Erasmus MC Cancer Institute, Erasmus MC University Medical Center, 3015 CE Rotterdam, The Netherlands
                [4] ISNI 000000040459992X, GRID grid.5645.2, Department of Urology, Erasmus MC Cancer Institute, Erasmus MC University Medical Center, 3015 CE Rotterdam, The Netherlands
                [5] ISNI 0000000090126352, GRID grid.7692.a, Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands
                [6] Cancer Genomics Center, 3584 CG Utrecht, The Netherlands
                Article
                2246
                10.1186/s12859-018-2246-7
                6013957
                29929481
                e139f848-c27f-4281-b0ef-cdace08ad55b
                © The Author(s). 2018

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 25 September 2017
                Accepted: 14 June 2018
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/501100004622, KWF Kankerbestrijding;
                Award ID: UU 2012-5710
                Award ID: UVA 2013-6331
                Funded by: FundRef http://dx.doi.org/10.13039/501100008359, Maag Lever Darm Stichting;
                Award ID: FP13-20
                Funded by: FundRef http://dx.doi.org/10.13039/100008470, Cancer Genomics Centre;
                Funded by: FundRef http://dx.doi.org/10.13039/501100000781, European Research Council;
                Award ID: ERC-20120AdG-322737
                Funded by: Daniel den Hoed Foundation
                Categories
                Methodology Article
                Bioinformatics & Computational biology
                Keywords: RNA sequencing, normalization methods, GeTMM, edgeR, TPM, DESeq2, colorectal cancer
