
      Normalization and microbial differential abundance strategies depend upon data characteristics

      research-article


          Abstract

          Background

          Data from 16S ribosomal RNA (rRNA) amplicon sequencing present challenges to ecological and statistical interpretation. In particular, library sizes often vary over several orders of magnitude, and the data contain many zeros. First, although we are typically interested in comparing the relative abundance of taxa in the ecosystems of two or more groups, we can only measure taxon relative abundance in specimens obtained from those ecosystems. Because a comparison of taxon relative abundance in the specimens is not equivalent to a comparison of taxon relative abundance in the ecosystems, this presents a special challenge. Second, because the relative abundances of taxa in the specimen (as well as in the ecosystem) sum to 1, these are compositional data. Because compositional data are constrained to the simplex (they sum to 1) rather than free to vary in Euclidean space, many standard methods of analysis are not applicable. Here, we evaluate how these challenges impact the performance of existing normalization methods and differential abundance analyses.
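The compositional constraint described above can be illustrated with a small numeric sketch. This is not code from the article; the function name `closure` and the abundance values are assumptions chosen only for illustration. It shows that when one taxon grows in absolute abundance, the relative abundances of the unchanged taxa fall, because all values must sum to 1.

```python
import numpy as np

def closure(abundances):
    """Convert absolute abundances to relative abundances that sum to 1.

    `closure` is a hypothetical helper name used for this illustration.
    """
    x = np.asarray(abundances, dtype=float)
    return x / x.sum()

# Hypothetical absolute abundances for three taxa in two ecosystems.
# Only the first taxon differs between them.
before = closure([100, 100, 100])   # each taxon is 1/3 of the total
after = closure([1000, 100, 100])   # taxa 2 and 3 drop to 1/12 each

# The second taxon did not change in absolute terms, yet its relative
# abundance fell from 1/3 to 1/12 purely because of the sum-to-1 constraint.
print(before, after)
```

A test applied naively to relative abundances could therefore flag the second and third taxa as different between groups even though only the first one changed.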

          Results

          Effects on normalization: Most normalization methods enable successful clustering of samples according to biological origin when the groups differ substantially in their overall microbial composition. For ordination metrics based on presence or absence, rarefying clusters samples according to biological origin more clearly than other normalization techniques do. Alternative normalization methods are potentially vulnerable to artifacts due to library size.
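For readers unfamiliar with rarefying, a minimal sketch follows. It is an assumption-laden illustration, not the implementation used in the article: the function name `rarefy`, the `depth` and `seed` parameters, and the example counts are all hypothetical. The idea is to subsample each sample's reads without replacement down to a common depth, and to discard samples that have fewer reads than that depth.

```python
import numpy as np

def rarefy(counts, depth, seed=0):
    """Subsample one sample's taxon counts to `depth` reads without replacement.

    Returns None when the sample has fewer than `depth` reads, mirroring the
    usual practice of dropping such samples. Names and defaults here are
    illustrative assumptions.
    """
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        return None
    rng = np.random.default_rng(seed)
    # Drawing `depth` reads without replacement from the pool of observed
    # reads is a multivariate hypergeometric draw over taxa.
    return rng.multivariate_hypergeometric(counts, depth)

# Hypothetical sample with 100 reads over four taxa, rarefied to 40 reads.
rarefied = rarefy([50, 30, 20, 0], depth=40)
```

After rarefying, every retained sample has the same library size, which is why presence/absence metrics become less sensitive to sequencing depth; the cost is that the reads not drawn are thrown away.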

          Effects on differential abundance testing: We build on previous work to evaluate seven proposed statistical methods using rarefied as well as raw data. Our simulation studies suggest that the false discovery rates of many differential abundance-testing methods are not increased by rarefying itself, although rarefying does result in a loss of sensitivity because a portion of the available data is discarded. For groups with large (~10×) differences in the average library size, rarefying lowers the false discovery rate. DESeq2, without addition of a constant, increases sensitivity on smaller datasets (<20 samples per group) but tends towards a higher false discovery rate with more samples, very uneven (~10×) library sizes, and/or compositional effects. For drawing inferences regarding taxon abundance in the ecosystem, analysis of composition of microbiomes (ANCOM) is not only very sensitive (for >20 samples per group) but also, critically, the only method tested that has good control of the false discovery rate.
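The two quantities compared above, false discovery rate and sensitivity, can be computed in a simulation where the truly differential taxa are known. The sketch below is a generic illustration under that assumption; the function name and the taxon labels are hypothetical and are not taken from the article's simulation code.

```python
def fdr_and_sensitivity(called, truly_different):
    """Return (empirical FDR, sensitivity) for one simulated comparison.

    called: taxa that a method reported as differentially abundant.
    truly_different: taxa that were simulated to differ between groups.
    """
    called = set(called)
    truth = set(truly_different)
    true_pos = len(called & truth)
    false_pos = len(called - truth)
    # FDR: fraction of reported taxa that are false positives.
    fdr = false_pos / len(called) if called else 0.0
    # Sensitivity: fraction of truly different taxa that were reported.
    sensitivity = true_pos / len(truth) if truth else 0.0
    return fdr, sensitivity

# Hypothetical result: four taxa called, three truly different.
fdr, sens = fdr_and_sensitivity(["t1", "t2", "t3", "t4"], ["t1", "t2", "t5"])
# fdr is 0.5 (t3 and t4 are false positives); sens is 2/3 (t5 was missed).
```

A method that discards data, as rarefying does, can lose sensitivity without raising the FDR, which is the trade-off the paragraph above describes.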

          Conclusions

          These findings guide which normalization and differential abundance techniques to use based on the data characteristics of a given study.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s40168-017-0237-y) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                858-246-1184 , robknight@ucsd.edu
                Journal
                Microbiome
                BioMed Central (London)
                ISSN: 2049-2618
                Published: 3 March 2017
                Volume: 5
                Article number: 27
                Affiliations
                [1] ISNI 0000 0000 9621 4564, GRID grid.266190.a, Department of Chemical and Biological Engineering, University of Colorado at Boulder, Boulder, CO 80309 USA
                [2] ISNI 0000 0001 2107 4242, GRID grid.266100.3, Departments of Pediatrics, University of California San Diego, 9500 Gilman Drive, MC 0763, La Jolla, CA 92093 USA
                [3] ISNI 0000 0001 2297 5165, GRID grid.94365.3d, Biostatistics and Computational Biology Branch, NIEHS, NIH, Research Triangle Park Durham, NC USA
                [4] ISNI 0000 0004 1936 8972, GRID grid.25879.31, Department of Microbiology, University of Pennsylvania, Philadelphia, PA 18014 USA
                [5] ISNI 0000 0001 0790 3411, GRID grid.241116.1, Department of Medicine, University of Colorado, Denver, CO 80204 USA
                [6] ISNI 0000 0001 2112 1969, GRID grid.4391.f, Department of Microbiology, Oregon State University, 226 Nash Hall, Corvallis, OR 97331 USA
                [7] ISNI 0000 0001 2107 4242, GRID grid.266100.3, Department of Computer Science & Engineering, University of California San Diego, La Jolla, CA 92093 USA
                [8] ISNI 0000 0001 2107 4242, GRID grid.266100.3, Center for Computational Biology and Bioinformatics, Dept. of Medicine, University of California San Diego, La Jolla, CA 92093 USA
                [9] ISNI 0000 0001 2107 4242, GRID grid.266100.3, Center for Microbiome Innovation, University of California San Diego, La Jolla, CA 92093 USA
                Article
                237
                DOI: 10.1186/s40168-017-0237-y
                PMCID: PMC5335496
                PMID: 28253908
                053aa899-2822-466a-a7be-83f2426fa4c8
                © The Author(s). 2017

                Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 9 October 2015
                Accepted: 27 January 2017
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100000051, National Human Genome Research Institute
                Award ID: 3 R01 HG004872-03S2
                Funded by: FundRef http://dx.doi.org/10.13039/100000002, National Institutes of Health
                Award ID: 5 U01 HG004866-04
                Funded by: FundRef http://dx.doi.org/10.13039/100000066, National Institute of Environmental Health Sciences
                Award ID: Z01 ES101744-04
                Funded by: CTRI
                Award ID: UL1TR001442
                Categories
                Research

                Keywords: microbiome, normalization, differential abundance, statistics
