
      Hypothesis Testing and Power Calculations for Taxonomic-Based Human Microbiome Data

      research-article


          Abstract

          This paper presents new biostatistical methods for the analysis of microbiome data based on a fully parametric approach using all the data. The Dirichlet-multinomial distribution allows the analyst to calculate power and sample sizes for experimental design, perform tests of hypotheses (e.g., compare microbiomes across groups), and estimate parameters describing microbiome properties. Compared with non-parametric alternatives such as bootstrapping and permutation testing, a fully parametric model has the benefit of retaining more of the information contained in the data. This paper details the statistical approaches for several hypothesis tests and power/sample size calculations, and applies them for illustration to taxonomic abundance distribution and rank abundance distribution data, using HMP Jumpstart data on 24 subjects for saliva, subgingival, and supragingival samples. Software for running these analyses is available.
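To make the idea of a Dirichlet-multinomial power calculation concrete, the following is a minimal Monte Carlo sketch in Python, not the authors' software. It assumes a parametrization that the abstract does not spell out: a vector of mean taxa proportions `pi` and an overdispersion value `theta` in (0, 1), with Dirichlet parameters `pi[j] * (1 - theta) / theta`. The test statistic (squared difference of pooled proportions, calibrated by simulation under the null) is a simple stand-in chosen for illustration and is not the statistic developed in the paper. All function names are hypothetical.

```python
import random


def sample_dm(pi, theta, n_reads, rng):
    """Draw one subject's taxa counts from a Dirichlet-multinomial (assumed parametrization)."""
    scale = (1.0 - theta) / theta  # larger theta means more overdispersion
    gammas = [rng.gammavariate(p * scale, 1.0) for p in pi]
    total = sum(gammas)
    probs = [g / total for g in gammas]  # this subject's taxa proportions
    counts = [0] * len(pi)
    for taxon in rng.choices(range(len(pi)), weights=probs, k=n_reads):
        counts[taxon] += 1
    return counts


def pooled_proportions(samples):
    """Pool reads over subjects and return per-taxon proportions."""
    totals = [sum(column) for column in zip(*samples)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


def group_difference(group_a, group_b):
    """Illustrative statistic: squared distance between pooled proportions."""
    pa, pb = pooled_proportions(group_a), pooled_proportions(group_b)
    return sum((a - b) ** 2 for a, b in zip(pa, pb))


def simulate_statistic(pi_a, pi_b, theta, n_subjects, n_reads, rng):
    """Simulate two groups of subjects and return the statistic."""
    a = [sample_dm(pi_a, theta, n_reads, rng) for _ in range(n_subjects)]
    b = [sample_dm(pi_b, theta, n_reads, rng) for _ in range(n_subjects)]
    return group_difference(a, b)


def estimate_power(pi_a, pi_b, theta, n_subjects, n_reads,
                   n_sim=200, alpha=0.05, seed=0):
    """Estimate power by simulation for a given design.

    The null distribution is simulated with both groups drawn from pi_a,
    which is a simplification made for this sketch.
    """
    rng = random.Random(seed)
    null = sorted(
        simulate_statistic(pi_a, pi_a, theta, n_subjects, n_reads, rng)
        for _ in range(n_sim)
    )
    critical = null[min(n_sim - 1, int((1 - alpha) * n_sim))]
    rejections = sum(
        simulate_statistic(pi_a, pi_b, theta, n_subjects, n_reads, rng) > critical
        for _ in range(n_sim)
    )
    return rejections / n_sim


if __name__ == "__main__":
    # Hypothetical design: 3 taxa, 10 subjects per group, 200 reads per subject.
    power = estimate_power([0.7, 0.2, 0.1], [0.5, 0.3, 0.2],
                           theta=0.05, n_subjects=10, n_reads=200)
    print(f"estimated power: {power:.2f}")
```

Rerunning `estimate_power` over a range of `n_subjects` values gives a rough sample-size curve under these assumptions. The proportions, overdispersion, and read depth above are made-up inputs, not values from the HMP data.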

          Most cited references (11)

          Fast UniFrac: Facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data

          Next-generation sequencing techniques, and PhyloChip, have made simultaneous phylogenetic analyses of hundreds of microbial communities possible. Insight into community structure has been limited by the inability to integrate and visualize such vast datasets. Fast UniFrac overcomes these issues, allowing integration of larger numbers of sequences and samples into a single analysis. Its new array-based implementation offers orders of magnitude improvements over the original version. New 3D visualization of principal coordinates analysis (PCoA) results, with the option to view multiple coordinate axes simultaneously, provides a powerful way to quickly identify patterns that relate vast numbers of microbial communities. We demonstrate the potential of Fast UniFrac using examples from three data types: Sanger-sequencing studies of diverse free-living and animal-associated bacterial assemblages and from the gut of obese humans as they diet, pyrosequencing data integrated from studies of the human hand and gut, and PhyloChip data from a study of citrus pathogens. We show that a Fast UniFrac analysis using a reference tree recaptures patterns that could not be detected without considering phylogenetic relationships and that Fast UniFrac, coupled with BLAST-based sequence assignment, can be used to quickly analyze pyrosequencing runs containing hundreds of thousands of sequences, revealing patterns relating human and gut samples. Finally, we show that the application of Fast UniFrac to PhyloChip data could identify well-defined subcategories associated with infection. Together, these case studies point the way towards a broad range of applications and demonstrate some of the new features of Fast UniFrac.

            Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics

            We introduce Dirichlet multinomial mixtures (DMM) for the probabilistic modelling of microbial metagenomics data. These data can be represented as a frequency matrix giving the number of times each taxon is observed in each sample. The samples have different sizes, and the matrix is sparse, as communities are diverse and skewed to rare taxa. Most methods used previously to classify or cluster samples have ignored these features. We describe each community by a vector of taxa probabilities. These vectors are generated from one of a finite number of Dirichlet mixture components, each with different hyperparameters. Observed samples are generated through multinomial sampling. The mixture components cluster communities into distinct ‘metacommunities’, and, hence, determine envirotypes or enterotypes, groups of communities with a similar composition. The model can also deduce the impact of a treatment and be used for classification. We wrote software for the fitting of DMM models using the ‘evidence framework’ (http://code.google.com/p/microbedmm/). This includes the Laplace approximation of the model evidence. We applied the DMM model to human gut microbe genera frequencies from Obese and Lean twins. According to the model evidence, four clusters fit these data best. Two clusters were dominated by Bacteroides and were homogeneous; two had a more variable community composition. We could not find a significant impact of body mass on community structure. However, Obese twins were more likely to derive from the high variance clusters. We propose that obesity is not associated with a distinct microbiota but increases the chance that an individual derives from a disturbed enterotype. This is an example of the ‘Anna Karenina principle (AKP)’ applied to microbial communities: disturbed states having many more configurations than undisturbed. We verify this by showing that in a study of inflammatory bowel disease (IBD) phenotypes, ileal Crohn's disease (ICD) is associated with a more variable community.

              The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis

              The Ribosomal Database Project (RDP-II) provides the research community with aligned and annotated rRNA gene sequences, along with analysis services and a phylogenetically consistent taxonomic framework for these data. Updated monthly, these services are made available through the RDP-II website (http://rdp.cme.msu.edu/). RDP-II release 9.21 (August 2004) contains 101 632 bacterial small subunit rRNA gene sequences in aligned and annotated format. High-throughput tools for initial taxonomic placement, identification of related sequences, probe and primer testing, data navigation and subalignment download are provided. The RDP-II email address for questions or comments is rdpstaff@msu.edu.

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS One
                Publisher: Public Library of Science (San Francisco, USA)
                ISSN: 1932-6203
                Published: 20 December 2012
                Volume: 7
                Issue: 12
                Article: e52078
                Affiliations
                [1] Division of General Medical Sciences, Department of Medicine, Washington University School of Medicine, St. Louis, Missouri, United States of America
                [2] Department of Statistical Sciences and Operations Research, Virginia Commonwealth University, Richmond, Virginia, United States of America
                [3] The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
                Utah State University, United States of America
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Conceived and designed the experiments: GW ES. Performed the experiments: GW ES. Analyzed the data: PSL ED WDS. Wrote the paper: PSL WDS. Design Statistical Methods: PSL JPB ELB DJE QW WDS. Design Software: PSL ED WDS.

                Article
                Manuscript ID: PONE-D-12-09606
                DOI: 10.1371/journal.pone.0052078
                PMCID: PMC3527355
                PMID: 23284876
                SO-VID: 4a120c46-0aaf-40b1-be37-b3d4672f3722
                Copyright © 2012

                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 2 April 2012
                Accepted: 13 November 2012
                Page count
                Pages: 13
                Funding
                This work was supported by National Institutes of Health (NIH) Grant U54 HG004968 “Human Microbiome Project Consortium Sequencing of Healthy People”, NIH Grant 1UH2AI083265 “The Neonatal Microbiome and Necrotizing Enterocolitis”, and St. Louis Children’s Hospital and Children’s Discovery Institute Grant “The St. Louis Neonatal Gut Microbiome Initiative”. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology
                Ecology
                Community Ecology
                Community Structure
                Ecological Metrics
                Relative Abundance Distribution
                Species Diversity
                Biodiversity
                Microbial Ecology
                Population Ecology
                Genomics
                Metagenomics
                Microbiology
                Bacteriology
                Microbial Ecology
                Population Biology
                Population Ecology
                Mathematics
                Statistics
                Biostatistics
                Decision Theory
                Statistical Methods
                Medicine
                Clinical Research Design
                Statistical Methods
