102
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics

      research-article
      1 , 2 , 2 , *
      PLoS ONE
      Public Library of Science

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          We introduce Dirichlet multinomial mixtures (DMM) for the probabilistic modelling of microbial metagenomics data. This data can be represented as a frequency matrix giving the number of times each taxa is observed in each sample. The samples have different size, and the matrix is sparse, as communities are diverse and skewed to rare taxa. Most methods used previously to classify or cluster samples have ignored these features. We describe each community by a vector of taxa probabilities. These vectors are generated from one of a finite number of Dirichlet mixture components each with different hyperparameters. Observed samples are generated through multinomial sampling. The mixture components cluster communities into distinct ‘metacommunities’, and, hence, determine envirotypes or enterotypes, groups of communities with a similar composition. The model can also deduce the impact of a treatment and be used for classification. We wrote software for the fitting of DMM models using the ‘evidence framework’ ( http://code.google.com/p/microbedmm/). This includes the Laplace approximation of the model evidence. We applied the DMM model to human gut microbe genera frequencies from Obese and Lean twins. From the model evidence four clusters fit this data best. Two clusters were dominated by Bacteroides and were homogenous; two had a more variable community composition. We could not find a significant impact of body mass on community structure. However, Obese twins were more likely to derive from the high variance clusters. We propose that obesity is not associated with a distinct microbiota but increases the chance that an individual derives from a disturbed enterotype. This is an example of the ‘Anna Karenina principle (AKP)’ applied to microbial communities: disturbed states having many more configurations than undisturbed. We verify this by showing that in a study of inflammatory bowel disease (IBD) phenotypes, ileal Crohn's disease (ICD) is associated with a more variable community.

          Related collections

          Most cited references21

          • Record: found
          • Abstract: found
          • Article: not found

          Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample.

          The ongoing revolution in high-throughput sequencing continues to democratize the ability of small groups of investigators to map the microbial component of the biosphere. In particular, the coevolution of new sequencing platforms and new software tools allows data acquisition and analysis on an unprecedented scale. Here we report the next stage in this coevolutionary arms race, using the Illumina GAIIx platform to sequence a diverse array of 25 environmental samples and three known "mock communities" at a depth averaging 3.1 million reads per sample. We demonstrate excellent consistency in taxonomic recovery and recapture diversity patterns that were previously reported on the basis of metaanalysis of many studies from the literature (notably, the saline/nonsaline split in environmental samples and the split between host-associated and free-living communities). We also demonstrate that 2,000 Illumina single-end reads are sufficient to recapture the same relationships among samples that we observe with the full dataset. The results thus open up the possibility of conducting large-scale studies analyzing thousands of samples simultaneously to survey microbial communities at an unprecedented spatial and temporal resolution.
            Bookmark
            • Record: found
            • Abstract: not found
            • Article: not found

            Bayesian Interpolation

              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex.

              We constructed error-correcting DNA barcodes that allow one run of a massively parallel pyrosequencer to process up to 1,544 samples simultaneously. Using these barcodes we processed bacterial 16S rRNA gene sequences representing microbial communities in 286 environmental samples, corrected 92% of sample assignment errors, and thus characterized nearly as many 16S rRNA genes as have been sequenced to date by Sanger sequencing.
                Bookmark

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS One
                plos
                plosone
                PLoS ONE
                Public Library of Science (San Francisco, USA )
                1932-6203
                2012
                3 February 2012
                : 7
                : 2
                : e30126
                Affiliations
                [1 ]Department of Bioengineering, University of California, Berkeley, California, United States of America
                [2 ]School of Engineering, University of Glasgow, Glasgow, United Kingdom
                Argonne National Laboratory, United States of America
                Author notes

                Conceived and designed the experiments: IH KH CQ. Performed the experiments: IH KH CQ. Analyzed the data: IH KH CQ. Contributed reagents/materials/analysis tools: IH KH CQ. Wrote the paper: IH KH CQ.

                Article
                PONE-D-11-15801
                10.1371/journal.pone.0030126
                3272020
                22319561
                0351ed80-a8cb-425c-851c-91aad2367ddd
                Holmes et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                History
                : 11 August 2011
                : 12 December 2011
                Page count
                Pages: 15
                Categories
                Research Article
                Biology
                Computational Biology
                Genomics
                Genome Analysis Tools
                Ecology
                Genomics
                Medicine
                Clinical Research Design

                Uncategorized
                Uncategorized

                Comments

                Comment on this article