      Is Open Access

      Monitoring Bacterial Community of Human Gut Microbiota Reveals an Increase in Lactobacillus in Obese Patients and Methanogens in Anorexic Patients

      research-article

          Abstract

          Background

          Studies of the bacterial communities of the gut microbiota have revealed a shift in the ratio of Firmicutes and Bacteroidetes in obese patients. Determining the variations of microbial communities in feces may be beneficial for the identification of specific profiles in patients with abnormal weights. The roles of the archaeon Methanobrevibacter smithii and Lactobacillus species have not been described in these studies.

          Methods and Findings

          We developed an efficient and robust real-time PCR tool that includes a plasmid-based internal control and allows for quantification of the bacterial divisions Bacteroidetes and Firmicutes, the genus Lactobacillus, and the methanogen M. smithii. We applied this technique to the feces of 20 obese subjects, 9 patients with anorexia nervosa, and 20 normal-weight healthy controls. Our results confirmed a reduction in the Bacteroidetes community in obese patients (p < 0.01). We found a significantly higher Lactobacillus species concentration in obese patients than in lean controls (p = 0.0197) or anorexic patients (p = 0.0332). The M. smithii concentration was much higher in anorexic patients than in the lean population (p = 0.0171).
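The abstract does not spell out how copy numbers are derived from the real-time PCR signal. A common approach for absolute quantification, assumed here rather than taken from the study, is a standard curve built from a ten-fold dilution series of a plasmid standard: the threshold cycle (Ct) is regressed on log10(copies), and unknown samples are read off the inverted line. The function names and dilution values below are hypothetical, and the sketch does not model the plasmid-based internal control.

```python
import math
from typing import Sequence, Tuple


def fit_standard_curve(log10_copies: Sequence[float], ct: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(ct)
    if n != len(log10_copies) or n < 2:
        raise ValueError("need at least two paired dilution points")
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve: copies = 10 ** ((Ct - intercept) / slope)."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope: float) -> float:
    """E = 10 ** (-1 / slope) - 1; a slope near -3.32 means roughly 100% efficiency."""
    return 10 ** (-1 / slope) - 1


# Hypothetical plasmid dilution series (10^2 to 10^7 copies) with an ideal slope of -3.3.
dilutions = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
cts = [38.0 - 3.3 * x for x in dilutions]
slope, intercept = fit_standard_curve(dilutions, cts)
estimate = copies_from_ct(24.8, slope, intercept)  # close to 1e4 copies
efficiency = amplification_efficiency(slope)       # close to 1.0 (about 100%)
```

Each target (Bacteroidetes, Firmicutes, Lactobacillus, M. smithii) would need its own curve, since primer efficiency differs between assays.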

          Conclusions

          Lactobacillus species are widely used as growth promoters in the farm industry and are now linked to obesity in humans. The study of the bacterial flora in anorexic patients revealed an increase in M. smithii. This increase might represent an adaptive use of nutrients in this population.

          Most cited references (16)

          Exploring Microbial Diversity and Taxonomy Using SSU rRNA Hypervariable Tag Sequencing

          Introduction

The biosphere contains between 10^30 and 10^31 microbial genomes, at least 2–3 orders of magnitude more than the number of plant and animal cells combined [1]. Microbes control global utilization of nitrogen through nitrogen fixation, nitrification, and nitrate reduction, and drive the bulk of sulfur, iron and manganese biogeochemical cycles [2]. They regulate the composition of the atmosphere, influence climates, recycle nutrients, and decompose pollutants. Without microbes, multi-cellular life on earth would not have evolved and biology as we know it would not be sustainable. The diversity of microbial communities and their ecologic and metabolic functions are being explored across a great range of natural environments: in soils [3]–[5], air [6] and seas [7]–[10], on plants [11] and in animals [12],[13] and in extreme environments such as the arctic [14], deep-sea vents [15], uranium-contaminated soil [16], and waste-water treatment discharge [17]. In recognition of the role marine microbes play in the biogeochemical processes that are critical to life in all environments on Earth including carbon and nitrogen cycling, the International Census of Marine Microbes (ICoMM: http://icomm.mbl.edu) has launched an international effort to catalogue the diversity of microbial populations in the oceanic, coastal, and benthic waters. Microbes associated with human health will be intensely studied through two recent large-scale initiatives: the Human Microbiome Project sponsored by the NIH (http://nihroadmap.nih.gov/hmp/) and MetaHIT sponsored by the EU (http://www.metahit.eu), which seek to characterize the composition, diversity and distribution of human-associated microbial communities. Other recent human health studies include microbes in breast milk [18], chronic wounds [19], human gut [20], dental caries [21], and childcare facilities [22]. Microbes associated with the human body outnumber human cells by at least a factor of ten [23].
Some microbes cause disease, but the overwhelming majority are either innocuous or play a role in human physiology, including immune response, digestion and vitamin production. As recently as the late 1980s, descriptions of human-associated microbiota were constrained by cultivation technologies. Over the last twenty years, sequencing surveys of amplified regions of small subunit ribosomal RNA (SSU rRNA) genes have revealed that microbial diversity is much greater than the 5,000 microbial species described using phenotypic features in Bergey's taxonomic outline [24], and that microbial communities are far more complex than initially thought. For instance, E. coli, once thought to be a dominant species in the human gut, is clearly a minor member relative to various members of the phyla Bacteroidetes and Firmicutes. It is now evident that microbiologists have been successful in culturing fewer than one percent of the different kinds of single-cell organisms from most microbial communities [25]. Even for well-studied communities, such as the human distal gut, only 20–40% of the microbes have been cultured. Deeper surveys with new approaches are revealing ever-greater diversity. Even these studies, with hundreds of thousands of microbes sampled, have not been extensive enough to provide a complete picture of the diversity (richness) and relative abundance (evenness) of microbial communities. To explore these questions, microbiologists must be able to compare microbial communities within and across individuals, in different states of health and disease, and over time. The first step in these community analyses is to develop detailed descriptions of each population, including the low-abundance taxa that comprise the rare biosphere [9]. Exploration of the human microbiome can leverage methods used to explore the microbiome of other environments such as soil, the deep sea, and other vertebrate microbiomes.
By necessity, microbiologists have historically focused their efforts on the dominant components of microbial communities. Recognizing the importance of gathering information about high, medium and very low abundance taxa, Sogin et al. [9] introduced the use of massively-parallel DNA sequencing of short hypervariable regions of SSU rRNA to characterize microbial populations. In a subsequent study that collected nearly one million short hypervariable region tags, Huber et al. [15] demonstrated that there are approximately 40,000 different kinds of bacteria and archaea in a few liters of hydrothermal vent fluid. Rarefaction data from this study and others show that in many environments, even this level of sequencing is insufficient to fully describe microbial diversity [5],[26]. The lower cost and higher throughput of the pyrosequencing employed in these studies allow for sampling efforts that are orders of magnitude greater than those possible with traditional capillary dideoxy sequencing of cloned SSU rRNA amplicons [27]. With the recently announced capability to sequence >400 nt, it will be possible to span most hypervariable regions, multiple adjacent hypervariable regions, or possibly combinations of non-adjacent hypervariable regions through paired-end sequencing strategies. However, 400 nt reads from rapidly evolving rRNA regions do not contain sufficient evolutionary information to infer robust phylogenetic relationships, and the hundreds of thousands of reads produced in a single experiment far exceed the capacity of current phylogenetic software. Tags from the V6 region are also too short for current implementations of Bayesian classifiers such as the Ribosomal Database Project Classifier (RDP) [28]. However, each read represents a hypervariable region tag of an SSU rRNA gene present in the sample.
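Rarefaction asks how many taxa one would expect to observe if only n of the N reads had been sequenced; a curve that is still rising at n = N indicates undersampling. The text does not say which estimator was used, so the sketch below assumes the classical expected-richness formula of Hurlbert, E[S_n] = sum over taxa i of [1 - C(N - N_i, n) / C(N, n)], where N_i is the number of reads assigned to taxon i. The taxon counts are made up for illustration.

```python
from math import comb
from typing import List, Sequence


def expected_richness(counts: Sequence[int], n: int) -> float:
    """Hurlbert rarefaction: expected number of taxa in a random subsample of n reads."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total read count")
    denom = comb(total, n)
    # comb(total - c, n) is the number of subsamples that miss taxon c entirely;
    # math.comb returns 0 when n > total - c, i.e. the taxon is certain to be drawn.
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


def rarefaction_curve(counts: Sequence[int], step: int = 1) -> List[float]:
    """Expected richness at subsample sizes step, 2*step, ... up to the total."""
    total = sum(counts)
    return [expected_richness(counts, n) for n in range(step, total + 1, step)]


# Hypothetical tag counts per taxon: two abundant taxa and a tail of rare ones.
counts = [50, 30, 5, 2, 1, 1, 1]
curve = rarefaction_curve(counts, step=10)
```

The rare taxa (counts of 1 or 2) are what keep the curve climbing near the full sample size, which is why deeper sequencing keeps revealing new members of the rare biosphere.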
We developed a tag search engine, Global Alignment for Sequence Taxonomy (GAST) [9], which utilizes existing databases of full-length SSU rRNA genes and their pre-computed phylogeny for high-throughput taxonomic analysis of microbial communities using hypervariable region tag sequences. The use of a single, small hypervariable region tag for assigning taxonomy presents several challenges. The information content in a short hypervariable region sequence may not be sufficient for inferring taxonomic affinity. BLAST [29] alone is insufficient to consistently identify the sequences in molecular databases that are the closest matches to tag queries along their full length. Here we analyze the reliability of assigning taxonomic identifiers based solely on tags, specifically using the V3 and the V6 hypervariable regions of SSU rRNA. Using SSU rRNA genes from the human gut and deep-sea vents, we compare the taxonomic assignments of the full-length sequences with the taxonomic assignments of their V3 and V6 regions, excised in silico. We then examine microbial populations of the human gut in greater detail, using both massively-parallel pyrosequencing of hypervariable region tags and Sanger-generated full-length sequences, to determine whether the two sequencing strategies differ in sampling or taxonomic assignment. In a companion paper [30], we used GAST to examine the impact of the antibiotic ciprofloxacin on population structures of the human microbiome.

Results

Assessment of Hypervariable Region Specificity in the RefSSU Database

Tags from a hypervariable region must map to a full-length SSU rRNA with minimal ambiguity to serve as reliable phylogenetic markers; i.e., phylogenetically distinct lineages should not contain identical tags. Most of the redundant SSU rRNA sequences containing identical sequences for a particular hypervariable region in the database are from the same genus.
This redundancy does not interfere with the use of the hypervariable region tags for taxonomy, but rather strengthens the assignment. For each unique V3 and V6 sequence, we examined the number and taxonomy of all the source full-length sequences. We treat each hypervariable region independently, since two full-length rRNA sequences with identical V3 regions may differ at the V6 region or another hypervariable region. Of the 59,830 unique V6 reference sequences, 74% mapped to one SSU rRNA sequence, 10% mapped to 2 sequences, and only 5% mapped to 7 or more. The V3 region, which is longer, showed slightly better resolution: 82% mapped to one SSU rRNA, 8% mapped to 2 sequences, and only 3% mapped to 7 or more (results not shown). Although only a small percentage of tags map to more than a few SSU rRNA sequences, these included some that mapped to a very large number of different SSU rRNA sequences. For example, 3 V6 reference sequences and 8 V3 reference sequences each mapped to more than 1,000 different SSU rRNA sequences. In almost all cases where a hypervariable tag sequence maps to more than one full-length SSU rRNA sequence, the overwhelming majority of full-length sequences still map to only one genus (Table 1). Since RefSSU sequences that give rise to identical hypervariable region tags are generally from the same or highly similar organisms, V6 and V3 tags can be unambiguously mapped to the genus level 97% and 99% of the time, respectively. Even if we examine only the subset of V3 sequences that mapped to multiple SSU rRNA sequences, 95% map uniquely to genus, 98% to family, and 99% to order, class and phylum. Similarly for the V6 region, 91% of tags derived from multiple SSU rRNA entries map uniquely to genus, 96% to family, 97% to order and 99% to class and phylum. Even in those cases where reference tags mapped to more than 1,000 full-length sequences, in 9 of the 11 cases there was one dominant taxon.
Of the other two cases, one had complete consensus to the order level and the other had all but two matching at the family level. Not only do most of the reference tags in the database have only one SSU rRNA source, even those from multiple SSU rRNA sources still represent almost exclusively one taxon.

10.1371/journal.pgen.1000255.t001
Table 1. Percent of hypervariable region tags from the RefSSU database that map to one or more taxa. Cells give percent / number of tags.

Hypervariable region V3
Number of taxa | 1 | 2 | 3 | 4 | 5+
Phylum | 99.96% / 114328 | 0.04% / 42 | 0.00% / 42 | 0 | 0
Class | 99.93% / 109352 | 0.07% / 77 | 0.00% / 2 | 0 | 0
Order | 99.88% / 99682 | 0.12% / 113 | 0.00% / 4 | 0 | 0
Family | 99.62% / 88015 | 0.34% / 297 | 0.04% / 34 | 0.00% / 3 | 0.00% / 3
Genus | 99.11% / 69686 | 0.07% / 495 | 0.09% / 64 | 0.05% / 35 | 0.00% / 29

Hypervariable region V6
Number of taxa | 1 | 2 | 3 | 4 | 5+
Phylum | 99.83% / 54728 | 0.17% / 94 | …

… = 0.15, insufficient data to analyze the accuracy of the GAST process for these larger distances.

10.1371/journal.pgen.1000255.t002
Table 2. Comparison of taxonomic assignments using full-length and in silico generated V3 and V6 hypervariable region tags.
V3 region
Rank | Human gut Count | Human gut Same | Human gut %Same | Deep-sea vents Count | Deep-sea vents Same | Deep-sea vents %Same
Superkingdom | 7208 | 7208 | 100.00% / 100.00% | 963 | 963 | 100.00% / 100.00%
Phylum | 7168 | 7145 | 99.68% / 99.82% | 920 | 901 | 97.93% / 97.68%
Class | 7033 | 7002 | 99.56% / 99.52% | 884 | 857 | 96.95% / 99.55%
Order | 7019 | 6987 | 99.54% / 99.57% | 807 | 784 | 97.15% / 100.00%
Family | 5726 | 5688 | 99.34% / 98.93% | 764 | 746 | 97.64% / 98.44%
Genus | 5178 | 5153 | 99.52% / 97.99% | 701 | 686 | 97.86% / 99.10%

V6 region
Rank | Human gut Count | Human gut Same | Human gut %Same | Deep-sea vents Count | Deep-sea vents Same | Deep-sea vents %Same
Superkingdom | 7215 | 7215 | 100.00% / 100.00% | 1058 | 1058 | 100.00% / 99.53%
Phylum | 7175 | 7152 | 99.68% / 99.82% | 1008 | 986 | 97.82% / 90.00%
Class | 7040 | 7009 | 99.56% / 99.52% | 970 | 939 | 96.80% / 90.08%
Order | 7026 | 6994 | 99.54% / 99.56% | 881 | 847 | 96.14% / 87.06%
Family | 5731 | 5693 | 99.34% / 98.93% | 833 | 814 | 97.72% / 87.23%
Genus | 5183 | 5158 | 99.52% / 97.97% | 766 | 749 | 97.78% / 88.52%

Treating the V3 and the V6 regions independently, we counted the number of assignments the GAST process made at each taxonomic rank and the number and percent of times those assignments were the same as the assignment given to the full-length source sequence. The second percent value is the rate at which the top BLAST match predicted the same assignment as the full-length source.

We compared the use of GAST to assign taxonomy with the use of the top BLAST match (Table 2). While the top BLAST match was consistent with GAST to the order level, it was less accurate for family and genus for the V6 region, especially for the deep-sea vent sequences. In addition, the top BLAST match was often not the best GAST match in cases where the taxonomic assignment was the same (Table 3). A comparison of the BLAST rank vs. GAST distance did not show any significant correlation (results not shown).

10.1371/journal.pgen.1000255.t003
Table 3. BLAST ranks for top GAST hits.
BLAST rank | Human microbiome V3 | Human microbiome V6 | Deep-sea vents V3 | Deep-sea vents V6
1 | 83.70% | 94.90% | 75.17% | 81.89%
2 | 13.48% | 4.34% | 5.27% | 7.16%
>2 | 2.82% | 0.76% | 19.57% | 10.95%

The percentages reported represent the frequency with which the top GAST match corresponds to the top BLAST match, the second-best BLAST match, or any other BLAST match.

Comparison of Population Sampling Using Sanger-Generated Full-Length and Pyrosequencing-Generated V3 and V6 Tags for the Human Gut Microbiome Dataset

We compared taxonomic assignments and their frequencies for the human gut microbiome data sampled with full-length SSU rRNA (n = 7215, length = 1300–1450 nt), V3 tags (n = 422,992, trimmed length = 100–200 nt) and V6 tags (n = 441,894, trimmed length = 50–70 nt) [30]. Full-length sampling detected a total of 43 genera, V3 pyrosequencing detected 116 genera and V6 pyrosequencing detected 103 genera (Figure 2). V3 sampling detected 74 genera that were not detected using the full-length sequencing, and V6 sampling detected 60 such genera (102 different genera combined). No genera were detected by the full-length sequencing alone. V3 sampling missed one taxon represented in the full-length sequences: Escherichia, detected 5 times in the full-length data and once in the V6 data. V6 sampling missed three taxa, Hespellia, Klebsiella, and TM7, each represented by only one full-length sequence (and detected 12, 16, and 4 times with V3, respectively). Only the TM7 sequences could be affected by a primer bias; the other two taxa have exact matches to the forward and reverse primers as represented in the reference database of full-length SSU rRNA. The TM7 entries in the reference database include several different primer-region sequences, each of which is one or two bases from its nearest primer, but only at the 5′ end, which should still allow abundant genera to be detected.
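The genus counts above also fix the proportions quoted later in the Discussion: full-length sequencing missed 74 of the 116 genera found with V3 tags and 60 of the 103 found with V6 tags. A small sketch of that arithmetic, with a set-based variant for when the genus lists themselves are available (the function names are illustrative, not part of GAST):

```python
from typing import Set


def percent_only_in_tags(tag_only: int, tag_total: int) -> float:
    """Share (%) of genera seen with tag sequencing that full-length sequencing missed."""
    return 100.0 * tag_only / tag_total


def percent_missed_by_reference(reference: Set[str], other: Set[str]) -> float:
    """Same quantity from explicit genus sets: |other - reference| / |other|, in percent."""
    return 100.0 * len(other - reference) / len(other)


v3 = percent_only_in_tags(74, 116)  # about 63.8
v6 = percent_only_in_tags(60, 103)  # about 58.3
```

The results, about 63.8% and 58.3%, match the 63% and 58% given in the Discussion when truncated to whole percents.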
The lack of TM7 in the V6 pyrosequencing data could be caused by a primer bias acting on a rare population, or could simply reflect undersampling of rare organisms. In sum, less than 1% of the taxa identified by full-length sequences, representing … 1) that were lost in the full-length sequencing (y = 0). The x-intercept comparing the two tag sequencing experiments (Figure 3C) is at the origin, implying that the two hypervariable regions were comparable in elucidating the rare biosphere. Primer bias was shown to be negligible in most cases, and undersampling is the most likely cause of the differences in the small populations detected. Both tag sequencing experiments found similar numbers of taxa, and substantially more than the full-length sequencing did. For tag sequencing of hypervariable regions to be effective for mapping taxonomy, specific sequences must match unambiguously to source organisms. If it were common for two divergent organisms to have the same or highly similar V3 or V6 regions, tag sequencing would not be an accurate means for assigning taxonomy. In our reference database of over 500,000 rRNA sequences, we found that hypervariable region tags map to individual taxa with high fidelity. In only a few cases for either the V3 or the V6 region did we find sequences that exactly match two or more distinct taxa (Table 1). The reference databases are replete with highly studied bacteria, and multiple copies of a hypervariable region for these organisms do not imply any taxonomic ambiguity. The V3 region was 99% accurate to the genus level. The V6 region was only slightly less resolved than the V3, still providing 97% accuracy in assignment to the genus level, 98+% accuracy to the level of family, and 99% accuracy at the level of order. The V3 region is longer than the V6, which should have a positive effect on its specificity.
The RefV6 database contains about half as many reference sequences as the RefV3, and the accuracy of V6 assignments may increase further as the reference databases grow. These levels of accuracy show that hypervariable region tags contain adequate information to accurately map taxonomy of both bacterial and archaeal organisms. To assign taxonomy to our long SSU rRNA gene sequences, both in our RefSSU database and in our experimental sequences, we used the Ribosomal Database Project classifier (RDP). RDP is not 100% accurate, and some of the ambiguities in the reference database could be attributable to limitations of the RDP classifier. For tag sequencing with the GAST process, however, we are assessing the utility of a tag as a surrogate for the longer SSU rRNA sequences via a look-up and distance matrix. Can we consistently assign the correct taxonomy to both an SSU rRNA sequence and its constituent hypervariable regions independently? The RDP taxonomy provides a consistent and high-quality taxonomic classification (Bergey's taxonomy), facilitating our analysis. Slight inaccuracies in RDP are not an important factor in whether a tag sequence can be used as a surrogate for a full-length SSU rRNA sequence. We conducted an in silico experiment assessing taxonomic assignment using tags of hypervariable regions extracted from full-length sequences and compared the tags directly to the full-length sequences. We used two independent datasets: a human gut microbiome SSU rRNA dataset, which should be relatively well represented in the reference databases of SSU rRNA genes, and one of deep-sea vent microbes, which are less well studied and therefore less well represented in the reference databases. We examined the use of both the V3 and the V6 region tags to assign taxonomy for both groups of microbes.
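The in silico experiment relies on excising a hypervariable region from each full-length sequence and then asking whether identical tags come from one genus or several, the question behind Table 1. The sketch below assumes exact primer matches and uses hypothetical 4-nt primers on toy sequences; real V3 and V6 primers are longer and may be degenerate, and the study's own extraction procedure may differ, for example in how primer mismatches are tolerated.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def excise_region(seq: str, forward_primer: str, reverse_primer: str) -> Optional[str]:
    """Return the sequence strictly between the forward primer and the reverse-primer site.

    Assumption: exact matches only. The reverse primer is written 5'->3' on the
    opposite strand, so its site on this strand is its reverse complement.
    """
    start = seq.find(forward_primer)
    if start < 0:
        return None
    start += len(forward_primer)
    end = seq.find(reverse_complement(reverse_primer), start)
    if end < 0:
        return None
    return seq[start:end]


def genera_per_tag(references: Iterable[Tuple[str, str]],
                   forward_primer: str, reverse_primer: str) -> Dict[str, Set[str]]:
    """Group (full-length sequence, genus) pairs by their excised tag."""
    tag_to_genera: Dict[str, Set[str]] = defaultdict(set)
    for seq, genus in references:
        tag = excise_region(seq, forward_primer, reverse_primer)
        if tag is not None:  # sequences without both primer sites yield no tag
            tag_to_genera[tag].add(genus)
    return dict(tag_to_genera)


def fraction_genus_unique(tag_to_genera: Dict[str, Set[str]]) -> float:
    """Fraction of distinct tags whose source sequences all belong to one genus."""
    if not tag_to_genera:
        return 0.0
    return sum(len(g) == 1 for g in tag_to_genera.values()) / len(tag_to_genera)


# Toy references with hypothetical primers "AAGG" (forward) and "AAGG" (reverse, site "CCTT").
refs = [
    ("TTAAGGACGTCCTTGG", "GenusA"),
    ("GAAGGACGTCCTTA", "GenusA"),  # identical tag from the same genus: redundancy, not ambiguity
    ("AAGGTTTTCCTT", "GenusB"),
    ("ACGTACGT", "GenusC"),        # no primer sites, so no tag
]
mapping = genera_per_tag(refs, "AAGG", "AAGG")
```

Here `mapping` groups the two GenusA sequences under one tag, so every tag is genus-unique; adding the same tag from a second genus would lower `fraction_genus_unique` to 0.5.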
In all four cases, we found excellent correspondence between the use of the GAST process for assigning taxonomy to short hypervariable region tags and the use of RDP for assigning taxonomy to the full-length sequences. For both variable regions from the human microbiota, the taxonomic assignments of the tags agreed with the long sequences at a rate consistently greater than 99%. The deep-sea vent data agreed in over 97% of instances at the genus level. The two variable regions mapped taxonomy with virtually identical fidelity; the larger difference lay not in the choice of variable region but in the environment examined. Of the human gut microbes sampled, 91% of all tags had exact matches in the reference database, and virtually all were within 10% sequence divergence of a reference sequence. Of the deep-sea vent microbes, which have not historically been as well studied, only 51% had exact matches in the reference database and only 90% were within 10% sequence divergence of a reference sequence. Despite the fact that as many as 10% of the deep-sea vent tags did not have a close match in the reference database, the GAST process mis-assigned only 3% of the tags. The GAST process accurately assigns taxonomy to tags diverging as much as 15% from their nearest reference match (Figure 1). Although our data included too few tags with GAST distances greater than 0.15 to fully assess their accuracy, the transfer of taxonomic information from the reference database will certainly be less accurate at greater GAST distances. The deep-sea vent mismatch rate was similar for both hypervariable regions and at each taxonomic level from genus to phylum, and less than one-third of the mismatches were for tags >15% divergent. A possible explanation is that the deep-sea vent environment may contain phyla that are not yet adequately described, and whose full-length SSU rRNA sequences are therefore not well classified by the RDP.
These results imply that hypervariable region tag sequencing and the GAST process are excellent tools for assigning taxonomy, but they cannot overcome basic gaps in knowledge of the under-explored areas of the microbiome. As more is learned about these organisms, and their full-length SSU rRNA genes are added to the reference databases, hypervariable region tag sequencing projects will directly benefit, and taxonomy will improve. The top BLAST matches for the human gut microbiome mapped taxonomy better than the top BLAST matches for the deep-sea vents. This is likely because the human microbiome data are better represented in the reference database. Since 91% of the human microbiome tags had exact matches in the reference database, these should be consistently identified by BLAST as the best match. For the deep-sea samples, where only 51% had exact matches, a top BLAST hit may find only a local match within the sequence rather than a globally-weighted match as with GAST. Reliance on local alignments can be misleading: for 80% of the deep-sea vent V6 tags whose top GAST match did not rank among the top two BLAST hits, the top BLAST hit had an alignment length shorter than the tag sequence. BLAST failed to identify the correct genus for 11% of these V6 tag sequences, whereas GAST failed for only 2%. An increase in distance from the tag to the nearest reference sequence using GAST did not correlate with a lower BLAST rank. The magnitude of divergence from the reference database does not explain the difference between the V3 and V6 regions. The V3 tags were noticeably more divergent from their top BLAST hit than were the V6 tags, despite the larger dataset of reference V3 sequences. This did not adversely affect the ability to identify tags to the genus level. Since the RDP taxonomy is restricted to the genus level, we could not review the BLAST ranks at the species level.
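The contrast between a local BLAST hit and a global GAST match can be made concrete. Per the Methods, tags without any BLAST match are left unassigned, so BLAST acts as a first filter; the sketch below assumes such a shortlist of candidate reference tags is already in hand and ranks the candidates by a global distance. The distance here is the Levenshtein edit distance divided by the tag length, an assumption standing in for the study's alignment-based GAST distance, and the 0.15 cutoff mirrors the range over which the text reports GAST to be accurate. Function names are hypothetical.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: each mismatch, insertion, or deletion costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = curr
    return prev[-1]


def nearest_references(tag: str, candidates: Dict[str, str],
                       max_reliable: float = 0.15) -> Tuple[List[str], float, bool]:
    """Return all candidates tied at the smallest global distance, that distance,
    and whether it falls within the range the text reports as reliable."""
    if not tag or not candidates:
        raise ValueError("need a non-empty tag and at least one candidate")
    # Assumption: distance = whole-sequence edits / tag length (a global, not local, score).
    distances = {ref_id: edit_distance(tag, ref) / len(tag)
                 for ref_id, ref in candidates.items()}
    best = min(distances.values())
    ties = sorted(ref_id for ref_id, d in distances.items() if d == best)
    return ties, best, best <= max_reliable


hits = nearest_references("ACGTACGTAC", {"refA": "ACGTACGTAC",
                                         "refB": "ACGTTCGTAC",
                                         "refC": "ACGTACGTACGG"})
```

A short candidate such as "ACGTA" would look like a perfect local hit over its five bases, but its global distance from "ACGTACGTAC" is 0.5, so it falls well outside the reliable range. Returning every reference tied at the minimum distance lets a consensus step vote on them rather than trusting a single top hit.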
Full-length sequencing missed 63% of the genera identified by V3 and 58% of the genera identified by V6 (for more details, see Dethlefsen et al. [30]). The full-length sequencing uncovered only four rare taxa missed by one or the other hypervariable region, but no taxa missed by both of the hypervariable regions. Primer mismatches were minimal, but may be relevant when combined with a very low abundance. The hypervariable region tag sequencing did not introduce any strong biases against the discovery of common taxa or the relative abundance of these taxa in this experiment. As predicted, the hypervariable region tag sequencing provided a much greater breadth and depth of sampling. Although the level of sampling with tag sequencing is orders of magnitude greater than with traditional methods, a single pyrosequencing run (with >400,000 sequences) is still insufficient to fully sample the rare biota in the human distal gut. The sampling limitation of this experiment can be seen in the small but distinct number of taxa that appeared in one but not the other tag sequencing experiment. All were of low abundance and are dispersed throughout the microbial world rather than clustering in one specific taxon. Since no common taxa were omitted by sequencing of either hypervariable region, any effects of primer bias are limited to rare taxa and cannot be discerned from the effects of undersampling. Other methods such as Greengenes [31] and SeqMatch [32] use short tag sequences to determine phylogenetic affinity through comparisons to reference data sets of nearly full-length SSU rRNA sequences without requiring the inference of phylogenetic trees from the hypervariable regions. Greengenes (http://greengenes.lbl.gov) uses NAST [33] to align tags by inserting them into a pre-existing database of >10,000 aligned full-length sequences, and then assigns taxonomy based on the nearest neighbor in the database. 
Liu et al. [34] used NAST on simulated tag sequences and found that the tags gave UniFrac clustering results similar to those of their full-length source sequences. The Greengenes website is limited to 500 sequences, uses a database of only 10,000 sequences for comparison, and does not compute a consensus of multiple taxonomy matches. SeqMatch uses a k-nearest-neighbor, word-matching algorithm rather than a multiple sequence alignment to display nearest matches in the RDP dataset, and uses the lowest common taxon for consensus (essentially a unanimous consensus rather than the 66% majority used by GAST). The RDP dataset is smaller than the SILVA database but more selective. The website tool is limited to 2,000 sequences.

Conclusions/Significance

Hypervariable region tag sequencing using either the V3 or the V6 region, and presumably other hypervariable regions, is an effective means for assigning taxonomy and provides great advantages over traditional sampling. A tag mapping process such as GAST with an extensive database of rRNA genes such as our RefSSU derived from SILVA can map tag sequences to the same taxonomy as their source genes at better than a 99% correlation rate for commonly studied environments such as the human microbiome and better than 96% for less commonly studied environments such as deep-sea vents. The V3 and V6 regions have only minimal ambiguity in mapping to the SSU rRNA gene all the way to the genus level. While tags can map to more than one SSU rRNA source, these cross-mappings between taxa are infrequent and do not compromise the overall methodology. We show that these short hypervariable region tags contain adequate information to uniquely and accurately map the phylogeny with 98% or greater fidelity, even without an exact match in the reference database or with potential multiple copies in the database.
GAST is accurate for tags as much as 15% divergent from their nearest reference match, although there were very few tags that far from the current set of reference SSU rRNA sequences. The GAST distance, like BLAST scores or e-values, should be maintained and used as an assessment of the likely reliability of the GAST assignment of more divergent tags. The consistently high correspondence between hypervariable region tag and full-length SSU rRNA taxonomies shows the robustness of the GAST process and the use of tags as surrogates for the full-length rRNA genes, even for microbial environments that are not well represented in the reference databases. Massively-parallel pyrosequencing of tags can be used to great advantage over traditional sequencing of full-length rRNA genes to explore both the diversity and relative abundance of microbial populations. Further research into hypervariable region tag sequencing may uncover advantages of one region over another, such as the relative levels of microvariation, length of sequence, density of homopolymers (which can lead to pyrosequencing errors), ability to identify to the species level, or the merits of different amplification primers. Tag sequencing yields taxonomy and relative abundance values similar to those of conventional sequencing of full-length SSU rRNA genes, but provides more reads, uncovers more organisms, avoids assembly, and costs less per read. As the technology continues to improve, yielding greater read counts and longer sequences, pyrosequencing will provide even greater opportunities for tag sequencing, such as the use of longer hypervariable regions or combinations of variable regions, and ever-greater sampling depth. This process will also improve as reference databases of SSU rRNA genes continue to grow.
The great advantage of hypervariable region tag sequencing is that it can exploit massively-parallel pyrosequencing, sampling to depths several orders of magnitude greater than previously achieved, facilitating the exploration of the vast diversity of microbial populations and the rare biosphere.

Materials and Methods

Creating the Reference Database of Full-Length, V3, and V6 SSU rRNA Sequences

We downloaded 503,971 aligned small subunit rRNA sequences from the SILVA database, version 92 [35]. Using the SILVA quality assessments, we eliminated low-quality sequences (sequence quality … = 80%. If the bootstrap value was … = 80) and generated a consensus taxonomy. If two-thirds or more of the full-length sequences shared the same assigned genus, the tag was assigned to that genus. If there was no such agreement, we proceeded up one level to family. If there was a two-thirds or better consensus at the family level, we assigned this taxonomy to the tag, and if not, we continued to proceed up the tree. Occasionally, a tag could not be assigned a taxonomic classification even at the domain level. This was because the RDP Classifier could not assign a domain with an adequate bootstrap value, rather than because the tag mapped to full-length sequences from different domains. These may represent novel organisms whose taxonomy has not yet been determined. Sample tags that did not have a single BLAST match in the RefSSU database were also not given a taxonomic assignment. We chose to use a 66% (two-thirds) majority, although other values or a distributional rather than strict percentage approach can be implemented. We reviewed nearly 17 million tags in our sequencing database (primarily of the V6 region) from a wide range of studies using the 66% majority as the threshold for assignment. A distribution curve of voting majority did not show any obvious break points (graph not shown), although 95% of the tags had a voting majority of 75% or better, and 90% had a voting majority of 83% or better.
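The consensus rule described above (two-thirds of the matching full-length sequences must agree, otherwise move up one rank) can be sketched as follows. The rank order, the tuple layout of each lineage, and the use of None for ranks the classifier left unassigned are assumptions for illustration, not the study's data structures. Setting threshold=1.0 turns the same function into the unanimous, lowest-common-taxon consensus that the text attributes to SeqMatch, and the returned vote share corresponds to the "voting majority" whose distribution is summarized above.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# Assumed rank order for each lineage tuple: domain first, genus last.
RANKS = ("domain", "phylum", "class", "order", "family", "genus")

Lineage = Sequence[Optional[str]]


def consensus_taxonomy(lineages: Sequence[Lineage],
                       threshold: float = 2 / 3) -> Tuple[List[str], float]:
    """Deepest lineage shared by at least `threshold` of the hits, plus its vote share.

    Starts at genus and moves up one rank at a time. None marks a rank the
    classifier left unassigned (assumption: e.g. low bootstrap support).
    Use threshold > 0.5 so at most one lineage can win at each rank.
    """
    n = len(lineages)
    if n == 0:
        return [], 0.0
    for depth in range(len(RANKS), 0, -1):
        # Truncate each lineage to the current rank; tuples are hashable for Counter.
        votes = Counter(tuple(lin[:depth]) for lin in lineages)
        prefix, count = votes.most_common(1)[0]
        if None in prefix:
            continue  # the leading lineage is unclassified at this rank; go up one rank
        share = count / n
        if share >= threshold:
            return list(prefix), share
    return [], 0.0  # no agreement even at domain level: leave the tag unassigned


hits = [
    ("Bacteria", "Firmicutes", "Clostridia", "Clostridiales", "Lachnospiraceae", "Roseburia"),
    ("Bacteria", "Firmicutes", "Clostridia", "Clostridiales", "Lachnospiraceae", "Coprococcus"),
    ("Bacteria", "Firmicutes", "Clostridia", "Clostridiales", "Lachnospiraceae", "Roseburia"),
]
genus_call, genus_share = consensus_taxonomy(hits)                # 2 of 3 agree on genus
family_call, family_share = consensus_taxonomy(hits, threshold=1.0)  # unanimous: stops at family
```

With the two-thirds rule the tag is called Roseburia; a unanimous rule cannot resolve the genus and falls back to Lachnospiraceae, which illustrates why the choice of threshold matters for how deep assignments go.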
            Is Open Access

            The ribosomal database project (RDP-II): introducing myRDP space and quality controlled public data

            Substantial new features have been implemented at the Ribosomal Database Project in response to the increased importance of high-throughput rRNA sequence analysis in microbial ecology and related disciplines. The most important changes include quality analysis, including chimera detection, for all available rRNA sequences and the introduction of myRDP Space, a new web component designed to help researchers place their own data in context with the RDP's data. In addition, new video tutorials describe how to use RDP features. Details about RDP data and analytical functions can be found at the RDP-II website ().

              The Macaque Gut Microbiome in Health, Lentiviral Infection, and Chronic Enterocolitis

              Introduction

              The human intestine is home to some 100 trillion microorganisms of at least 400 species. The density of bacterial cells in the colon has been estimated at 10¹¹ to 10¹² per ml, which makes it one of the most densely populated microbial habitats known [1,2]. The number of unique genes in the microbial pool is estimated to exceed the number of genes in the human nuclear genome by two orders of magnitude [1,2], and these genes contribute many essential metabolic functions to the host. The great majority of gut bacterial species have not been cultured outside the human host and are known only from fragments of their DNA sequences. A few pioneering reports have begun to survey the intestinal microbiota of humans and mice using DNA sequencing of uncultured communities [1,3,4] or microarray-based methods [5,6]. Human disease states are widely expected to be linked to characteristic transitions in the intestinal microbiota, and connections have been proposed between GI bacterial communities and obesity [7,8] and Crohn's disease [9,10], but studies in this area are just beginning. Here we report the characterization of GI microbial communities in rhesus macaques and their alteration accompanying colitis, either associated with SIV infection or in animals with chronic enterocolitis. The mammalian GI tract is a major locus of immune tissues responsible for blocking invasion by pathogens, and more recently these tissues have also been implicated in the normal homeostasis of the gut microbiota. For example, B cells of the gut-associated lymphoid tissue (GALT) synthesize IgA, which is secreted in large amounts into the gut lumen, and mice genetically incapable of normal IgA synthesis have abnormally large proportions of anaerobes in the small intestine [11,12]. Secreted antibacterial peptides have also been implicated in regulating the composition of the gut microbiota [13,14].
Effects of host genotype are also documented by the finding that genetically obese mice have detectably different gut microbiota compared with wild-type controls [8]. HIV infection causes rapid and massive destruction of GALT [15–20], and HIV infection is also frequently associated with gastrointestinal disorders, many of which are of unexplained etiology [21]. Destruction of GALT and gastrointestinal disorders are also well-characterized consequences of simian immunodeficiency virus (SIV) infection in macaques [15,16,22–24]. A role for the GI microbiota in AIDS disease progression has recently been suggested: bacterial antigens are proposed to pass through the damaged GI mucosa and promote immune activation, which in turn promotes viral replication and disease progression [20]. Chronic enterocolitis is fairly common in rhesus macaques even in the absence of SIV infection or other known infectious or parasitic agents. The clinical course and histopathology of idiopathic chronic enterocolitis show many parallels with human inflammatory bowel disease (IBD), and indeed the macaque disease has been studied as a model for the human disorder [24–26]. Evidence of proinflammatory dysfunction of the IL6-JAK-STAT3-SOCS3 pathway has been reported [24]. A role for the gut microbiota in human IBD has long been suspected, and several studies have profiled uncultured GI bacteria from healthy and diseased patients (e.g., [9,27]). Such studies have not yet yielded a clear picture of the relationship of the GI microbiota to pathogenesis, though a reduction in microbial diversity has been proposed. Studies of the macaque disease have suggested that several GI microbes may be slightly more common in macaques with an IBD-like disease, but the macaque GI communities have not been comprehensively analyzed [20,25]. Here we characterize the macaque GI microbial communities and compare community composition in health and GI disease.
To profile the bacterial taxa present, we purified bacterial DNA from samples of intestinal contents, amplified segments of the 16S rRNA gene, determined their sequences by massively parallel pyrosequencing, and then used these data to identify and quantify the types of bacteria present [28]. The approach was based on extensive reconstruction studies [29], which showed that known clustering of microbial communities could be recaptured using 16S rRNA gene sequences of the lengths generated by the pyrosequencing technology commercialized by 454 Life Sciences [30]. These preliminary bioinformatic studies also revealed that some short segments of the 16S rRNA gene were especially useful for phylogenetic reconstruction, allowing optimized primers to be chosen for the study reported here. We found that the macaque microbiota was distinct from those of other vertebrates studied previously. Even in healthy animals, the taxa present in the gut microbiota differed between individuals and changed substantially within individuals over time. Unexpectedly, communities from males and females also differed. Distinctive GI microbial communities were also found in samples of colonic contents taken at necropsy from animals with GI disease. Most of these animals had also been treated with antibiotics to ameliorate their symptoms, so our analysis models human cases of colitis accompanied by antibiotic therapy. These data indicate that colitis and its treatment are associated with transitions in the macaque GI microbiota, providing a model that may be useful for understanding the human GI microbiota in health and disease.

Results

Monitoring Macaque Intestinal Microbiota

We surveyed a range of sample types and disease states for possible effects on the macaque GI microbiota. We analyzed a total of 100 samples from healthy animals, SIV-infected animals, and animals with chronic enterocolitis.
Among the colitic samples, some animals were SIV infected and had colitis as a result of simian AIDS, while others were colitic but not SIV infected. Sample types included colonic contents collected at necropsy, bacterial communities adhering to biopsy specimens from the upper and lower GI mucosa (jejunum and colon, respectively), and stool (Table 1; detailed data for each animal are in Table S1).

Table 1 Samples of GI Microbiota Analyzed in This Study

DNA was isolated from all 100 samples and amplified by PCR using primers BSR357-A and BSF8-B, which anneal to conserved regions of the bacterial 16S rRNA gene. All sequence reads extended from the BSR357-A primer. The median read length was 264 nt (Figure 1A). These primers were chosen on the basis of a series of simulations carried out to identify the optimal region of the 16S rRNA gene to query with the short sequence reads expected from pyrosequencing. Use of a moderately conserved region yielded relatively stable phylogenetic placements, though at the expense of a reduced ability to discriminate low-level taxa. Biased amplification of 16S rRNA gene sequences from mixed bacterial populations can distort abundance estimates, but these distortions are typically only a few fold [6,31–33]. To facilitate comparison among samples, only a single region of the 16S rRNA gene was amplified, and unidirectional reads were used for the analysis, so that any biases introduced during amplification are common to all samples.

Figure 1 Use of DNA Bar-coding and Pyrosequencing to Analyze Uncultured Bacterial Communities (A) Length distribution of the pyrosequence reads used in this study. The median length was 264 nt. (B) The DNA bar-coding strategy. PCR primers are indicated by the arrows; DNA 5′ ends are shown as balls. Each primer contains a region complementary to the 454 sequencing primers (either A or B) and a region complementary to the 16S rRNA gene (either BSR357 or BSF8), separated by a unique 4-base bar code (bold). (C) Reproducibility of the pyrosequencing method. DNA from a single specimen was analyzed by pyrosequencing at two different centers. Bacterial orders are indicated by the color code. (D) Comparison of results with pyrosequencing and conventional Sanger sequencing. Bacterial orders are indicated by the color code. Numbers of sequences are as follows: M3T1 pyrosequence, 1382; M3T1 Sanger sequence, 47; M3T6 pyrosequence, 1360; M3T6 Sanger sequence, 47.

The primers for the 16S rRNA gene sequences were each marked with a unique DNA bar code by including a distinctive 4-base sequence between the 16S rRNA gene-complementary region and the binding site for the pyrosequencing primer (Figure 1B). This allowed the PCR products from many samples to be sequenced together using the 454 Life Sciences technology [30] and then assigned back to their samples afterwards. After removal of low-quality sequences, a total of 140,356 usable sequence reads were obtained. All bar codes were well populated, with an average of 1,404 sequences per community tested. The error rate of the bar-coding procedure could be estimated by cataloging all sequence reads carrying bar codes that were not among those used for labeling. This analysis indicated that only 0.01% of sequences were likely to be miscataloged because of errors in parsing the bar codes. One DNA sample was sequenced twice to assess reproducibility. To determine the bacterial taxa present, the 16S rRNA gene sequences were aligned using NAST and GREENGENES and then inserted into pre-established phylogenetic trees of full-length 16S rRNA gene sequences [34,35] using ARB. Of all the sequences analyzed in this study, 99.94% aligned with previously determined 16S rRNA gene sequences (80 sequences in total failed to align). The bacterial taxa in each sample were then tabulated. Comparison of the two independent sequencing experiments showed excellent reproducibility of the phylogenetic assignments (Figure 1C).
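Sorting pooled reads back to their samples by the 4-base bar code can be sketched as below. The sketch also counts reads whose bar code was never assigned to a sample. It assumes each read begins directly with the bar code, with the sequencing key already removed, and that the sample-to-bar-code table is known. The function `demultiplex` and the example bar codes are hypothetical.

```python
from collections import defaultdict

BARCODE_LEN = 4  # the primers carry unique 4-base bar codes


def demultiplex(reads, sample_by_barcode):
    """Sort pooled reads into per-sample bins by their leading bar code.

    Assumes each read begins with the bar code (sequencing key already
    trimmed). Returns (bins, unused), where `unused` counts reads whose bar
    code is not one of those assigned to a sample.
    """
    bins = defaultdict(list)
    unused = 0
    for read in reads:
        code, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        sample = sample_by_barcode.get(code)
        if sample is None:
            unused += 1  # bar code never used for labeling: likely a sequencing or parsing error
        else:
            bins[sample].append(insert)
    return dict(bins), unused
```

Dividing `unused` by the total number of reads gives a crude error indicator. The text does not describe how such counts were converted into the 0.01% miscataloging estimate, so that step is not attempted here.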
Ninety-four near-full-length macaque bacterial 16S rRNA gene sequences from two communities were also determined by conventional Sanger sequencing to check the pyrosequencing data (Figure 1D; Table S2). As Figure 1D shows, the major types and relative numbers of taxa were closely similar in the Sanger and pyrosequencing analyses of each sample. This indicates that the pyrosequencing data accurately reflected the species detected by conventional sequencing. The minor taxa detected by pyrosequencing, however, were not found in the Sanger reads because there were far fewer of the latter.

Microbial Diversity in the Macaque Intestinal Microbiota

We first investigated the bacterial diversity present in our 16S rRNA gene sequence data. Sequence reads were aligned using NAST and compiled in OTUPicker. When sequences were condensed under conditions demanding 99% identity, about 20,000 different operational taxonomic units (OTUs; groups defined by pairwise sequence identity) were found (Figure 2A). When OTUs were defined using a threshold of 97% identity or greater, a criterion that previous studies judged to correspond roughly to the species level [36,37], about 5,000 OTUs were identified. Errors introduced during pyrosequencing may influence this value, but the effects are expected to be small (discussed further in the Methods section). To determine whether all the OTUs present in the data set had been recovered, a rarefaction analysis was carried out (Figure 2B). Increasingly large random subsets of the data were analyzed for OTU number, and the totals were plotted. If all the OTUs in the sample had been sequenced multiple times, a stable estimate would be reached at subset sizes smaller than the full data set.
As Figure 2B shows, the estimates are still climbing even at the largest subsets analyzed. This indicates that substantial numbers of unseen OTUs exist in the samples and would be detected only by determining larger numbers of sequences.

Figure 2 Diversity of the Macaque GI Microbiota (A) The numbers of operational taxonomic units (OTUs) present in the collection of pyrosequence reads were analyzed by condensing sequences at several percent-identity thresholds. The x-axis shows the percent identity, the y-axis the number of OTUs detected. (B) Collector's curve analysis of the completeness of sampling. Repeated samples of OTU subsets were used to evaluate whether further sampling would likely yield additional taxa (rarefaction analysis), as indicated by whether the curve has reached a plateau. The y-axis indicates the number of OTUs detected, the x-axis the number of taxa in the sequence subset analyzed. The color codes are as follows: green, stool samples; yellow, colonic contents; red, lower GI mucosal surface; blue, upper GI mucosal surface. (C) Rarefaction curves to estimate the diversity of taxa present in individual samples, using the Shannon Index. Color code as in (B). The upper GI mucosal samples were significantly less diverse than the other groups (p < 0.004 for pairwise comparisons of upper gut samples to each of the other three; Mann-Whitney comparison of means).

To estimate the total number of OTUs in each data set, the Chao1 estimator was used; it uses frequency-of-observation information to estimate the number of unseen OTUs present in the original sample. For most of the samples, the rarefaction curves of the Chao1 estimates did not reach a stable value. This indicates that the true numbers of OTUs in the samples are even larger than the Chao1 estimates (53–1,185 OTUs per sample; 97% identity criterion). Overall, the richness of bacterial taxa in the macaque GI microbiota was very high.
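The rarefaction, Chao1, and Shannon calculations used in this section can be sketched as follows, given one OTU label per read. This is an illustrative implementation, not the software used in the study. Several choices here are assumptions, since the text does not specify them:

- Subsampling is done without replacement, with a fixed number of repeats.
- Chao1 is computed as S_obs + F1²/(2·F2), where F1 and F2 are the numbers of singleton and doubleton OTUs.
- When there are no doubletons, Chao1 switches to the bias-corrected form.
- The Shannon index uses the natural logarithm.

```python
import math
import random
from collections import Counter


def rarefaction(otu_labels, depths, repeats=10, seed=0):
    """Mean number of distinct OTUs seen in random subsamples of reads.

    otu_labels: one OTU label per read; subsampling is without replacement.
    """
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        seen = [len(set(rng.sample(otu_labels, depth))) for _ in range(repeats)]
        curve.append(sum(seen) / repeats)
    return curve


def chao1(otu_labels):
    """Chao1 richness estimate: S_obs + F1^2 / (2 * F2).

    Falls back to the bias-corrected S_obs + F1 * (F1 - 1) / 2 when no OTU
    is a doubleton (F2 = 0), to avoid dividing by zero.
    """
    abundances = Counter(otu_labels).values()
    s_obs = len(abundances)
    f1 = sum(1 for a in abundances if a == 1)  # singleton OTUs
    f2 = sum(1 for a in abundances if a == 2)  # doubleton OTUs
    if f2:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def shannon(otu_labels):
    """Shannon index H = -sum(p_i * ln p_i) over OTU proportions."""
    n = len(otu_labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(otu_labels).values())
```

For example, nine reads falling into OTUs with abundances 5, 2, 1, and 1 give S_obs = 4, F1 = 2, and F2 = 1. The Chao1 estimate is therefore 4 + 2²/(2·1) = 6, two more OTUs than were actually observed.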
The estimated diversity of all 100 samples was compared by computing the Shannon Diversity Index from the OTU data for each sample (Figure 2C). To investigate the relative diversity at different anatomical sites, the 100 communities were grouped by sample type and their relative diversity compared. Rarefaction analysis indicated that most of the Shannon Diversity estimates had reached stable values. Separating the communities by sample type showed that the upper GI mucosal samples from the jejunum were notably less diverse than the other groups.

Comparison of the Macaque, Human, and Mouse GI Microbiota

A comparison of the macaque GI microbiota with those of humans [4] and mice [8] is shown in Figure 3. To compare the global compositions of microbial communities, we used UniFrac [38–40], which measures the similarity among bacterial communities based on phylogenetic distances. For the UniFrac analysis, we used the augmented ARB database described above. To compare two communities using UniFrac, sequences from both communities are marked on a common phylogenetic tree that contains all the sequences to be analyzed. The fraction of the total branch length that leads to sequences from only one of the two communities is then computed. This procedure measures the similarity between the two communities in terms of the total amount of evolutionary history that separates their sequences. UniFrac assigns only a small difference to changes in the representation of closely related taxa, but a larger value to changes in the representation of more distant taxa. OTU-based methods, in contrast, treat all taxa as equally distinct. To compare multiple communities, all pairwise distances between communities were computed, and Principal Coordinate Analysis (PCoA) was then used to cluster the communities along axes of maximal variance (Figure 3).
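The unweighted UniFrac distance described above can be computed on a small rooted tree as in this sketch. It is the fraction of observed branch length that leads to only one of the two communities. The parent/branch-length dictionaries, the function name, and the toy tree are hypothetical. A real analysis places thousands of reads on a large reference tree, as was done here with ARB.

```python
def unweighted_unifrac(parent, length, tips_a, tips_b):
    """Unweighted UniFrac distance between two communities on a rooted tree.

    parent: dict mapping each non-root node to its parent
    length: dict mapping each non-root node to the length of the branch
            joining it to its parent
    tips_a, tips_b: tree tips (sequences or OTUs) observed in each community
    """

    def branches_above(tips):
        covered = set()
        for node in tips:
            # climb toward the root; stop early once the path is already covered
            while node in parent and node not in covered:
                covered.add(node)  # the branch from node up to parent[node]
                node = parent[node]
        return covered

    in_a, in_b = branches_above(tips_a), branches_above(tips_b)
    observed = in_a | in_b  # branches leading to at least one community
    unique = in_a ^ in_b    # branches leading to exactly one community
    total = sum(length[n] for n in observed)
    return sum(length[n] for n in unique) / total if total else 0.0
```

Consider a tree in which tips t1 and t2 share an internal node X and t3 branches directly from the root. Communities {t1} and {t2} differ only on two short terminal branches, and their distance is 2/3. Communities {t1} and {t3} share no branches and reach the maximum distance of 1. This is the sense in which closely related taxa contribute less than distant ones.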
Figure 3 Comparison of the Macaque GI Microbiota to That of Mice and Humans The plot was generated using unweighted UniFrac. Mouse and human sequences were trimmed to match the macaque pyrosequence reads in length (264 nt) and location within the 16S rRNA gene. The differences among communities from the different vertebrates were significant at p < 0.001 (t-test with permutation).

To compare human and mouse samples with the macaque pyrosequencing data, human and mouse sequence reads determined by the Sanger method were first truncated to match the length and position of the macaque 16S rRNA gene sequences. The UniFrac comparison showed strong clustering by species of origin. Similar separation by species was obtained when pyrosequencing data were used for both the rhesus and murine samples (unpublished data). For the human and macaque samples, the communities clustered by species of origin even though samples from diverse anatomical sites were included for each species. The taxonomic groups from the GI communities of each species were then compared. The bacterial taxa detected are summarized in Figure 4A. The most prominent bacterial classes were Clostridia (phylum Firmicutes), Bacteroidetes (phylum Bacteroidetes), and Spirochaetes (phylum Spirochaetes). Present in lesser amounts were Bacilli and Mollicutes (phylum Firmicutes), Alpha-, Beta-, Gamma-, and Epsilonproteobacteria, and a collection of additional classes. Several of the minor classes were found repeatedly in specific individual macaques (e.g., Fibrobacteres, Gemmatimonadetes, Deferribacteres). All of the animals showed variation over time, both in the classes detected and in their relative abundance. Many of the bacterial taxa identified were not previously known to be present in the macaque intestinal microbiota.

Figure 4 Bacteria Composing the Macaque GI Microbiome (A) Bacterial taxa identified from pyrosequencing data after alignment with the ARB 16S rRNA gene database. The size of each triangle indicates the relative number of OTUs within each taxon (100% identity threshold). (B) Summary of the bacterial taxa present in each gut community sampled, indicating the individual and temporal variation in the macaque GI microbiota. Each sample analyzed is indicated along the x-axis; the y-axis indicates the percentage of the community made up of each type of bacteria. A key to the bacterial taxa is listed at the right. Taxa corresponding to bacterial phyla are indicated with a triple underscore before the name, classes by a double underscore, orders by a single underscore, and families by no underscore. Specific values for each community, along with clinical parameters for each monkey studied, are summarized in Table S1.

The predominance of the phyla Firmicutes and Bacteroidetes was similar in all three vertebrates, and several lower-abundance phyla also overlapped. For example, Proteobacteria and Actinobacteria were found in both macaques and humans. Verrucomicrobia were detected in humans but were rare in macaques. A distinctive feature of the macaques was the abundance of the phylum Spirochaetes, particularly members of the genus Treponema, which were plentiful in macaques (Figure 4) but mostly absent from the samples from mice and humans. The abundance of flagellated Helicobacter (Epsilonproteobacteria) has previously been noted [41], and Spirochaetes have been identified in the gut microbiota of many vertebrates, including humans and non-human primates [42,43]. However, the abundance of Treponema in macaques was unexpected and far greater than in humans. Within the class Bacteroidetes, members of the genus Bacteroides have been reported to be a major and functionally significant component of the human intestinal microbiota [4,44,45]. Of the 94 near-full-length macaque 16S rRNA gene sequences, however, only one belonged to the genus Bacteroides.
More common were the genus Prevotella (16/94 sequences), which is also common in humans, and Rikenella (18/94 sequences), which is rare or absent in humans [4]. These proportions of Bacteroidaceae and Prevotellaceae were similar in the shorter pyrosequencing reads. In macaques, comparison of microbial communities among animals showed considerable variation among individuals, both in the relative abundance of the major taxonomic groups and in the presence of minor groups (Figure 4B). For some animals, longitudinal samples were available, showing that the composition of the GI microbiota was quite dynamic over the sampling period.

Distinctive Microbial Communities Associated with Different Anatomical Sites

Figure 5 shows a UniFrac clustering diagram comparing the communities from different anatomical sites. Possible clustering by sample type on the first two principal coordinates was assessed using a t-test to compare the within-group and between-group distances, with 1,000 label permutations used to assess significance. Clustering was significant at the p < 0.01 level for all four sample types.

Figure 5 Distinctive GI Microbiota in Samples from Communities from Different Anatomical Sites Unweighted UniFrac was used in the comparison. The types of samples studied are indicated by the key at lower right.

The samples from the upper GI mucosa formed a distinct cluster at the upper right of the diagram, indicating a unique composition. Analysis of the taxa present indicates that the upper GI communities were depleted in bacteria from the Bacteroidetes and Clostridia classes compared with lower GI, colonic contents, or stool. They were enriched in Bacilli, Mollicutes, and Gamma- and Epsilonproteobacteria. Several minor groups were particularly common in upper GI samples, including Mycoplasmatales and Streptococcaceae.
Biopsies (with adherent bacteria) from the lower GI tract (ascending colon) intermingled with samples of colonic contents taken at necropsy, though a distinctive feature was the abundance of Helicobacter at this site. Enterobacteriaceae were far more common in the upper and lower GI samples than in stool or colonic contents, indicating probable adherence to mucosal surfaces. Stool samples formed a cluster continuous with the colonic contents but extending to the upper left of the UniFrac plot. Stool samples commonly differed from colonic contents by a greater representation of Spirochaetes and several minor groups.

Distinctive Microbial Communities Associated with Sex of the Animal of Origin

To identify additional parameters affecting the macaque GI microbiota, we asked whether communities clustered detectably in UniFrac analysis when partitioned by a variety of biological parameters. The parameters tested included sex of the animal of origin, age, disease state, antibiotic use, and viral infection. GI communities were analyzed as pools across all sample types, as pools of related samples (colonic contents plus stool), or as single sample types (stool only or colonic contents only). Unweighted UniFrac was used for these comparisons; it is based on the presence or absence of taxa without regard to abundance. In samples of colonic contents taken at necropsy, and in samples of stool, a difference was seen between males and females. Separate clustering for a pool of the two sample types is illustrated in Figure 6 (p < 0.05 by t-test and label permutation). Analysis of the bacterial groups involved showed that several groups of the Lachnospiraceae and Bacteroidales differed (p < 0.0001). One Treponema group was far more common in males (p < 0.0001). The physiological mechanism for the observed sexual dimorphism is unknown, though partitioning of the GI microbiota by sex has been noted in mice [46].
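The clustering test used above can be sketched in Python. It applies a t-test to within-group versus between-group distances on the first two principal coordinates, with significance assessed by label permutation. The text leaves three details unspecified, and they are assumptions here:

- the Welch form of the t statistic;
- a one-sided comparison (between-group distances larger than within-group);
- the add-one convention for the permutation p-value.

The function names are hypothetical.

```python
import math
import random
import statistics


def _welch_t(x, y):
    """Welch t statistic for mean(x) - mean(y)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


def _split_distances(points, labels):
    """Euclidean distances between all pairs, split into within- and between-group lists."""
    within, between = [], []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            (within if labels[i] == labels[j] else between).append(d)
    return within, between


def clustering_permutation_test(points, labels, n_perm=1000, seed=0):
    """One-sided test that between-group distances exceed within-group distances.

    points: per-sample coordinates, e.g. (PC1, PC2) from a PCoA.
    Returns (observed t, permutation p-value).
    """
    within, between = _split_distances(points, labels)
    observed = _welch_t(between, within)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign group labels at random
        w, b = _split_distances(points, shuffled)
        if _welch_t(b, w) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

The permutation step matters because pairwise distances are not independent observations. Each sample contributes to many distances, so the nominal t distribution would overstate significance. Reshuffling labels instead yields a null distribution that respects this dependence.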
Figure 6 Sexual Dimorphism in the Macaque GI Microbiota Samples of stool and colonic contents are combined for this analysis. Cluster analysis was carried out using unweighted UniFrac. Separation between male communities (green) and female communities (pink) was significant (p < 0.05, t-test with permutation; analysis over all variation between samples). Note that under the simplest null model, each principal coordinate is expected to explain 100/(number of samples) percent of the variation, which is 100/100 communities = 1%. Thus the fourth principal coordinate, which separates males and females, is expected to contain meaningful information.

Altered Bacterial Taxa in Animals with Colitis

The effects of disease states were then examined. Microbial communities from colonic contents were divided according to whether the host animals were diagnosed with colitis at necropsy (Table 1) and analyzed with unweighted UniFrac (Figure 7A). Seven samples were available for analysis from males and ten from females. Of these, nine were from SIV-infected animals and eight from uninfected animals.

Figure 7 Colitis Is Associated with Distinctive GI Microbiota in Samples of Colonic Contents Taken at Necropsy The analysis was restricted to samples of colonic contents taken at necropsy that allowed unambiguous assignment to the “colitis” or “healthy” categories. (A) Analysis of communities in unweighted UniFrac. Samples in the colitis and healthy categories showed significant separation along the first principal coordinate (p < 0.05, t-test with permutation). For an additional four animals, insufficient clinical histories were available, so these were not included in the analysis (Table 1). (B) Diversity in samples from healthy animals or those with colitis was analyzed using the Shannon Index on OTUs condensed at 97% identity. The diversity in the samples from animals with colitis was significantly lower (p < 0.05; Mann-Whitney comparison of means).
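The 1% expectation in the Figure 6 legend comes from spreading 100% of the variation evenly over the principal coordinates. To show where the per-axis percentages themselves come from, the sketch below runs classical PCoA in pure Python: Gower double-centering of the squared distance matrix, then an eigendecomposition by cyclic Jacobi rotations. It is an illustrative implementation, not the software used in the study. It reports each axis's share of the positive eigenvalues, which is only one of several conventions for "percent variation explained". The function name `pcoa` is hypothetical.

```python
import math


def pcoa(dist, sweeps=100):
    """Classical principal coordinate analysis of a symmetric distance matrix.

    Returns (coords, explained): coords[i][k] is sample i on axis k and
    explained[k] is axis k's percentage of the positive eigenvalue total.
    Eigenvectors come from cyclic Jacobi rotations, so only the stdlib is needed.
    """
    n = len(dist)
    # Gower centering: B = -1/2 * J D^2 J with J = I - (1/n) * ones
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    v = [[float(i == j) for j in range(n)] for i in range(n)]  # accumulates eigenvectors
    for _ in range(sweeps):
        if sum(b[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-18:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(b[p][q]) < 1e-15:
                    continue
                # rotation angle that zeroes b[p][q] in G^T B G
                theta = 0.5 * math.atan2(2 * b[p][q], b[q][q] - b[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # B <- B G (columns p, q)
                    bkp, bkq = b[k][p], b[k][q]
                    b[k][p], b[k][q] = c * bkp - s * bkq, s * bkp + c * bkq
                for k in range(n):  # B <- G^T B (rows p, q)
                    bpk, bqk = b[p][k], b[q][k]
                    b[p][k], b[q][k] = c * bpk - s * bqk, s * bpk + c * bqk
                for k in range(n):  # V <- V G
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda k: b[k][k], reverse=True)
    positive = sum(b[k][k] for k in order if b[k][k] > 0)
    explained = [100 * b[k][k] / positive if b[k][k] > 0 else 0.0 for k in order]
    coords = [[v[i][k] * math.sqrt(max(b[k][k], 0.0)) for k in order] for i in range(n)]
    return coords, explained
```

For four samples spaced evenly along a line, the first axis explains essentially 100% of the variation and reproduces the original distances. With 100 unstructured communities, no axis would be expected to stand far above 1%.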
The communities separated along the first principal coordinate according to whether the animals were diagnosed with colitis (p < 0.05; t-test with label permutation). This indicates that the disease and its associated treatment resulted in a change in the composition of the GI microbiota. An analysis of relative diversity, as reported by the Shannon Index, revealed that diversity was consistently lower in the communities from colitic animals (Figure 7B). Most of the animals with colitis had a history of multiple bouts of diarrhea requiring medical attention, including fluid therapy and, in many animals, treatment with antibiotics (Table S1). The antibiotics chosen for therapy differed among the animals and included tetracycline, enrofloxacin, cefazolin, and tylosin. The timing of treatment relative to euthanasia and the duration of treatment also varied. Only two animals were on antibiotics (tetracycline) at the time of euthanasia. Within the cluster of communities from animals with colitis (Figure 7B), some possible sub-clustering by antibiotic type was seen. This suggests that each antibiotic resulted in characteristic changes in community composition, though larger sample sizes will be needed to test this hypothesis definitively. An analysis of the bacterial taxa that differed between the two groups revealed that the family Campylobacteraceae (Epsilonproteobacteria) was much more common in animals with colitis. Five of the ten monkeys with colitis carried the major Campylobacter OTU (97% criterion), but none of the seven healthy monkeys did (G = 6.03, df = 1, p = 0.015). Two monkeys of unknown clinical status were also positive. A variety of additional taxa within the Bacteroidetes and Firmicutes phyla also changed significantly in abundance in association with colitis. The genus Campylobacter contains known enteric pathogens of humans ([47] and references therein), consistent with the idea that the presence of these groups was associated with pathogenesis in macaques.
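The reported G = 6.03 for the major Campylobacter OTU can be reproduced with a G-test of independence on the 2 × 2 table of counts:

- colitis: 5 positive, 5 negative;
- healthy: 0 positive, 7 negative.

This works only if Williams' correction is applied; the text does not say which correction, if any, was used, so that is an assumption. The sketch converts G to a p-value with the one-degree-of-freedom chi-square tail, erfc(√(G/2)). With these inputs it gives p ≈ 0.014, the value quoted in the Discussion, whereas this paragraph reports p = 0.015. The function name `g_test_2x2` is hypothetical.

```python
import math


def g_test_2x2(table, williams=True):
    """G-test of independence for a 2 x 2 table [[a, b], [c, d]] of counts.

    Returns (G, p) for df = 1. The p-value uses the chi-square(1) upper tail,
    which equals erfc(sqrt(G / 2)).
    """
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    g = 0.0
    for i in range(2):
        for j in range(2):
            obs = table[i][j]
            if obs:  # empty cells contribute 0 (0 * ln 0 is taken as 0)
                expected = rows[i] * cols[j] / n
                g += obs * math.log(obs / expected)
    g *= 2
    if williams:
        # Williams' correction; (r - 1)(c - 1) = 1 for a 2 x 2 table
        q = 1 + (n * sum(1 / r for r in rows) - 1) * (n * sum(1 / c for c in cols) - 1) / (6 * n)
        g /= q
    return g, math.erfc(math.sqrt(g / 2))
```

Without the correction, the same table gives G ≈ 6.73. With counts this small, the uncorrected statistic overstates the evidence, and Williams' divisor (about 1.12 here) pulls it back toward the chi-square reference distribution.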
Of the animals detected as Campylobacter positive by sequence analysis, only two had positive cultures for Campylobacter when analyzed by conventional clinical methods. One explanation for the enrichment of Campylobacter would be that antibiotic treatment created an environment favorable for colonization, as has been suggested for Clostridium difficile in humans. Of the animals with colitis that were positive for Campylobacter, four had histories of recent antibiotic use but three did not, and three different antibiotics were used for the four treated animals (Table S1). Thus the presence of Campylobacter was not strongly associated with antibiotic treatment, consistent with the idea that Campylobacter was associated with colitis and not with antibiotic use. SIV-infected animals were present in both the colitis and normal groups, and no strong clustering of the bacterial communities was associated with SIV infection when SIV infection was analyzed in isolation (data not shown). These data suggest that the alterations in community composition in SIV-infected animals with colitis were attributable to the colitis resulting from viral infection, and not to the viral infection itself.

Discussion

In this study, we describe the composition of 100 uncultured GI microbial communities from healthy rhesus macaques and macaques with chronic colitis. Each community was characterized by an average of ∼1,400 16S rRNA gene reads with a median length of 264 nt. This work provides a detailed picture of the structure of the macaque GI microbiota, its dynamics, and the changes associated with colitis with or without SIV infection. Macaque models are used to study many GI diseases, including SIV-induced enteropathy, bacterial enteropathy, and inflammatory bowel disease.
The data presented here provide detailed background, hypotheses, and methods for assessing possible involvement of the full GI microbiota, as well as a model for investigating changes in the human GI microbiota in healthy and diseased individuals. The pyrosequencing method [30] allows large numbers of 16S rRNA gene sequence reads to be obtained while controlling the cost of data acquisition. Compared with culture-based methods, this greatly increases the number of bacterial communities and species accessible to analysis. In the bioinformatic approach used here, the pyrosequencing reads were first inserted into pre-existing phylogenetic trees built from full-length 16S rRNA gene sequences and then analyzed. This allows relatively accurate phylogenetic placement despite the short sequence lengths [29]. Aligning pyrosequencing reads to a pre-existing tree also minimizes the effects of pyrosequencing errors, since single-nucleotide substitutions will rarely cause a read to align with an incorrect full-length sequence. Communities characterized by 16S rRNA gene sequence reads were compared with each other using UniFrac [38,40]. UniFrac evaluates the distance between pairs of samples placed on a common phylogenetic tree, based on the branch length unique to the members of each community. One advantage of this approach is that the collection of pairwise distances between communities can be subjected to PCoA, allowing communities to be clustered along orthogonal axes of maximal variance. In a successful study of this type, clustering on each axis can report the effects of different biological variables. Previous studies of the vertebrate GI microbiota have indicated that many factors influence microbial populations, including host genotype [8,48], geography [49], antibiotic use [50], and diet [51].
Using UniFrac and PCoA in combination with case-controlled samples, it is potentially possible to extract the effects of these and other variables and analyze each independently. Our analysis showed that the macaque microbiota differed significantly from those of mice and humans. Even when communities from different anatomical sites were considered, or when samples from healthy hosts were mixed with those from diseased hosts, the effect of species of origin still predominated. For all three vertebrates, the Firmicutes and Bacteroidetes were the most abundant phyla, but the composition of minor groups differed, as did the taxa within the Firmicutes and Bacteroidetes. A distinctive feature of the macaque samples was the abundance of Spirochaetes from the Treponema lineage. These Treponema differ from the spiral-shaped Helicobacter reported previously [41], which were also detected here. Analysis of full-length 16S rRNA gene clones (Figure 1D) showed closest matches to Treponema brennaborense and Treponema saccharophilum. T. brennaborense has been associated with digital dermatitis in dairy cows [52]. T. saccharophilum has been identified as a component of the rumen flora that aids in the digestion of pectin [53], suggesting a possible role in digesting vegetable matter in the macaque GI tract. The analysis of healthy animals highlighted the many factors affecting the composition of GI communities in macaques. The number of types of bacteria involved is very large. When macaque 16S rRNA gene sequences were grouped into OTUs at 97% or greater similarity, a threshold suggested to correspond roughly to the species level, about 5,000 OTUs were identified. Microbial communities of individual animals differed from one another, and all animals followed longitudinally showed changes in community composition over time. Similarly, in humans, GI microbial communities have been reported to differ among individuals and at different anatomical sites [4].
The macaque GI communities also clustered by the sex of the host animal, paralleling a proposal for sexual dimorphism in the GI microbiota of mice [46]. Samples of colonic contents from animals euthanized because of advanced colitis showed distinctive communities compared with similar samples from healthy controls, linking alterations in GI microbial communities to GI pathogenesis. Samples from animals with colitis were indistinguishable whether or not the colitis was associated with SIV infection. This emphasized that colitis itself (and associated therapeutic interventions), rather than its cause, was most tightly linked with an altered GI microbiota. The presence of Campylobacter was strongly associated with colitis. The major Campylobacter OTU (97% threshold) was present in five of ten animals with colitis but in none of seven animals free of colitis (p = 0.014). Culturable C. jejuni or C. coli were obtained from only two animals. This indicates that the Campylobacter species detected were either too rare to detect by culture or did not grow under the culture conditions used. Most of the macaques euthanized because of GI disease were treated with antibiotics at some point during disease progression. These findings therefore model human clinical cases in which antibiotic therapy may be indicated in the treatment of colitis, but antibiotic treatment complicates analysis of the effects of GI disease alone. Among the samples of colonic contents taken at necropsy, the larger cluster from animals with colitis showed signs of sub-clustering by the type of antibiotic used for treatment. However, the number of samples in each antibiotic group was too low for detailed analysis by antibiotic type. Our data are consistent with the idea that the disease state caused a shift in bacterial communities that was further shaped by the antibiotics used for treatment.
The sequence-based approach described here has the potential to identify candidate pathogens involved in previously obscure disease conditions. Animal FH09 (Table S1) provides a case study. This animal suffered from prolonged chronic diarrhea of unknown cause, and exhaustive searches for a microbial pathogen by conventional culture methods were negative. For unexplained reasons, placing the animal on a gluten-free diet helped ameliorate the condition, but the animal eventually declined and was euthanized for humane reasons. Analysis of colonic contents taken at necropsy revealed a substantial number of 16S rRNA gene sequences (51 reads) that clustered with a group containing Campylobacter fetus and Campylobacter hyointestinalis. Evidently, Campylobacter species of this group are not detected by the usual culture assays. C. fetus has been implicated as an emerging pathogen and could well have been involved in the GI disease of FH09. These findings suggest that further analysis of the relationship between diet and C. fetus pathogenesis might be useful. They also illustrate how the methods described here could be applied to the diagnosis of human GI diseases of unknown etiology. In summary, this study presents the first use of DNA bar coding and pyrosequencing to analyze uncultured bacterial communities from the primate gut. It also provides the deepest view to date of the gut microbiome from the largest sample of any non-human species. Using the macaque model and the methods reported here, it will be possible to investigate how interactions among bacterial community members, together with alterations in the GI environment, lead to the outgrowth of pathogenic forms and resultant disease.
This study also paves the way for broader application of pyrosequencing to characterize the human microbiota in health and disease. Pyrosequencing could potentially allow large-scale characterization of thousands of human samples with orders of magnitude less expense and effort than traditional Sanger sequencing. We will thus soon be able to identify those features of the microbiota (if any) that are common to all healthy individuals. We will also be able to assess the extent to which changes in the microbiota in animal models can help guide the development of therapy for human diseases.
Materials and Methods
Sample collection. Rhesus macaques (Macaca mulatta) were housed singly at the Tulane National Primate Research Center. For longitudinal studies of stool samples, four animals (CC47, FH40, CT64, DD05; here M1-M4) were infected intravenously with 100 TCID50 SIVmac251 on study day 0. Fecal samples were collected before infection (t1) and at days 7 (t3), 14 (t4), 28 (t6), and 56 (t10) post infection. These are standard time points for examining early events in the pathogenesis of AIDS and are associated with peak viremia (day 14) and establishment of the viral set point (by day 56). Stool samples from control animals (AM87, DG23, CC79, BA02; here C1-C4) were collected similarly over an eight-week period. Colonic contents were collected from the ascending colon at necropsy, within one hour of euthanasia by intravenous overdose of phenobarbital. All samples were immediately frozen at −80 °C, shipped on dry ice, and stored at −80 °C until processing. In addition, intestinal biopsies of the upper (jejunum) and lower (ascending colon) intestine were obtained by standard techniques and frozen immediately, as for colonic contents. Housing and handling of animals were in accordance with the Guide for the Care and Use of Laboratory Animals (U.S. Public Health Service) and the Animal Welfare Act.
All protocols and procedures were reviewed and approved by the Tulane University Institutional Animal Care and Use Committee. Additional animals studied, their clinical conditions, and detailed ecological descriptions of samples are given in Table S1.
Extraction and purification of DNA. Total DNA was extracted from frozen stool using the QIAamp® DNA Stool Mini Kit (Qiagen, Inc., Valencia CA), following the manufacturer's protocol for pathogen detection.
PCR amplification of bacterial 16S rRNA gene sequences. For samples from each animal and at each time point, the 16S rRNA gene was amplified from extracted DNA using the composite forward primer 5′-GCCTCCCTCGCGCCATCAGNNNNCTGCTGCCTYCCGTA-3′. In this primer, GCCTCCCTCGCGCCATCAG is 454 Life Sciences® primer A and CTGCTGCCTYCCGTA is the broad-range bacterial primer BSR357. The reverse primer was 5′-GCCTTGCCAGCCCGCTCAGNNNNAGAGTTTGATCCTGGCTCAG-3′. Here GCCTTGCCAGCCCGCTCAG is 454 Life Sciences® primer B and AGAGTTTGATCCTGGCTCAG is the broad-range bacterial primer BSF8. In both primers, NNNN designates the unique four-base bar code used to tag each PCR product. Each 50 μl reaction contained the following components:
5.0 μl 10× PCR buffer II (Applied Biosystems, Foster City, CA)
3.0 μl MgCl2 (25 mM; Applied Biosystems)
2.5 μl Triton X-100 (1%)
2.0 μl deoxyribonucleoside triphosphates (10 mM)
1.0 μl each of forward and reverse primer (20 pmol/μl each)
0.5 μl AmpliTaq® DNA polymerase (5 U/μl; Applied Biosystems)
100 ng of template DNA
Reactions were run in a GeneAmp® PCR System 9700 cycler (Applied Biosystems). Cycling began with an initial 5-min denaturation at 95 °C. This was followed by 20 cycles of 30 s at 95 °C (denaturation), 30 s at 56 °C (annealing), and 90 s at 72 °C (elongation), and then a final extension at 72 °C for 7 min. Four independent PCR reactions were performed for each sample, along with a no-template negative control.
Gel purification and pyrosequencing.
Each PCR product was gel purified from a 0.8% agarose gel, and DNA was isolated using the QIAquick® Gel Extraction Kit (Qiagen, Inc., Valencia CA). One hundred nanograms of each of the 100 gel-purified DNAs was added to a master pool, which was sent for pyrosequencing with primer A as described [30,54]. Several studies have analyzed sources of error in 454 sequencing runs, and these informed our choices for quality control here [37,54,55]. To pass quality control, a sequence had to meet four criteria:
(1) match the bar code and 16S rRNA gene primer perfectly;
(2) be at least 50 nt long;
(3) contain no more than two undetermined bases;
(4) show at least a 75% match to a previously determined 16S rRNA gene sequence after alignment with NAST (http://greengenes.lbl.gov/).
The sequences were inserted into the 16S rRNA gene tree constructed by Hugenholtz et al. [56] using the parsimony insertion tool of the ARB software (http://www.arb-home.de/), with a "termini" filter. After applying these criteria, 36,652,141 bases of sequence were available for analysis. All sequence data will be deposited at NCBI upon acceptance of this manuscript for publication.
Bioinformatic analysis. OTU clustering and analysis were carried out using OTUPicker (M. Hamady and R. Knight, unpublished). Clustering and principal coordinate analysis were conducted using UniFrac [29,38,39]. UniFrac analysis can be based on the presence and absence of bacterial taxa (unweighted UniFrac), or it can take into account the abundance of each group (weighted UniFrac). Figures 3, 4, 6, and 7 report unweighted UniFrac results. To perform permutation tests within UniFrac, we randomized the group labels and repeated the cluster analysis. We then used a t-test to compare all distances between pairs of points from the same group with all distances between pairs of points from different groups.
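The first three quality-control criteria can be expressed as a read filter. The following Python sketch rests on several assumptions. Reads are taken to begin with the four-base bar code, followed by BSR357, with the 454 key already removed. Length is measured on the whole read. Undetermined bases are called as "N". Criterion (4), the NAST alignment check, is not modelled. `passes_qc` and `primer_pattern` are hypothetical names, not part of any tool used in the study.

```python
import re

# Broad-range primer BSR357 as given in the Methods; Y is the IUPAC code for C or T.
BSR357 = "CTGCTGCCTYCCGTA"
# Partial IUPAC table, sufficient for the primers in this section.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "R": "[AG]", "N": "[ACGT]"}

def primer_pattern(primer: str) -> str:
    """Translate a degenerate primer into a regular-expression fragment."""
    return "".join(IUPAC[base] for base in primer)

def passes_qc(read: str, barcode: str, primer: str = BSR357,
              min_length: int = 50, max_undetermined: int = 2) -> bool:
    """Apply criteria (1)-(3); criterion (4), the NAST alignment check, is not modelled."""
    prefix = re.compile(re.escape(barcode) + primer_pattern(primer))
    if prefix.match(read) is None:        # (1) exact bar code + primer at the read start
        return False
    if len(read) < min_length:            # (2) minimum read length
        return False
    return read.count("N") <= max_undetermined  # (3) at most two undetermined bases
```

Reads passing this filter would then be demultiplexed by their bar code. The sample a read belongs to is whichever sample's bar code it matched in criterion (1).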
In the permutation test, we obtained a nonparametric distribution for the t statistic that takes into account the correlations introduced by the distance matrix structure. We used 1,000 permutations. If none of the permuted sets gave a more extreme result than the actual set, we therefore cannot specify a p-value more precisely than "<0.001". We note that principal coordinate analysis assumes that the relationships between taxon abundance and environmental gradients are linear. In choosing Monte Carlo methods for significance testing, we accepted reduced power in order to avoid parametric methods, which assume a particular distribution of the error terms. Taxonomy assignments were based on the group names in ARB. Ecological parameters in Table S1 were calculated using OTUPicker and PAST [57]. Errors in pyrosequencing may occur at a rate of about 0.25% [37]. This suggests that most of the 260-nucleotide sequences that remain after filtering will contain either 0 or 1 errors. Single-nucleotide errors will not substantially affect either of the analyses we present (high-level taxonomic breakdowns or UniFrac). They are unlikely to cause pyrosequencing reads to be assigned to the wrong taxonomic group, and they contribute almost no branch length to the phylogenetic tree used for UniFrac analyses. However, these sequencing errors could affect estimates of the total number of OTUs at a given threshold, so the total number of species-level taxa in the samples should be interpreted with some caution. Using a Poisson model, we would expect only 4.4 × 10⁻⁵% of the reads to contain more than seven errors, as would be required to form a new species-level OTU at the 97% threshold. It is therefore unlikely that even a single OTU in the analysis was generated through this mechanism.
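The label-randomization test described for UniFrac can be sketched in Python on a precomputed distance matrix. Several choices here are assumptions, since the text says only "a t-test". The sketch uses Welch's t statistic, treats large between-group distances relative to within-group distances as the extreme direction, and reports the p-value as the fraction of permutations at least as extreme. The function names and the toy data are hypothetical.

```python
import math
import random
import statistics
from itertools import combinations
from typing import List, Sequence, Tuple

def welch_t(between: Sequence[float], within: Sequence[float]) -> float:
    """Welch t statistic; positive when between-group distances exceed within-group ones."""
    diff = statistics.mean(between) - statistics.mean(within)
    se = math.sqrt(statistics.variance(between) / len(between)
                   + statistics.variance(within) / len(within))
    if se == 0.0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se

def split_distances(dist: Sequence[Sequence[float]],
                    labels: Sequence[str]) -> Tuple[List[float], List[float]]:
    """Split the upper triangle of a distance matrix into within- and between-group values."""
    within: List[float] = []
    between: List[float] = []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist[i][j])
    return within, between

def permutation_test(dist, labels, n_perm: int = 1000, seed: int = 0):
    """Return (observed t, fraction of label permutations with t >= observed)."""
    within, between = split_distances(dist, labels)
    observed = welch_t(between, within)
    rng = random.Random(seed)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomize group labels; the distance matrix stays fixed
        w, b = split_distances(dist, shuffled)
        if welch_t(b, w) >= observed:
            extreme += 1
    return observed, extreme / n_perm

# Toy example (hypothetical data): two tight clusters of three samples on a line.
points = [0.0, 0.1, 0.25, 5.0, 5.2, 5.3]
dist = [[abs(x - y) for y in points] for x in points]
labels = ["A", "A", "A", "B", "B", "B"]
t_obs, p = permutation_test(dist, labels)
```

The observed distances are never resampled, only the labels. The null distribution therefore preserves the correlation structure of the distance matrix, which a textbook t-test p-value would ignore. With six samples there are only 20 distinct label assignments, and two of them reproduce the true split, so p here stays near 0.1. With 1,000 permutations, the smallest nonzero value this estimator can return is 0.001.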
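The Poisson argument about sequencing errors can be checked numerically. This Python sketch uses the 0.25% per-base error rate and the 260-nt read length from the text. As a simplifying assumption, it treats errors as independent per base, so the error count per read is Poisson with mean λ = 0.0025 × 260 = 0.65. For a new OTU to form at 97% identity, a 260-nt read needs more than 3% mismatches, i.e. at least eight errors.

```python
import math

ERROR_RATE = 0.0025        # per-base pyrosequencing error rate cited in the text
READ_LENGTH = 260          # read length used in the text's argument
OTU_IDENTITY_PERCENT = 97  # species-level OTU threshold

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k errors when the error count is Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_upper_tail(k_min: int, lam: float, terms: int = 50) -> float:
    """P(X >= k_min), summed directly to avoid cancellation in 1 - P(X < k_min)."""
    return sum(poisson_pmf(k, lam) for k in range(k_min, k_min + terms))

lam = ERROR_RATE * READ_LENGTH  # expected errors per read
# Smallest error count that pushes identity strictly below 97% (integer arithmetic).
min_errors = READ_LENGTH * (100 - OTU_IDENTITY_PERCENT) // 100 + 1
p_zero_or_one = poisson_pmf(0, lam) + poisson_pmf(1, lam)
p_new_otu = poisson_upper_tail(min_errors, lam)

print(f"expected errors per read: {lam:.2f}")
print(f"P(0 or 1 errors): {p_zero_or_one:.3f}")
print(f"errors needed to drop below {OTU_IDENTITY_PERCENT}%: {min_errors}")
print(f"P(>= {min_errors} errors): {100 * p_new_otu:.2g}% of reads")
```

Running it prints λ = 0.65, P(0 or 1 errors) ≈ 0.861, a threshold of 8 errors, and about 4.4e-05% of reads. This agrees with the 4.4 × 10⁻⁵% figure and with the statement that most reads carry 0 or 1 errors.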
Supporting Information
Table S1. Characteristics of Samples of Uncultured Macaque GI Communities Used in This Study. This table provides the description of monkeys sampled, clinical parameters for disease states, and ecological statistics describing the communities sampled. (93 KB XLS)
Table S2. Near-full-length 16S rRNA Gene Sequences Determined by the Sanger Method, and Their Taxonomic Positions. These sequences allow finer discrimination of the major macaque GI bacterial taxa. (404 KB XLS)

Author and article information

Journal: PLoS ONE (Public Library of Science, San Francisco, USA), ISSN 1932-6203
Published: 23 September 2009; Volume 4, Issue 9, e7125

Affiliations
[1] URMITE - UMR CNRS 6236, IRD 3R198, Université de la Méditerranée, Faculté de médecine, Marseille, France
[2] Service de Nutrition, Maladies Métaboliques et Endocrinologie, UMR Université méditerranée-INRA U1260, CHU de la Timone, Marseille, France
[3] Nutrition and Diabetology Department, University Hospital Sainte Marguerite, Marseille, France
Editor affiliation: Columbia University, United States of America

Author notes
Conceived and designed the experiments: DR. Performed the experiments: MH BV DR. Analyzed the data: FA. Wrote the paper: FA MH BV DR.

Article identifiers: 09-PONE-RA-11126R1; DOI 10.1371/journal.pone.0007125; PMCID 2742902; PMID 19774074; b5f1c082-5abd-4873-b8c7-186847fed07b
License: Armougom et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
History: received 15 June 2009; accepted 28 July 2009
Page count: 8 pages
Categories: Research Article; Microbiology/Environmental Microbiology; Molecular Biology/Bioinformatics; Gastroenterology and Hepatology/Gastrointestinal Infections
