An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea

      Reference phylogenies are crucial for providing a taxonomic framework for interpretation of marker gene and metagenomic surveys, which continue to reveal novel species at a remarkable rate. Greengenes is a dedicated full-length 16S rRNA gene database that provides users with a curated taxonomy based on de novo tree inference. We developed a ‘taxonomy to tree’ approach for transferring group names from an existing taxonomy to a tree topology, and used it to apply the Greengenes, National Center for Biotechnology Information (NCBI) and cyanoDB (Cyanobacteria only) taxonomies to a de novo tree comprising 408 315 sequences. We also incorporated explicit rank information provided by the NCBI taxonomy to group names (by prefixing rank designations) for better user orientation and classification consistency. The resulting merged taxonomy improved the classification of 75% of the sequences by one or more ranks relative to the original NCBI taxonomy with the most pronounced improvements occurring in under-classified environmental sequences. We also assessed candidate phyla (divisions) currently defined by NCBI and present recommendations for consolidation of 34 redundantly named groups. All intermediate results from the pipeline, which includes tree inference, jackknifing and transfer of a donor taxonomy to a recipient tree (tax2tree) are available for download. The improved Greengenes taxonomy should provide important infrastructure for a wide range of megasequencing projects studying ecosystems on scales ranging from our own bodies (the Human Microbiome Project) to the entire planet (the Earth Microbiome Project). The implementation of the software can be obtained from
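The abstract mentions prefixing rank designations to group names. As a minimal sketch of what that could look like, the Python below builds and parses lineage strings with one explicit prefix per rank. The two-underscore prefixes (`k__`, `p__`, ... `s__`) follow the style commonly seen in Greengenes taxonomy strings but are an assumption here, since the abstract does not spell out the format; `format_lineage` and `parse_lineage` are hypothetical helper names and are not part of tax2tree.

```python
# Rank prefixes, highest rank first. Assumed for illustration: the abstract
# only says that rank designations are prefixed to group names.
RANK_PREFIXES = ["k__", "p__", "c__", "o__", "f__", "g__", "s__"]


def format_lineage(names):
    """Return a '; '-delimited lineage with an explicit rank prefix per level.

    `names` lists group names from the highest rank downwards. Ranks without
    a name keep a bare prefix, so every lineage has the same number of fields.
    """
    if len(names) > len(RANK_PREFIXES):
        raise ValueError("more names than ranks")
    padded = list(names) + [""] * (len(RANK_PREFIXES) - len(names))
    return "; ".join(prefix + name for prefix, name in zip(RANK_PREFIXES, padded))


def parse_lineage(lineage):
    """Split a prefixed lineage back into a {prefix: name} mapping."""
    ranks = {}
    for field in lineage.split(";"):
        field = field.strip()
        # Each prefix is exactly three characters, e.g. 'p__'.
        ranks[field[:3]] = field[3:]
    return ranks


lineage = format_lineage(["Bacteria", "Cyanobacteria"])
print(lineage)
print(parse_lineage(lineage)["p__"])
```

Keeping a bare prefix for unnamed ranks makes under-classified sequences easy to spot: any field that is only a prefix marks a rank at which the sequence has no name.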


            Author and article information

            [1] Department of Chemistry & Biochemistry and Biofrontiers Institute, University of Colorado, Boulder, CO, USA
            [2] Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, CA, USA
            [3] Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA
            [4] Lawrence Berkeley National Laboratory, Center for Environmental Biotechnology, Berkeley, CA, USA
            [5] Department of Bioinformatics, Second Genome Inc., San Bruno, CA, USA
            [6] Howard Hughes Medical Institute, Boulder, CO, USA
            [7] Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences and Institute for Molecular Bioscience, St Lucia, Queensland, Australia
            Author notes
            [*] Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences and Institute for Molecular Bioscience, University of Queensland, Molecular Biosciences Building 76, St Lucia, Queensland, 4072, Australia. E-mail: p.hugenholtz@

            Current address: Department of Biology, Cornell University, Ithaca, NY, USA.


            Current address: Institute for Microbiology and Archaea Centre, University of Regensburg, Regensburg, Germany.

            ISME J
            The ISME Journal
            Nature Publishing Group
            March 2012
            01 December 2011
            1 March 2012
            Volume: 6
            Issue: 3
            Pages: 610-618
            Copyright © 2012 International Society for Microbial Ecology

            This work is licensed under the Creative Commons Attribution-NonCommercial-Share Alike 3.0 Unported License. To view a copy of this license, visit

            Original Article

            Microbiology & Virology

            taxonomy, evolution, phylogenetics

