
      PhyloPhlAn is a new method for improved phylogenetic and taxonomic placement of microbes

      research-article


          Abstract

          New microbial genomes are constantly being sequenced, and it is crucial to accurately determine their taxonomic identities and evolutionary relationships. Here we report PhyloPhlAn, a new method to assign microbial phylogeny and putative taxonomy using >400 proteins optimized from among 3,737 genomes. This method measures the sequence diversity of all clades, classifies genomes from deep-branching candidate divisions through closely-related subspecies, and improves consistency between phylogenetic and taxonomic groupings. PhyloPhlAn improved taxonomic accuracy for existing and newly-sequenced genomes, detecting 157 erroneous labels, correcting 46, and placing or refining 130 new genomes. We provide examples of accurate classifications from subspecies (Sulfolobus spp.) to phyla, and of preliminary rooting of deep-branching candidate divisions, including consistent statistical support for Caldiserica (formerly candidate division OP5). PhyloPhlAn will thus be useful for both phylogenetic assessment and taxonomic quality control of newly-sequenced genomes. The final phylogenies, conserved protein sequences, and open-source implementation are available online.


          Most cited references (41)


          NCBI Reference Sequences: current status, policy and new initiatives

          NCBI's Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. RefSeq records integrate information from multiple sources and represent a current description of the sequence, the gene and sequence features. The database includes over 5300 organisms spanning prokaryotes, eukaryotes and viruses, with records for more than 5.5 × 10⁶ proteins (RefSeq release 30). Feature annotation is applied by a combination of curation, collaboration, propagation from other sources and computation. We report here on the recent growth of the database, recent changes to feature annotations and record types for eukaryotic (primarily vertebrate) species and policies regarding species inclusion and genome annotation. In addition, we introduce RefSeqGene, a new initiative to support reporting variation data on a stable genomic coordinate system.

            Horizontal gene transfer, genome innovation and evolution.

            To what extent is the tree of life the best representation of the evolutionary history of microorganisms? Recent work has shown that, among sets of prokaryotic genomes in which most homologous genes show extremely low sequence divergence, gene content can vary enormously, implying that those genes that are variably present or absent are frequently horizontally transferred. Traditionally, successful horizontal gene transfer was assumed to provide a selective advantage to either the host or the gene itself, but could horizontally transferred genes be neutral or nearly neutral? We suggest that for many prokaryotes, the boundaries between species are fuzzy, and therefore the principles of population genetics must be broadened so that they can be applied to higher taxonomic categories.

              The All-Species Living Tree project: a 16S rRNA-based phylogenetic tree of all sequenced type strains.

              The signing authors together with the journal Systematic and Applied Microbiology (SAM) have started an ambitious project that has been conceived to provide a useful tool especially for the scientific microbial taxonomist community. The aim of what we have called "The All-Species Living Tree" is to reconstruct a single 16S rRNA tree harboring all sequenced type strains of the hitherto classified species of Archaea and Bacteria. This tree is to be regularly updated by adding the species with validly published names that appear monthly in the Validation and Notification lists of the International Journal of Systematic and Evolutionary Microbiology. For this purpose, the SAM executive editors, together with the responsible teams of the ARB, SILVA, and LPSN projects (www.arb-home.de, www.arb-silva.de, and www.bacterio.cict.fr, respectively), have prepared a 16S rRNA database containing over 6700 sequences, each of which represents a single type strain of a classified species up to 31 December 2007. The selection of sequences had to be undertaken manually due to a high error rate in the names and information fields provided for the publicly deposited entries. In addition, from among the often occurring multiple entries for a single type strain, the best-quality sequence was selected for the project. The living tree database that SAM now provides contains corrected entries and the best-quality sequences with a manually checked alignment. The tree reconstruction has been performed by using the maximum likelihood algorithm RAxML. The tree provided in the first release is a result of the calculation of a single dataset containing 9975 single entries, 6728 corresponding to type strain gene sequences, as well as 3247 additional high-quality sequences to give robustness to the reconstruction. Trees are dynamic structures that change on the basis of the quality and availability of the data used for their calculation. Therefore, the addition of new type strain sequences in further subsequent releases may help to resolve certain branching orders that appear ambiguous in this first release. On the web sites: www.elsevier.de/syapm and www.arb-silva.de/living-tree, the All-Species Living Tree team will release a regularly updated database compatible with the ARB software environment containing the whole 16S rRNA dataset used to reconstruct "The All-Species Living Tree". As a result, the latest reconstructed phylogeny will be provided. In addition to the ARB file, a readable multi-FASTA universal sequence editor file with the complete alignment will be provided for those not using ARB. There is also a complete set of supplementary tables and figures illustrating the selection procedure and its outcome. It is expected that the All-Species Living Tree will help to improve future classification efforts by simplifying the selection of the correct type strain sequences. For queries, information updates, remarks on the dataset or tree reconstructions shown, a contact email address has been created (living-tree@arb-silva.de). This provides an entry point for anyone from the scientific community to provide additional input for the construction and improvement of the first tree compiling all sequenced type strains of all prokaryotic species for which names had been validly published.

                Author and article information

                Journal
                101528555
                37539
                Nature Communications (Nat Commun)
                2041-1723
                15 August 2013
                14 August 2013
                14 February 2014
                Volume 4, Article 2304
                Affiliations
                [1] Biostatistics Department, Harvard School of Public Health, 655 Huntington Avenue, 02115, Boston, MA
                [3] Broad Institute of Harvard and MIT, 301 Binney Street, 02142 Cambridge, MA
                Author notes
                Corresponding author: Curtis Huttenhower, chuttenh@hsph.harvard.edu
                [2] Current address: Centre for Integrative Biology, University of Trento, Via Sommarive 14, 38123, Trento, Italy

                Article
                NIHMSID: NIHMS505620
                DOI: 10.1038/ncomms3304
                PMCID: PMC3760377
                PMID: 23942190
                3a39d3d7-f2c2-4b33-82b7-5d547e2b1721

                Users may view, print, copy, download, and text- and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms

                Funding
                Funded by: National Human Genome Research Institute (NHGRI)
                Award ID: R01 HG005969