      Microbial species delineation using whole genome sequences

      Research article


          Abstract

          Increased sequencing of microbial genomes has revealed that prevailing prokaryotic species assignments can be inconsistent with whole genome information for a significant number of species. The long-standing need for a systematic and scalable species assignment technique can be met by the genome-wide Average Nucleotide Identity (gANI) metric, which is widely acknowledged as a robust measure of genomic relatedness. In this work, we demonstrate that the combination of gANI and the alignment fraction (AF) between two genomes accurately reflects their genomic relatedness. We introduce an efficient implementation of AF and gANI and discuss its successful application to 86.5M genome pairs between 13,151 prokaryotic genomes assigned to 3032 species. By comparing the genome clusters obtained from complete-linkage clustering of these pairs to the existing taxonomy, we observed that nearly 18% of all prokaryotic species suffer from anomalies in species definition. Our results can be used to explore central questions such as whether microorganisms form a continuum of genetic diversity or distinct species represented by distinct genetic signatures. We propose that this precise and objective AF,gANI-based species definition, the MiSI (Microbial Species Identifier) method, be used to address previous inconsistencies in species classification and as the primary guide for new taxonomic species assignment, supplemented by the traditional polyphasic approach as required.
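          The abstract does not spell out how AF and gANI are computed or how the clustering step proceeds. The sketch below is a minimal illustration under stated assumptions, not the MiSI implementation. It assumes that gANI is the alignment-length-weighted mean nucleotide identity over bidirectional best-hit (BBH) gene pairs and that AF is the aligned length divided by the total gene length of the query genome. The cutoffs `GANI_MIN = 96.5` and `AF_MIN = 0.6` are illustrative assumptions that should be checked against the paper. `GeneHit`, `gani_af`, `same_species` and `complete_linkage` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


@dataclass
class GeneHit:
    """One bidirectional best hit (BBH) between a query gene and its
    counterpart in the reference genome (hypothetical structure)."""
    identity: float  # percent nucleotide identity of the alignment (0-100)
    aln_length: int  # aligned length in nucleotides


def gani_af(hits: Sequence[GeneHit], total_gene_length: int) -> Tuple[float, float]:
    """Return (gANI, AF) for one direction, query -> reference.

    Assumption: gANI is the alignment-length-weighted mean identity of the
    BBH pairs; AF is the aligned length divided by the total length of all
    genes in the query genome."""
    aligned = sum(h.aln_length for h in hits)
    if aligned == 0 or total_gene_length == 0:
        return 0.0, 0.0
    gani = sum(h.identity * h.aln_length for h in hits) / aligned
    return gani, aligned / total_gene_length


# Illustrative cutoffs (assumed); check them against the paper before use.
GANI_MIN = 96.5
AF_MIN = 0.6


def same_species(a_to_b: Tuple[float, float], b_to_a: Tuple[float, float]) -> bool:
    """Assumed rule: both directions must pass both cutoffs."""
    return all(g >= GANI_MIN and af >= AF_MIN for g, af in (a_to_b, b_to_a))


def complete_linkage(genomes: Sequence[str],
                     linked: Callable[[str, str], bool]) -> List[List[str]]:
    """Agglomerative clustering with a pass/fail distance: two clusters are
    merged only if every cross pair is linked (complete linkage). Ties are
    broken by first-found order, so the result can depend on input order."""
    clusters = [[g] for g in genomes]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if all(linked(x, y) for x in clusters[i] for y in clusters[j]):
                clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
                merged = True
                break  # restart the scan over the updated cluster list
    return clusters


if __name__ == "__main__":
    print(gani_af([GeneHit(98.0, 1000), GeneHit(96.0, 500)], 2000))
    links = {frozenset(p) for p in [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]}
    print(complete_linkage(["A", "B", "C", "D"],
                           lambda x, y: frozenset((x, y)) in links))
```

          In the example, genome D is linked only to C. Complete linkage therefore leaves D in its own cluster, whereas single linkage would have chained it into the A-B-C group. For scale, 13,151 genomes yield 13,151 × 13,150 / 2 = 86,467,825 unordered pairs, in line with the 86.5M figure in the abstract.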


                Author and article information

                Journal
                Nucleic Acids Research (Nucleic Acids Res), Oxford University Press
                ISSN: 0305-1048 (print); 1362-4962 (electronic)
                Publication date: 06 July 2015 (electronic); 18 August 2015 (print)
                Volume 43, Issue 14, pages 6761-6771
                Affiliations
                [1] Microbial and Metagenome Superprogram, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
                [2] Department of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0355, USA
                [3] Celgene Corp., San Francisco, CA 94158, USA
                Author notes
                [*] To whom correspondence should be addressed. Tel: +1 925 296 5696; Fax: +1 925 296 5666; Email: njvarghese@lbl.gov
                Correspondence may also be addressed to Amrita Pati. Tel: +1 925 927 2580; Fax: +1 925 296 5666; Email: apati@lbl.gov
                Correspondence may also be addressed to Nikos C. Kyrpides. Tel: +1 925 296 5718; Fax: +1 925 296 5666; Email: nckyrpides@lbl.gov
                Article identifiers
                DOI: 10.1093/nar/gkv657
                PMCID: PMC4538840
                PMID: 26150420
                © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 10 December 2014
                Revised: 08 June 2015
                Accepted: 12 June 2015
                Page count: 11
                Categories: Computational Biology; Genetics
