      Is Open Access

      EnteroBase: hierarchical clustering of 100 000s of bacterial genomes into species/subspecies and populations

      research-article


          Abstract

          The definition of bacterial species is traditionally a taxonomic issue while bacterial populations are identified by population genetics. These assignments are species specific, and depend on the practitioner. Legacy multilocus sequence typing is commonly used to identify sequence types (STs) and clusters (ST Complexes). However, these approaches are not adequate for the millions of genomic sequences from bacterial pathogens that have been generated since 2012. EnteroBase (http://enterobase.warwick.ac.uk) automatically clusters core genome MLST allelic profiles into hierarchical clusters (HierCC) after assembling annotated draft genomes from short-read sequences. HierCC clusters span core sequence diversity from the species level down to individual transmission chains. Here we evaluate HierCC's ability to correctly assign 100 000s of genomes to the species/subspecies and population levels for Salmonella, Escherichia, Clostridioides, Yersinia, Vibrio and Streptococcus. HierCC assignments were more consistent with maximum-likelihood super-trees of core SNPs or presence/absence of accessory genes than classical taxonomic assignments or 95% ANI. However, neither HierCC nor ANI was uniformly consistent with classical taxonomy of Streptococcus. HierCC was also consistent with legacy eBGs/ST Complexes in Salmonella or Escherichia and with O serogroups in Salmonella. Thus, EnteroBase HierCC supports the automated identification of and assignment to species/subspecies and populations for multiple genera.
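          The idea of clustering allelic profiles at several levels of resolution can be illustrated with a minimal sketch. It assumes single-linkage clustering on the number of differing alleles, skips loci with missing calls, and uses toy thresholds; the function names, the thresholds and the handling of missing data are all hypothetical choices for illustration and are not taken from the EnteroBase implementation.

```python
from itertools import combinations


def allelic_distance(a, b):
    """Count loci where two profiles carry different allele numbers.

    Assumption: loci with a missing call (None) in either profile are skipped.
    """
    return sum(
        1 for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


def single_linkage(profiles, threshold):
    """Label each profile with the lowest index in its cluster.

    Two profiles share a cluster when a chain of pairwise distances,
    each <= threshold, connects them (single linkage via union-find).
    """
    parent = list(range(len(profiles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(profiles)), 2):
        if allelic_distance(profiles[i], profiles[j]) <= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                # Keep the smaller index as the root so labels are stable.
                parent[max(ri, rj)] = min(ri, rj)
    return [find(i) for i in range(len(profiles))]


def hierarchical_clusters(profiles, thresholds):
    """Cluster the same profiles at every threshold (hypothetical levels)."""
    return {t: single_linkage(profiles, t) for t in sorted(thresholds)}


# Toy allelic profiles over six loci; None marks a missing allele call.
profiles = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 2],
    [1, 1, 1, 1, 3, 2],
    [4, 4, 4, 4, 4, 4],
    [4, 4, 4, 4, 4, None],
]
levels = hierarchical_clusters(profiles, thresholds=[0, 1, 5])
```

          In this toy data the two groups stay apart at thresholds 0 and 1 and merge at threshold 5, because the profile with a missing call lies within five alleles of the first profile. This chaining is a known property of single linkage, and it shows why the treatment of missing loci matters in any such scheme.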

          This article is part of a discussion meeting issue ‘Genomic population structures of microbial pathogens’.


          Most cited references (117)

          Is Open Access

          High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries

          A fundamental question in microbiology is whether there is a continuum of genetic diversity among genomes, or clear species boundaries prevail instead. Whole-genome similarity metrics such as Average Nucleotide Identity (ANI) help address this question by facilitating high resolution taxonomic analysis of thousands of genomes from diverse phylogenetic lineages. To scale to available genomes and beyond, we present FastANI, a new method to estimate ANI using alignment-free approximate sequence mapping. FastANI is accurate for both finished and draft genomes, and is up to three orders of magnitude faster compared to alignment-based approaches. We leverage FastANI to compute pairwise ANI values among all prokaryotic genomes available in the NCBI database. Our results reveal clear genetic discontinuity, with 99.8% of the total 8 billion genome pairs analyzed conforming to >95% intra-species and <83% inter-species ANI values. This discontinuity is manifested with or without the most frequently sequenced species, and is robust to historic additions in the genome databases.

            Scikit-learn: machine learning in Python


              Shifting the genomic gold standard for the prokaryotic species definition.

              DNA-DNA hybridization (DDH) has been used for nearly 50 years as the gold standard for prokaryotic species circumscriptions at the genomic level. It has been the only taxonomic method that offered a numerical and relatively stable species boundary, and its use has had a paramount influence on how the current classification has been constructed. However, now, in the era of genomics, DDH appears to be an outdated method for classification that needs to be substituted. The average nucleotide identity (ANI) between two genomes seems the most promising method since it mirrors DDH closely. Here we examine the work package JSpecies as a user-friendly, biologist-oriented interface to calculate ANI and the correlation of the tetranucleotide signatures between pairwise genomic comparisons. The results agreed with the use of ANI to substitute DDH, with a narrowed boundary that could be set at approximately 95-96%. In addition, the JSpecies package implemented the tetranucleotide signature correlation index, an alignment-free parameter that generally correlates with ANI and that can be of help in deciding when a given pair of organisms should be classified in the same species. Moreover, for taxonomic purposes, the analyses can be produced by simply randomly sequencing at least 20% of the genome of the query strains rather than obtaining their full sequence.

                Author and article information

                Contributors
                Roles: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Project administration, Supervision, Visualization, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – review & editing
                Roles: Data curation, Formal analysis, Investigation, Methodology, Project administration, Validation, Writing – review & editing
                Roles: Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing – review & editing
                Journal
                Philos Trans R Soc Lond B Biol Sci
                RSTB
                royptb
                Philosophical Transactions of the Royal Society B: Biological Sciences
                The Royal Society
                0962-8436
                1471-2970
                October 10, 2022
                August 22, 2022
                Volume: 377
                Issue: 1861, Discussion meeting issue ‘Genomic population structures of microbial pathogens’ organized and edited by Mark Achtman, David Aanensen and Kathryn Holt
                Article number: 20210240
                Affiliations
                University of Warwick, Coventry CV4 7AL, UK
                Author notes

                One contribution of 11 to a discussion meeting issue ‘Genomic population structures of microbial pathogens’.

                † Present address: Pasteurien College, Soochow University, No. 199 Ren'ai Road, Suzhou, Jiangsu 215123, People's Republic of China.

                Electronic supplementary material is available online at https://doi.org/10.6084/m9.figshare.c.6097222.

                Author information
                http://orcid.org/0000-0001-6815-0070
                http://orcid.org/0000-0001-9783-0366
                http://orcid.org/0000-0001-9759-9838
                http://orcid.org/0000-0003-3287-7547
                Article
                rstb20210240
                10.1098/rstb.2021.0240
                9393565
                35989609
                abd849bf-75b8-4412-965d-54b069177d49
                © 2022 The Authors.

                Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.

                History
                Received: January 9, 2022
                Accepted: March 7, 2022
                Funding
                Funded by: Wellcome Trust, http://dx.doi.org/10.13039/100010269;
                Award ID: 202792/Z/16/Z
                Categories
                Articles
                Research Articles

                Philosophy of science
                big data, hierarchical clustering, genomic databases, EnteroBase, cgMLST, accessory genome
