Microbial species delineation using whole genome sequences

Abstract

Increased sequencing of microbial genomes has revealed that prevailing prokaryotic species assignments can be inconsistent with whole genome information for a significant number of species. The long-standing need for a systematic and scalable species assignment technique can be met by the genome-wide Average Nucleotide Identity (gANI) metric, which is widely acknowledged as a robust measure of genomic relatedness. In this work, we demonstrate that the combination of gANI and the alignment fraction (AF) between two genomes accurately reflects their genomic relatedness. We introduce an efficient implementation of AF,gANI and discuss its successful application to 86.5M genome pairs between 13,151 prokaryotic genomes assigned to 3,032 species. Subsequently, by comparing the genome clusters obtained from complete linkage clustering of these pairs to existing taxonomy, we observed that nearly 18% of all prokaryotic species suffer from anomalies in species definition. Our results can be used to explore central questions such as whether microorganisms form a continuum of genetic diversity or distinct species represented by distinct genetic signatures. We propose that this precise and objective AF,gANI-based species definition, the MiSI (Microbial Species Identifier) method, be used to address previous inconsistencies in species classification and as the primary guide for new taxonomic species assignment, supplemented by the traditional polyphasic approach as required.
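The clustering step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy genome labels, the toy AF and gANI values, and the two cutoffs (`MIN_AF`, `MIN_GANI`) are assumptions chosen for illustration, since the abstract does not state the cutoffs. The sketch merges two clusters only when every cross-cluster genome pair passes both cutoffs, which is the complete linkage criterion. It also checks that 86.5M is consistent with all unordered pairs among 13,151 genomes.

```python
from itertools import combinations
from math import comb

# Illustrative cutoffs only (assumption): the abstract does not give values.
MIN_AF = 0.6      # minimum alignment fraction, on a 0..1 scale
MIN_GANI = 96.5   # minimum genome-wide ANI, in percent


def pair_passes(pair_stats, a, b, min_af=MIN_AF, min_gani=MIN_GANI):
    """Return True if genomes a and b meet both the AF and gANI cutoffs."""
    stats = pair_stats.get(frozenset((a, b)))
    if stats is None:          # no comparison recorded: treat as unrelated
        return False
    af, gani = stats
    return af >= min_af and gani >= min_gani


def complete_linkage_clusters(genomes, pair_stats):
    """Greedy complete linkage: merge two clusters only if ALL cross pairs pass."""
    clusters = [{g} for g in genomes]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if all(pair_passes(pair_stats, a, b)
                   for a in clusters[i] for b in clusters[j]):
                clusters[i] |= clusters[j]
                del clusters[j]
                merged = True
                break          # cluster list changed, restart the scan
    return clusters


# Number of unordered genome pairs among 13,151 genomes (about 86.5M).
n_pairs = comb(13151, 2)

# Toy data (hypothetical): A-B and B-C pass, A-C fails on gANI, D is unrelated.
pair_stats = {
    frozenset(("A", "B")): (0.85, 98.2),
    frozenset(("B", "C")): (0.80, 97.0),
    frozenset(("A", "C")): (0.78, 94.1),
}
clusters = complete_linkage_clusters(["A", "B", "C", "D"], pair_stats)
print(n_pairs, clusters)
```

In the toy data, C is not joined to the cluster containing A and B even though the B-C pair passes, because the A-C pair fails. Single linkage would have merged all three. The greedy scan is order dependent and quadratic per pass, so it is only suitable for small illustrations.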

Author and article information

Journal

Journal ID (nlm-ta): Nucleic Acids Res

Journal ID (iso-abbrev): Nucleic Acids Res

Journal ID (hwp): nar

Journal ID (publisher-id): nar

Title: Nucleic Acids Research

Publisher: Oxford University Press

ISSN (Print): 0305-1048

ISSN (Electronic): 1362-4962

Publication date (Print): 18 August 2015

Publication date (Electronic): 06 July 2015

Publication date PMC-release: 06 July 2015

Volume: 43

Issue: 14

Pages: 6761-6771

Affiliations

[1 ]Microbial and Metagenome Superprogram, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA

[2 ]Department of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0355, USA

[3 ]Celgene Corp., San Francisco, CA 94158, USA

Author notes

[* ]To whom correspondence should be addressed. Tel: +1 925 296 5696; Fax: +1 925 296 5666; Email: njvarghese@lbl.gov

Correspondence may also be addressed to Amrita Pati. Tel: +1 925 927 2580; Fax: +1 925 296 5666; Email: apati@lbl.gov

Correspondence may also be addressed to Nikos C. Kyrpides. Tel: +1 925 296 5718; Fax: +1 925 296 5666; Email: nckyrpides@lbl.gov

Article

DOI: 10.1093/nar/gkv657

PMC ID: 4538840

PubMed ID: 26150420

SO-VID: 8a92d185-3789-45e2-8b4b-92e164ec8a42

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date accepted: 12 June 2015

Date revision received: 08 June 2015

Date received: 10 December 2014

Page count

Pages: 11

Custom metadata

cover-date 18 August 2015

ScienceOpen disciplines: Genetics
