Insights from 20 years of bacterial genome sequencing

Abstract

Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genomes is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. The breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive, as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families.
Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.
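The pan- and core genome calculation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each genome has already been reduced to a set of gene family identifiers; how genes are clustered into families is not described here, and the function name `pan_and_core` and the toy strains below are hypothetical, not taken from the article.

```python
from collections import Counter


def pan_and_core(genomes):
    """Return (pan, core) sets of gene families.

    genomes: dict mapping a genome name to a set of gene family IDs
    (hypothetical input format, assumed for this illustration).
    """
    # Count in how many genomes each gene family occurs.
    counts = Counter(fam for fams in genomes.values() for fam in set(fams))
    n_genomes = len(genomes)
    # Pan-genome: every family seen in at least one genome.
    pan = set(counts)
    # Core genome: families present in every genome.
    core = {fam for fam, c in counts.items() if c == n_genomes}
    return pan, core


# Toy data, invented for illustration only.
toy = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam2", "fam5", "fam6"},
}
pan, core = pan_and_core(toy)
print(len(pan), len(core))  # prints "6 2"
```

This strict definition requires a core family to be present in all genomes. With thousands of draft genomes, a relaxed threshold (for example, presence in most genomes) may be more practical, but that choice is an assumption outside the text above.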

Most cited references 136

ISfinder: the reference centre for bacterial insertion sequences

P. Siguier, J. Perochon, L. Lestrade … (2005)

ISfinder is a dedicated database for bacterial insertion sequences (ISs). It has superseded the Stanford reference center. One of its functions is to assign IS names and to provide a focal point for a coherent nomenclature. It is also the repository for ISs. Each new IS is indexed together with information such as its DNA sequence and open reading frames or potential coding sequences, the sequence of the ends of the element and target sites, and its origin and distribution, together with a bibliography where available. Another objective is to continuously monitor ISs to provide updated comprehensive groupings or families and to provide some insight into their phylogenies. The site also contains extensive background information on ISs and transposons in general. Online tools are gradually being added. At present, an online BLAST facility against the entire bank is available; additional features will include alignment capability, PSI-BLAST and HMM profiles. ISfinder also includes a section on bacterial genomes and is involved in annotating the IS content of these genomes. Finally, this database is currently recommended by several microbiology journals for registration of new IS elements before their publication.

Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome".

H. Tettelin, V. Masignani, M. J. Cieslewicz … (2005)

The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and also limits genome-wide screens for vaccine candidates or for antimicrobial targets. We have generated the genomic sequence of six strains representing the five major disease-causing serotypes of Streptococcus agalactiae, the main cause of neonatal infection in humans. Analysis of these genomes and those available in databases showed that the S. agalactiae species can be described by a pan-genome consisting of a core genome shared by all isolates, accounting for approximately 80% of any single genome, plus a dispensable genome consisting of partially shared and strain-specific genes. Mathematical extrapolation of the data suggests that the gene reservoir available for inclusion in the S. agalactiae pan-genome is vast and that unique genes will continue to be identified even after sequencing hundreds of genomes.

Community structure and metabolism through reconstruction of microbial genomes from the environment.

Gene W. Tyson, Jarrod Chapman, Philip Hugenholtz … (2004)

Microbial communities are vital in the functioning of all ecosystems; however, most microorganisms are uncultivated, and their roles in natural systems are unclear. Here, using random shotgun sequencing of DNA from a natural acidophilic biofilm, we report reconstruction of near-complete genomes of Leptospirillum group II and Ferroplasma type II, and partial recovery of three other genomes. This was possible because the biofilm was dominated by a small number of species populations and the frequency of genomic rearrangements and gene insertions or deletions was relatively low. Because each sequence read came from a different individual, we could determine that single-nucleotide polymorphisms are the predominant form of heterogeneity at the strain level. The Leptospirillum group II genome had remarkably few nucleotide polymorphisms, despite the existence of low-abundance variants. The Ferroplasma type II genome seems to be a composite from three ancestral strains that have undergone homologous recombination to form a large population of mosaic genomes. Analysis of the gene complement for each organism revealed the pathways for carbon and nitrogen fixation and energy generation, and provided insights into survival strategies in an extreme environment.

Author and article information

Contributors

David W. Ussery: (865) 574-8201 , usserydw@ornl.gov

Journal

Journal ID (nlm-ta): Funct Integr Genomics

Journal ID (iso-abbrev): Funct. Integr. Genomics

Title: Functional & Integrative Genomics

Publisher: Springer Berlin Heidelberg (Berlin/Heidelberg)

ISSN (Print): 1438-793X

ISSN (Electronic): 1438-7948

Publication date (Electronic): 27 February 2015

Publication date PMC-release: 27 February 2015

Publication date (Print): 2015

Volume: 15

Issue: 2

Pages: 141-161

Affiliations

Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA

Joint Institute for Biological Sciences, University of Tennessee, Knoxville, TN 37996 USA

Department of Microbiology, University of Tennessee, Knoxville, TN 37996 USA

Computer Science and Mathematics Division, Computer Science Research Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA

Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, Kgs. Lyngby, 2800 Denmark

Molecular Microbiology and Genomics Consultants, Tannenstr 7, 55576 Zotzenheim, Germany

Genome Science and Technology, University of Tennessee, Knoxville, TN 37996 USA

Article

Publisher ID: 433

DOI: 10.1007/s10142-015-0433-4

PMC ID: 4361730

PubMed ID: 25722247

SO-VID: cc1a97c0-9f9e-452d-a581-f972f8f7e9e6

License:

Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

History

Date received: 19 January 2015

Date revision received: 11 February 2015

Date accepted: 12 February 2015

Custom metadata

ScienceOpen disciplines: Genetics

Keywords: bacteria, comparative genomics, bacterial genomes, metagenomics, core-genome, pan-genome, next-generation sequencing

