Open-access bacterial population genomics: BIGSdb software, the PubMLST.org website and their applications

Abstract

The PubMLST.org website hosts a collection of open-access, curated databases that integrate population sequence data with provenance and phenotype information for over 100 different microbial species and genera. Although the PubMLST website was conceived as part of the development of the first multi-locus sequence typing (MLST) scheme in 1998, the software it uses, the Bacterial Isolate Genome Sequence database (BIGSdb, published in 2010), enables PubMLST to include all levels of sequence data, from single gene sequences up to and including complete, finished genomes. Here we describe developments in the BIGSdb software made from publication to June 2018 and show how the platform realises microbial population genomics for a wide range of applications. The system is based on the gene-by-gene analysis of microbial genomes, with each deposited sequence annotated and curated to identify the genes present and systematically catalogue their variation. Originally intended as a means of characterising isolates with typing schemes, the synthesis of sequences and records of genetic variation with provenance and phenotype data permits highly scalable (whole genome sequence data for tens of thousands of isolates) means of addressing a wide range of functional questions, including: the prediction of antimicrobial resistance; likely cross-reactivity with vaccine antigens; and the functional activities of different variants that lead to key phenotypes. There are no limitations to the number of sequences, genetic loci, allelic variants or schemes (combinations of loci) that can be included, enabling each database to represent an expanding catalogue of the genetic variation of the population in question. In addition to providing web-accessible analyses and links to third-party analysis and visualisation tools, the BIGSdb software includes a RESTful application programming interface (API) that enables access to all the underlying data for third-party applications and data analysis pipelines.
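
As a rough illustration of how a third-party pipeline might consume such a RESTful API, the Python sketch below builds a resource URL and decodes a JSON response. The abstract does not give any endpoint details, so the base URL `https://rest.pubmlst.org`, the database name `pubmlst_neisseria_seqdef`, the locus `abcZ`, the route layout, and all helper names are assumptions made for illustration; the BIGSdb API documentation should be checked before relying on them.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed API root; the abstract does not state it.
BASE_URL = "https://rest.pubmlst.org"


def resource_url(*parts: str) -> str:
    """Join path segments onto the assumed API root, escaping each one."""
    return "/".join([BASE_URL] + [quote(str(p), safe="") for p in parts])


def allele_url(database: str, locus: str, allele_id: int) -> str:
    """Hypothetical route for a single allele record in a sequence database."""
    return resource_url("db", database, "loci", locus, "alleles", str(allele_id))


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """GET a resource and decode its JSON body (needs network access)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    url = allele_url("pubmlst_neisseria_seqdef", "abcZ", 1)
    print(url)
    # record = fetch_json(url)  # uncomment to query the live service
```

Keeping URL construction separate from the network call lets a pipeline test its request logic offline and swap in a different BIGSdb installation by changing `BASE_URL` alone.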

Related collections

Microbial Genomics

Most cited references 80

Neighbor-net: an agglomerative method for the construction of phylogenetic networks.

David Bryant, Vincent Moulton (2004)

We present Neighbor-Net, a distance-based method for constructing phylogenetic networks that is based on the Neighbor-Joining (NJ) algorithm of Saitou and Nei. Neighbor-Net provides a snapshot of the data that can guide more detailed analysis. Unlike split decomposition, Neighbor-Net scales well and can quickly produce detailed and informative networks for several hundred taxa. We illustrate the method by reanalyzing three published data sets: a collection of 110 highly recombinant Salmonella multi-locus sequence typing sequences, the 135 "African Eve" human mitochondrial sequences published by Vigilant et al., and a collection of 12 Archaeal chaperonin sequences demonstrating strong evidence for gene conversion. Neighbor-Net is available as part of the SplitsTree4 software package.

The microbial pan-genome.

Duccio Medini, Claudio Donati, Hervé Tettelin … (2005)

A decade after the beginning of the genomic era, the question of how genomics can describe a bacterial species has not been fully addressed. Experimental data have shown that in some species new genes are discovered even after sequencing the genomes of several strains. Mathematical modeling predicts that new genes will be discovered even after sequencing hundreds of genomes per species. Therefore, a bacterial species can be described by its pan-genome, which is composed of a "core genome" containing genes present in all strains, and a "dispensable genome" containing genes present in two or more strains and genes unique to single strains. Given that the number of unique genes is vast, the pan-genome of a bacterial species might be orders of magnitude larger than any single genome.
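
The partition described in this abstract can be made concrete with a small Python sketch. It assumes gene presence is already known for each strain as a set of gene names; the strain labels, gene names, and the function `partition_pan_genome` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, Set


def partition_pan_genome(strains: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split the union of all genes into core and dispensable parts.

    Core genes occur in every strain. Dispensable genes are all the others,
    including those unique to a single strain, which are also reported
    separately.
    """
    counts = Counter(gene for genes in strains.values() for gene in genes)
    n_strains = len(strains)
    pan = set(counts)
    core = {gene for gene, seen in counts.items() if seen == n_strains}
    dispensable = pan - core
    unique = {gene for gene in dispensable if counts[gene] == 1}
    return {"pan": pan, "core": core, "dispensable": dispensable, "unique": unique}


# Toy presence/absence data (hypothetical strains and gene content).
toy_strains = {
    "strain1": {"gyrB", "recA", "capA"},
    "strain2": {"gyrB", "recA", "capA", "tetM"},
    "strain3": {"gyrB", "recA"},
}
result = partition_pan_genome(toy_strains)
print(sorted(result["core"]))         # genes shared by all three strains
print(sorted(result["dispensable"]))  # everything else in the pan-genome
```

Adding a further strain can only shrink or preserve the core set while the pan set can only grow or stay the same, which is the behaviour the abstract describes for species whose pan-genome keeps expanding.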

BIGSdb: Scalable analysis of bacterial genome variation at the population level

Keith Jolley, Martin Maiden (2010)

Background: The opportunities for bacterial population genomics that are being realised by the application of parallel nucleotide sequencing require novel bioinformatics platforms. These must be capable of the storage, retrieval, and analysis of linked phenotypic and genotypic information in an accessible, scalable and computationally efficient manner. Results: The Bacterial Isolate Genome Sequence Database (BIGSdb) is a scalable, open source, web-accessible database system that meets these needs, enabling phenotype and sequence data, which can range from a single sequence read to whole genome data, to be efficiently linked for a limitless number of bacterial specimens. The system builds on the widely used mlstdbNet software, developed for the storage and distribution of multilocus sequence typing (MLST) data, and incorporates the capacity to define and identify any number of loci and genetic variants at those loci within the stored nucleotide sequences. These loci can be further organised into 'schemes' for isolate characterisation or for evolutionary or functional analyses. Isolates and loci can be indexed by multiple names and any number of alternative schemes can be accommodated, enabling cross-referencing of different studies and approaches. LIMS functionality of the software enables linkage to and organisation of laboratory samples. The data are easily linked to external databases and fine-grained authentication of access permits multiple users to participate in community annotation by setting up or contributing to different schemes within the database. Some of the applications of BIGSdb are illustrated with the genera Neisseria and Streptococcus. The BIGSdb source code and documentation are available at http://pubmlst.org/software/database/bigsdb/. Conclusions: Genomic data can be used to characterise bacterial isolates in many different ways but it can also be efficiently exploited for evolutionary or functional studies. BIGSdb represents a freely available resource that will assist the broader community in the elucidation of the structure and function of bacteria by means of a population genomics approach.
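
The relationship between loci, allelic variants and schemes described above can be sketched in a few lines of Python. This is a toy model, not the BIGSdb implementation: it assumes exact string matching against a small in-memory catalogue, whereas the real software scans stored sequences with its own machinery. The locus names, sequences, allele numbers, profile table, and function names are all invented for illustration.

```python
from typing import Dict, Optional, Sequence, Tuple

# Toy allele catalogue: locus -> {sequence: allele identifier} (hypothetical data).
ALLELES: Dict[str, Dict[str, int]] = {
    "locusA": {"ATGGCA": 1, "ATGGCT": 2},
    "locusB": {"TTGACC": 1, "TTGACA": 2},
}

# A scheme is an ordered combination of loci.
SCHEME_LOCI: Tuple[str, ...] = ("locusA", "locusB")

# Known allelic profiles for the scheme and the type assigned to each.
PROFILES: Dict[Tuple[int, ...], int] = {(1, 1): 10, (2, 1): 11}


def designate_alleles(isolate_sequences: Dict[str, str]) -> Tuple[Optional[int], ...]:
    """Look up each scheme locus by exact match; None marks a missing or novel variant."""
    return tuple(
        ALLELES[locus].get(isolate_sequences.get(locus, "").upper())
        for locus in SCHEME_LOCI
    )


def lookup_type(profile: Sequence[Optional[int]]) -> Optional[int]:
    """Return the type for a complete, known profile, otherwise None."""
    return PROFILES.get(tuple(profile))


isolate = {"locusA": "atggct", "locusB": "TTGACC"}
profile = designate_alleles(isolate)
print(profile, lookup_type(profile))
```

Because a scheme is only a named tuple of loci, further schemes can be layered over the same allele catalogue without duplicating any sequence data, which is the property that lets alternative schemes coexist in one database.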

Author and article information

Contributors

Keith A. Jolley: Conceptualization, Data Curation, Methodology, Project Administration, Software, Validation, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

ORCID: https://orcid.org/0000-0002-0751-0287

James E. Bray: Data Curation, Methodology, Software, Validation, Writing – Review & Editing

ORCID: https://orcid.org/0000-0003-3554-4254

Martin C. J. Maiden: Conceptualization, Funding Acquisition, Methodology, Project Administration, Supervision, Writing – Original Draft Preparation, Writing – Review & Editing

ORCID: https://orcid.org/0000-0001-6321-5138

Journal

Journal ID (nlm-ta): Wellcome Open Res

Journal ID (iso-abbrev): Wellcome Open Res

Journal ID (pmc): Wellcome Open Res

Title: Wellcome Open Research

Publisher: F1000 Research Limited (London, UK)

ISSN (Electronic): 2398-502X

Publication date (Electronic): 24 September 2018

Publication date Collection: 2018

Volume: 3

Electronic Location Identifier: 124

Affiliations

[1] Department of Zoology, University of Oxford, Oxford, OX1 3PS, UK

[1] Data Analytics, BioMérieux/Applied Maths, Sint-Martens-Latem, East Flanders, Belgium

[1] Biodiversity and Epidemiology of Bacterial Pathogens, Institut Pasteur, Paris, France

Author notes

[a] keith.jolley@zoo.ox.ac.uk

Competing interests: No competing interests were disclosed.

Competing interests: My group is using the BIGSdb platform to power the Pasteur MLST web site and databases; as such, I am benefiting from the collaboration of K. Jolley in supporting the deployment of the web application at Pasteur.

Article

DOI: 10.12688/wellcomeopenres.14826.1

PMC ID: 6192448

PubMed ID: 30345391

SO-VID: 83399d47-ae7d-4c9c-af8c-3210e83c1391

License:

This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date accepted: 18 September 2018

Funding

Funded by: Wellcome Trust

Award ID: 104992

Funded by: FP7 Ideas: European Research Council

Award ID: FP7-278864-2

Development of PubMLST and BIGSdb has been supported by a Wellcome Trust Biomedical Resource Grant (104992). Design and implementation of the RESTful API has been further supported by the European Community grant FP7-278864-2 (PathoNgenTrace, http://www.patho-ngen-trace.eu/).

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

