PANINI: Pangenome Neighbour Identification for Bacterial Populations


Abstract

The standard workhorse for genomic analysis of the evolution of bacterial populations is phylogenetic modelling of mutations in the core genome. However, a notable amount of information about evolutionary and transmission processes in diverse populations can be lost unless the accessory genome is also taken into consideration. Here, we introduce PANINI (Pangenome Neighbour Identification for Bacterial Populations), a computationally scalable method for identifying the neighbours of each isolate in a data set using unsupervised machine learning with stochastic neighbour embedding based on the t-SNE (t-distributed stochastic neighbour embedding) algorithm. PANINI is browser-based and integrates with the Microreact platform for rapid online visualization and exploration of both core and accessory genome evolutionary signals, together with relevant epidemiological, geographical, temporal and other metadata. Several case studies with single- and multi-clone pneumococcal populations are presented to demonstrate the ability to identify biologically important signals from gene content data. PANINI is available at http://panini.pathogen.watch and its code at http://gitlab.com/cgps/panini.
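To make the idea of neighbour identification from gene content concrete, here is a minimal sketch. It is not PANINI's implementation and covers only the first step of t-SNE. It assumes the input is a binary gene presence/absence matrix (the four-isolate toy matrix is hypothetical, not data from the paper). For each isolate, it computes the perplexity-calibrated conditional affinities p_{j|i} that t-SNE defines in the high-dimensional space, then ranks the other isolates by them. The low-dimensional embedding step is omitted. All function names and parameter values are illustrative assumptions.

```python
import math

# Hypothetical toy input: rows are isolates, columns are accessory genes
# (1 = gene present, 0 = absent). Not data from the PANINI paper.
PROFILES = [
    [1, 1, 1, 0, 0, 0],  # isolate A
    [1, 1, 0, 0, 0, 0],  # isolate B
    [0, 0, 0, 1, 1, 1],  # isolate C
    [0, 0, 0, 1, 1, 0],  # isolate D
]


def squared_distances(profiles):
    """Pairwise squared Euclidean distances between 0/1 gene-content vectors.

    For binary vectors this equals the Hamming distance: the number of genes
    present in one isolate but absent from the other.
    """
    n = len(profiles)
    return [
        [sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j])) for j in range(n)]
        for i in range(n)
    ]


def conditional_affinities(dist_row, i, perplexity, tol=1e-5, max_iter=100):
    """t-SNE-style conditional probabilities p_{j|i} for isolate i.

    A Gaussian kernel exp(-beta * d_ij) is applied to the squared distances,
    and the precision beta is tuned by binary search until the entropy H of
    p_{.|i} (in nats) satisfies exp(H) == perplexity, within tol.
    """
    others = [d for j, d in enumerate(dist_row) if j != i]
    d_min = min(others)  # shift distances so the largest weight is exp(0) = 1
    target = math.log(perplexity)
    beta, lo, hi = 1.0, 0.0, math.inf
    for _ in range(max_iter):
        weights = [
            0.0 if j == i else math.exp(-beta * (d - d_min))
            for j, d in enumerate(dist_row)
        ]
        total = sum(weights)
        probs = [w / total for w in weights]
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        if abs(entropy - target) < tol:
            break
        if entropy > target:
            # Distribution too flat: increase beta (narrower kernel).
            lo = beta
            beta = beta * 2 if hi == math.inf else (beta + hi) / 2
        else:
            # Distribution too peaked: decrease beta (wider kernel).
            hi = beta
            beta = (beta + lo) / 2
    return probs


def nearest_neighbours(profiles, perplexity=2.0, k=1):
    """Rank each isolate's neighbours by p_{j|i} and keep the top k."""
    dists = squared_distances(profiles)
    n = len(profiles)
    neighbours = []
    for i in range(n):
        probs = conditional_affinities(dists[i], i, perplexity)
        ranked = sorted((j for j in range(n) if j != i), key=lambda j: -probs[j])
        neighbours.append(ranked[:k])
    return neighbours


print(nearest_neighbours(PROFILES))  # → [[1], [0], [3], [2]]
```

Because p_{j|i} decreases monotonically with distance, the ranking here matches a plain distance ranking. Perplexity matters in the full algorithm, where it sets the effective neighbourhood size whose structure the low-dimensional embedding tries to preserve.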


Author and article information

Journal

Journal ID (nlm-ta): Microb Genom

Journal ID (iso-abbrev): Microb Genom

Journal ID (hwp): mgen

Journal ID (publisher-id): mgen

Title: Microbial Genomics

Publisher: Microbiology Society

ISSN (Electronic): 2057-5858

Publication date Collection: April 2019

Publication date (Electronic): 22 November 2018

Publication date PMC-release: 22 November 2018

Volume: 5

Issue: 4

Electronic Location Identifier: e000220

Affiliations

[¹] Centre for Genomic Pathogen Surveillance, Wellcome Genome Campus, Hinxton, UK

[²] School of Veterinary Medicine, University of Surrey, Guildford, UK

[³] Department of Mathematics and Statistics, Helsinki Institute of Information Technology, University of Helsinki, FI-00014 Helsinki, Finland

[⁴] Pathogen Genomics, Wellcome Trust Sanger Institute, Hinxton, UK

[⁵] Department of Infectious Disease Epidemiology, Imperial College London, London, UK

[⁶] Department of Biostatistics, Institute of Basic Medical Sciences, University of Oslo, N-0317 Oslo, Norway

[⁷] Big Data Institute, Li Ka Shing Centre for Health Informatics, University of Oxford, Oxford, UK

Author notes

*Correspondence: Jukka Corander, jukka.corander@medisin.uio.no

David M. Aanensen, david.aanensen@sanger.ac.uk

† These authors contributed equally to this work.

Article

Publisher ID: mgen000220

DOI: 10.1099/mgen.0.000220

PMC ID: 6521588

PubMed ID: 30465642

SO-VID: dcfd8b3e-b472-49cd-91fb-02135cbf597d

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 06 April 2018

Date accepted: 26 August 2018

Funding

Funded by: Wellcome Trust

Award ID: 099202

Funded by: Medical Research Council

Award ID: MR/N019296/1

Funded by: Bill and Melinda Gates Foundation

Award ID: NTD Modelling Consortium

Funded by: Royal Society of Biology

Award ID: 104169/z/14/z

Funded by: European Research Council

Award ID: 742158


Keywords: pangenome, microbial population genomics, machine learning, web application

