Hierarchical and Spatially Explicit Clustering of DNA Sequences with BAPS Software

Abstract

Phylogeographical analyses have become commonplace for a myriad of organisms with the advent of cheap DNA sequencing technologies. Bayesian model-based clustering is a powerful tool for detecting important patterns in such data and can be used to decipher even quite subtle signals of systematic differences in molecular variation. Here, we introduce two upgrades to the Bayesian Analysis of Population Structure (BAPS) software, which enable 1) spatially explicit modeling of variation in DNA sequences and 2) hierarchical clustering of DNA sequence data to reveal nested genetic population structures. We provide a direct interface to map the results from spatial clustering with Google Maps using the portal http://www.spatialepidemiology.net/ and illustrate this approach using sequence data from Borrelia burgdorferi. The usefulness of hierarchical clustering is demonstrated through an analysis of the metapopulation structure within a bacterial population experiencing a high level of local horizontal gene transfer. The tools that are introduced are freely available at http://www.helsinki.fi/bsg/software/BAPS/.

Author and article information

Journal

Journal ID (nlm-ta): Mol Biol Evol

Journal ID (iso-abbrev): Mol. Biol. Evol

Journal ID (publisher-id): molbev

Journal ID (hwp): molbiolevol

Title: Molecular Biology and Evolution

Publisher: Oxford University Press

ISSN (Print): 0737-4038

ISSN (Electronic): 1537-1719

Publication date (Print): May 2013

Publication date (Electronic): 13 February 2013

Publication date PMC-release: 13 February 2013

Volume: 30

Issue: 5

Pages: 1224-1228

Affiliations

¹Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland

²Cardiff School of Biosciences, Cardiff University, Cardiff, United Kingdom

³Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom

⁴Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom

Author notes

*Corresponding author: E-mail: jukka.corander@helsinki.fi.

Associate editor: Sudhir Kumar

Article

Publisher ID: mst028

DOI: 10.1093/molbev/mst028

PMC ID: 3670731

PubMed ID: 23408797

SO-VID: fc732dba-e9fa-438d-b899-f8f573958359

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
