Automatic selection of partitioning schemes for phylogenetic analyses using iterative k-means clustering of site rates

Abstract

Background

Model selection is a vital part of most phylogenetic analyses, and accounting for the heterogeneity in evolutionary patterns across sites is particularly important. Mixture models and partitioning are commonly used to account for this variation, and partitioning is the most popular approach. Most current partitioning methods require some a priori partitioning scheme to be defined, typically guided by known structural features of the sequences, such as gene boundaries or codon positions. Recent evidence suggests that these a priori boundaries often fail to adequately account for variation in rates and patterns of evolution among sites. Furthermore, new phylogenomic datasets such as those assembled from ultra-conserved elements lack obvious structural features on which to define a priori partitioning schemes. The upshot is that, for many phylogenetic datasets, partitioned models of molecular evolution may be inadequate, thus limiting the accuracy of downstream phylogenetic analyses.

Results

We present a new algorithm that automatically selects a partitioning scheme via the iterative division of the alignment into subsets of similar sites based on their rates of evolution. We compare this method to existing approaches using a wide range of empirical datasets, and show that it consistently leads to large increases in the fit of partitioned models of molecular evolution when measured using AICc and BIC scores. In doing so, we demonstrate that some related approaches to solving this problem may have been associated with a small but important bias.
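The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: repeatedly divide a set of sites into two groups by k-means on their rates, and keep a division only when an information criterion improves. It assumes per-site rates have already been estimated. A toy Gaussian likelihood stands in for the phylogenetic likelihood that a real analysis would compute, and every function name here (`two_means`, `toy_log_lik`, `partition`) is hypothetical rather than taken from the authors' software. The `bic` and `aicc` helpers use the standard textbook definitions.

```python
import math
from typing import Callable, List, Sequence, Tuple


def bic(log_lik: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_sites) - 2.0 * log_lik


def aicc(log_lik: float, n_params: int, n_sites: int) -> float:
    """AIC with the small-sample correction (lower is better)."""
    aic = 2.0 * n_params - 2.0 * log_lik
    return aic + (2.0 * n_params * (n_params + 1)) / (n_sites - n_params - 1)


def two_means(rates: Sequence[float], sites: List[int],
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """One-dimensional k-means with k = 2 over the given site indices."""
    values = [rates[i] for i in sites]
    c_low, c_high = min(values), max(values)  # start from the extremes
    low: List[int] = []
    high: List[int] = []
    for _ in range(max_iter):
        low = [i for i in sites if abs(rates[i] - c_low) <= abs(rates[i] - c_high)]
        high = [i for i in sites if abs(rates[i] - c_low) > abs(rates[i] - c_high)]
        if not low or not high:
            break  # all rates identical: nothing to divide
        new_low = sum(rates[i] for i in low) / len(low)
        new_high = sum(rates[i] for i in high) / len(high)
        if new_low == c_low and new_high == c_high:
            break  # centroids stopped moving
        c_low, c_high = new_low, new_high
    return low, high


def toy_log_lik(rates: Sequence[float], sites: List[int]) -> float:
    """ASSUMPTION: Gaussian fit to the rates, a stand-in for a real
    phylogenetic likelihood. The variance is floored to avoid log(0)."""
    n = len(sites)
    mean = sum(rates[i] for i in sites) / n
    var = max(sum((rates[i] - mean) ** 2 for i in sites) / n, 1e-6)
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def partition(rates: Sequence[float],
              log_lik: Callable[[Sequence[float], List[int]], float] = toy_log_lik,
              criterion: Callable[[float, int, int], float] = bic,
              params_per_subset: int = 2,
              min_size: int = 4) -> List[List[int]]:
    """Iteratively divide sites; keep a division only if the criterion drops."""
    n_sites = len(rates)
    pending = [list(range(n_sites))]
    accepted: List[List[int]] = []
    while pending:
        subset = pending.pop()
        low, high = two_means(rates, subset)
        if len(low) < min_size or len(high) < min_size:
            accepted.append(subset)  # too small to divide further
            continue
        before = criterion(log_lik(rates, subset), params_per_subset, n_sites)
        after = criterion(log_lik(rates, low) + log_lik(rates, high),
                          2 * params_per_subset, n_sites)
        if after < before:
            pending.extend([low, high])
        else:
            accepted.append(subset)
    return accepted


# Made-up rates for ten sites: five slow, five fast.
example_rates = [0.1, 0.11, 0.09, 0.1, 0.12, 5.0, 5.1, 4.9, 5.0, 5.2]
print(sorted(sorted(s) for s in partition(example_rates, min_size=4)))
```

Because the log-likelihoods and parameter counts of the subsets are additive, comparing the criterion for one subset before and after a division is equivalent to comparing the scores of the two whole schemes. The `min_size` guard is a choice made for this sketch: with the toy likelihood, very small subsets fit their own rates almost perfectly and would otherwise be divided again and again.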

Conclusions

Our method provides an alternative to traditional approaches to partitioning, such as dividing alignments by gene and codon position. Because our method is data-driven, it can be used to estimate partitioned models for all types of alignments, including those that are not amenable to traditional approaches to partitioning.

Author and article information

Contributors

Paul B Frandsen: paulbfrandsen@gmail.com

Brett Calcott: brett.calcott@gmail.com

Christoph Mayer: c.mayer.zfmk@uni-bonn.de

Robert Lanfear: robert.lanfear@mq.edu.au

Journal

Journal ID (nlm-ta): BMC Evol Biol

Journal ID (iso-abbrev): BMC Evol. Biol

Title: BMC Evolutionary Biology

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2148

Publication date (Electronic): 10 February 2015

Publication date PMC-release: 10 February 2015

Publication date Collection: 2015

Volume: 15

Issue: 1

Electronic Location Identifier: 13

Affiliations

Office of Research Information Services, Office of the CIO, Smithsonian Institution, Washington, D.C., USA

Department of Entomology, Rutgers University, New Brunswick, New Jersey, USA

School of Life Sciences, Arizona State University, Tempe, AZ, USA

Zoologisches Forschungsmuseum Alexander Koenig (ZFMK)/Zentrum für Molekulare Biodiversitätsforschung (ZMB), Bonn, Germany

Ecology Evolution and Genetics, Research School of Biology, Australian National University, Canberra, ACT, Australia

National Evolutionary Synthesis Center, Durham, NC, USA

Department of Biological Sciences, Macquarie University, Sydney, Australia

Article

Publisher ID: 283

DOI: 10.1186/s12862-015-0283-7

PMC ID: 4327964

PubMed ID: 25887041

SO-VID: 85c52bad-0321-4a44-bd35-b31b4cda1e32

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received : 11 August 2014

Date accepted : 13 January 2015

Custom metadata

ScienceOpen disciplines: Evolutionary Biology

Keywords: model selection, partitioning, partitionfinder, phylogenetics, phylogenomics, k-means, clustering, ultra-conserved elements, uce’s
