
      Automatic selection of partitioning schemes for phylogenetic analyses using iterative k-means clustering of site rates

      research-article


          Abstract

          Background

          Model selection is a vital part of most phylogenetic analyses, and accounting for the heterogeneity in evolutionary patterns across sites is particularly important. Mixture models and partitioning are commonly used to account for this variation, and partitioning is the most popular approach. Most current partitioning methods require some a priori partitioning scheme to be defined, typically guided by known structural features of the sequences, such as gene boundaries or codon positions. Recent evidence suggests that these a priori boundaries often fail to adequately account for variation in rates and patterns of evolution among sites. Furthermore, new phylogenomic datasets such as those assembled from ultra-conserved elements lack obvious structural features on which to define a priori partitioning schemes. The upshot is that, for many phylogenetic datasets, partitioned models of molecular evolution may be inadequate, thus limiting the accuracy of downstream phylogenetic analyses.

          Results

          We present a new algorithm that automatically selects a partitioning scheme via the iterative division of the alignment into subsets of similar sites based on their rates of evolution. We compare this method to existing approaches using a wide range of empirical datasets, and show that it consistently leads to large increases in the fit of partitioned models of molecular evolution when measured using AICc and BIC scores. In doing so, we demonstrate that some related approaches to solving this problem may have been associated with a small but important bias.
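          As a rough illustration of the iterative division described above, the following Python sketch repeatedly splits subsets of sites in two by one-dimensional k-means on their rates and keeps a split only when a score improves. It is a minimal sketch under stated assumptions, not the authors' implementation: splitting with k = 2, the per-site rates being given in advance, and the names `two_means`, `partition` and `toy_score` are all assumptions made for illustration. In the paper the fit of the partitioned model is measured with AICc or BIC; here `toy_score` is a placeholder (within-subset squared deviation of the rates plus a fixed penalty per subset), while `aicc` and `bic` show the standard formulas that a real score would apply to a model log-likelihood.

```python
import math
from typing import Callable, List, Sequence, Tuple


def aicc(log_lik: float, k: int, n: int) -> float:
    """Corrected AIC for k free parameters and n sites."""
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion for k free parameters and n sites."""
    return -2.0 * log_lik + k * math.log(n)


def two_means(rates: Sequence[float], sites: List[int],
              iters: int = 100) -> Tuple[List[int], List[int]]:
    """Split `sites` into slow and fast groups by 1-D k-means with k = 2."""
    values = [rates[s] for s in sites]
    lo, hi = min(values), max(values)      # initial centroids
    if lo == hi:                           # identical rates: nothing to split
        return list(sites), []
    slow, fast = list(sites), []
    for _ in range(iters):
        mid = (lo + hi) / 2.0              # nearest-centroid boundary in 1-D
        slow = [s for s in sites if rates[s] <= mid]
        fast = [s for s in sites if rates[s] > mid]
        new_lo = sum(rates[s] for s in slow) / len(slow)
        new_hi = sum(rates[s] for s in fast) / len(fast)
        if (new_lo, new_hi) == (lo, hi):   # centroids stopped moving
            break
        lo, hi = new_lo, new_hi
    return slow, fast


def partition(rates: Sequence[float],
              score: Callable[[List[List[int]]], float],
              min_size: int = 1) -> List[List[int]]:
    """Iteratively split subsets while `score` (lower is better) improves."""
    final: List[List[int]] = []
    queue: List[List[int]] = [list(range(len(rates)))]
    while queue:
        subset = queue.pop()
        slow, fast = two_means(rates, subset)
        others = final + queue             # every other current subset
        splittable = min(len(slow), len(fast)) >= min_size
        if splittable and score(others + [slow, fast]) < score(others + [subset]):
            queue += [slow, fast]          # keep the split, try to split further
        else:
            final.append(subset)           # this subset is finished
    return sorted(final, key=min)


def toy_score(rates: Sequence[float], subsets: List[List[int]],
              penalty: float = 0.5) -> float:
    """Placeholder score (NOT AICc/BIC): squared deviation plus a subset penalty."""
    total = 0.0
    for subset in subsets:
        vals = [rates[s] for s in subset]
        mean = sum(vals) / len(vals)
        total += sum((v - mean) ** 2 for v in vals)
    return total + penalty * len(subsets)


# Hypothetical per-site rates: four slow sites followed by four fast sites.
rates = [0.10, 0.12, 0.11, 0.09, 2.0, 2.1, 1.9, 2.05]
subsets = partition(rates, lambda s: toy_score(rates, s))
print(subsets)
```

          In a real analysis the `score` callback would fit a substitution model to each subset of the alignment and return the AICc or BIC of the whole partitioned model, and site rates would presumably need to be re-estimated as subsets change; both steps are outside the scope of this sketch.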

          Conclusions

          Our method provides an alternative to traditional approaches to partitioning, such as dividing alignments by gene and codon position. Because our method is data-driven, it can be used to estimate partitioned models for all types of alignments, including those that are not amenable to traditional approaches to partitioning.


                Author and article information

                Contributors
                paulbfrandsen@gmail.com
                brett.calcott@gmail.com
                c.mayer.zfmk@uni-bonn.de
                robert.lanfear@mq.edu.au
                Journal
                BMC Evolutionary Biology (BMC Evol Biol)
                BioMed Central (London)
                ISSN: 1471-2148
                Published: 10 February 2015
                2015; 15(1): 13
                Affiliations
                Office of Research Information Services, Office of the CIO, Smithsonian Institution, Washington, D.C., USA
                Department of Entomology, Rutgers University, New Brunswick, New Jersey, USA
                School of Life Sciences, Arizona State University, Tempe, AZ, USA
                Zoologisches Forschungsmuseum Alexander Koenig (ZFMK)/Zentrum für Molekulare Biodiversitätsforschung (ZMB), Bonn, Germany
                Ecology, Evolution and Genetics, Research School of Biology, Australian National University, Canberra, ACT, Australia
                National Evolutionary Synthesis Center, Durham, NC, USA
                Department of Biological Sciences, Macquarie University, Sydney, Australia
                Article
                283
                DOI: 10.1186/s12862-015-0283-7
                PMCID: 4327964
                PMID: 25887041
                85c52bad-0321-4a44-bd35-b31b4cda1e32
                © Frandsen et al.; licensee BioMed Central. 2015

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 11 August 2014
                Accepted: 13 January 2015
                Categories
                Methodology Article

                Evolutionary Biology
                Keywords: model selection, partitioning, PartitionFinder, phylogenetics, phylogenomics, k-means, clustering, ultra-conserved elements, UCEs
