      Markov-Modulated Continuous-Time Markov Chains to Identify Site- and Branch-Specific Evolutionary Variation in BEAST

      research-article


          Abstract

          Markov models of character substitution on phylogenies form the foundation of phylogenetic inference frameworks. Early models made the simplifying assumption that the substitution process is homogeneous over time and across sites in the molecular sequence alignment. While standard practice adopts extensions that accommodate heterogeneity of substitution rates across sites, heterogeneity in the process over time in a site-specific manner remains frequently overlooked. This is problematic, as evolutionary processes that act at the molecular level are highly variable, subjecting different sites to different selective constraints over time, impacting their substitution behavior. We propose incorporating time variability through Markov-modulated models (MMMs), which extend covarion-like models and allow the substitution process (including relative character exchange rates as well as the overall substitution rate) at individual sites to vary across lineages. We implement a general MMM framework in BEAST, a popular Bayesian phylogenetic inference software package, allowing researchers to compose a wide range of MMMs through flexible XML specification. Using examples from bacterial, viral, and plastid genome evolution, we show that MMMs impact phylogenetic tree estimation and can substantially improve model fit compared to standard substitution models. Through simulations, we show that marginal likelihood estimation accurately identifies the generative model and does not systematically prefer the more parameter-rich MMMs. To mitigate the increased computational demands associated with MMMs, our implementation exploits recent developments in BEAGLE, a high-performance computational library for phylogenetic inference. [Bayesian inference; BEAGLE; BEAST; covarion; heterotachy; Markov-modulated models; phylogenetics.]
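
          The idea of letting the substitution process at a site switch between several models along the tree can be illustrated with a small sketch. The code below is not the BEAST implementation and does not use BEAST's XML specification. It is a minimal, assumed construction in which each hidden class has its own substitution generator and a separate switching generator moves a site between classes without changing its character. The function names `jc69` and `markov_modulated_generator`, the Jukes-Cantor-style component models, and all numerical rates are hypothetical choices made for illustration only.

```python
def jc69(rate=1.0, n=4):
    """Jukes-Cantor-style generator: equal off-diagonal rates, rows sum to zero."""
    off = rate / (n - 1)
    return [[-rate if i == j else off for j in range(n)] for i in range(n)]


def markov_modulated_generator(models, switch):
    """Combine K substitution generators into one (K*S) x (K*S) generator.

    models: list of K square S x S generators, one per hidden class.
    switch: K x K generator for moves between hidden classes.
    The joint state (class k, character i) is stored at index k*S + i.
    """
    K, S = len(models), len(models[0])
    big = [[0.0] * (K * S) for _ in range(K * S)]
    for k in range(K):
        for i in range(S):
            row = k * S + i
            # Within-class block: ordinary substitutions under model k.
            for j in range(S):
                big[row][k * S + j] += models[k][i][j]
            # Between-class moves keep character i and change only the class.
            for m in range(K):
                big[row][m * S + i] += switch[k][m]
    return big


# Illustrative (assumed) setup: a slow class and a fast class over 4 nucleotides.
slow = jc69(rate=0.1)
fast = jc69(rate=2.0)
switch = [[-0.5, 0.5],
          [0.3, -0.3]]
Q = markov_modulated_generator([slow, fast], switch)

# Every row of the joint generator should still sum to zero.
row_sums = [sum(row) for row in Q]
```

          With two classes and four nucleotides the joint generator has eight states. A site in the slow class can either substitute at the slow rate or move to the fast class while keeping its current nucleotide, which is how this kind of construction lets both the rate and the exchange pattern at a site vary across lineages.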


                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Systematic Biology (Syst Biol)
                Oxford University Press
                ISSN: 1063-5157, 1076-836X
                January 2021
                16 May 2020
                Volume 70, Issue 1, Pages 181-189
                Affiliations
                [1 ] Department of Microbiology, Immunology and Transplantation, Rega Institute , KU Leuven, Herestraat 49, 3000 Leuven, Belgium
                [2 ] Department of Biostatistics, Jonathan and Karin Fielding School of Public Health , University of California, Los Angeles, CA 90095, USA
                [3 ] Department of Biomathematics, David Geffen School of Medicine at UCLA , University of California, Los Angeles, CA 90095, USA
                [4 ] Department of Human Genetics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA 90095, USA
                Author notes
                Correspondence to be sent to: Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Herestraat 49, 3000 Leuven, Belgium; E-mail: guy.baele@rega.kuleuven.be.
                Article
                Article ID: syaa037
                DOI: 10.1093/sysbio/syaa037
                PMCID: PMC7744037
                PMID: 32415977
                2807c098-1306-47c4-ad14-66db6f67cb3d
                © The Author(s) 2020. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                17 May 2019
                19 April 2020
                06 May 2020
                Page count
                Pages: 9
                Funding
                Funded by: Interne Fondsen KU Leuven/Internal Funds KU Leuven;
                Award ID: C14/18/094
                Funded by: Research Foundation – Flanders [“Fonds voor Wetenschappelijk Onderzoek – Vlaanderen”, G0E1420N to G.B.];
                Funded by: European Union’s Horizon 2020;
                Funded by: Research Foundation – Flanders;
                Funded by: Research Foundation – Flanders;
                Award ID: G066215N
                Award ID: G0D5117N
                Award ID: G0B9317N
                Funded by: National Science Foundation, DOI 10.13039/100000001;
                Award ID: 1264153
                Award ID: R01 AI107034
                Award ID: U19 AI135995
                Funded by: Wellcome Trust, DOI 10.13039/100010269;
                Award ID: 206298/Z/17/Z
                Categories
                Software for Systematics and Evolution
                AcademicSubjects/SCI01130

                Animal science & Zoology
