
      Minimum variance rooting of phylogenetic trees and implications for species tree reconstruction

      research-article
      PLoS ONE
      Public Library of Science


          Abstract

          Phylogenetic trees inferred using commonly used models of sequence evolution are unrooted, but the root position matters both for interpretation and for downstream applications. This issue has long been recognized; however, whether the potential for discordance between the species tree and gene trees impacts methods of rooting a phylogenetic tree has not been extensively studied. In this paper, we introduce a new method of rooting a tree based on its branch length distribution. Our method, which minimizes the variance of root-to-tip distances, is inspired by traditional midpoint rerooting and is justified when deviations from the strict molecular clock are random. Like midpoint rerooting, the method can be implemented as a linear-time algorithm. In extensive simulations that consider discordance between gene trees and the species tree, we show that the new method is more accurate than midpoint rerooting, but its accuracy relative to using outgroups to root gene trees depends on the size of the dataset and the level of deviation from the strict clock. We show high levels of error for all methods of rooting estimated gene trees, due to factors that include gene tree discordance, deviations from the clock, and gene tree estimation error. Our simulations, however, did not reveal significant differences between two equivalent methods for species tree estimation that use rooted and unrooted input, namely STAR and NJst. Nevertheless, our results point to limitations of existing scalable rooting methods.
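The idea of minimizing the variance of root-to-tip distances can be illustrated with a small sketch. This is not the authors' linear-time algorithm: it is a brute-force, quadratic-time version that tries every edge, written only to make the criterion concrete. The tree representation (a dict of neighbor-to-branch-length dicts) and the names `leaf_depths` and `min_var_root` are assumptions made for this example. For a root placed at distance x from node u on an edge (u, v), the variance is a quadratic in x, and its minimizer works out to half the difference between the mean tip distance on the v side and the mean tip distance on the u side (both measured from u), clamped to the edge.

```python
from statistics import pvariance


def leaf_depths(adj, start, blocked):
    """Distances from `start` to every tip reachable without crossing to `blocked`."""
    depths, stack = [], [(start, blocked, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        onward = [(nbr, w) for nbr, w in adj[node].items() if nbr != parent]
        if not onward:  # no way forward: this node is a tip
            depths.append(dist)
        for nbr, w in onward:
            stack.append((nbr, node, dist + w))
    return depths


def min_var_root(adj):
    """Return (variance, u, v, x): root on edge (u, v) at distance x from u."""
    best, seen = None, set()
    for u in adj:
        for v, length in adj[u].items():
            if (v, u) in seen:  # each undirected edge is examined once
                continue
            seen.add((u, v))
            a = leaf_depths(adj, u, v)                        # tips on u's side, from u
            b = [d + length for d in leaf_depths(adj, v, u)]  # tips on v's side, from u
            # Minimizer of the quadratic variance, clamped to lie on the edge.
            x = (sum(b) / len(b) - sum(a) / len(a)) / 2
            x = min(max(x, 0.0), length)
            var = pvariance([d + x for d in a] + [d - x for d in b])
            if best is None or var < best[0]:
                best = (var, u, v, x)
    return best


# Hypothetical clock-like quartet ((A,B),(C,D)) with an internal edge of length 2.
tree = {
    "A": {"X": 1}, "B": {"X": 1}, "C": {"Y": 1}, "D": {"Y": 1},
    "X": {"A": 1, "B": 1, "Y": 2},
    "Y": {"C": 1, "D": 1, "X": 2},
}
print(min_var_root(tree))
```

On this clock-like toy tree the root lands in the middle of the internal edge, where all four root-to-tip distances are equal and the variance is zero. On trees that deviate from the clock, the minimizer generally shifts away from the midpoint position.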


                Author and article information

                Contributors
                Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing (original draft), Writing (review & editing)
                Roles: Data curation, Formal analysis, Investigation, Visualization, Writing (original draft)
                Roles: Conceptualization, Formal analysis, Funding acquisition, Methodology, Project administration, Resources, Supervision, Visualization, Writing (original draft), Writing (review & editing)
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                Published: 11 August 2017
                Volume 12, Issue 8, e0182238
                Affiliations
                [1] Dept of Computer Science and Engineering, University of California at San Diego, San Diego, CA, United States of America
                [2] Dept of Electrical and Computer Engineering, University of California at San Diego, San Diego, CA, United States of America
                Wilfrid Laurier University, CANADA
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Author information
                http://orcid.org/0000-0001-5065-2814
                Article
                Manuscript: PONE-D-17-14644
                DOI: 10.1371/journal.pone.0182238
                PMCID: PMC5553649
                PMID: 28800608
                d8864e26-9201-479e-b9f6-8ea3802befff
                © 2017 Mai et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 14 April 2017
                Accepted: 25 June 2017
                Page count
                Figures: 4, Tables: 1, Pages: 19
                Funding
                Funded by: National Science Foundation (funder ID: http://dx.doi.org/10.13039/100000001)
                Award ID: IIS-1565862
                This work was supported by National Science Foundation grant IIS-1565862 (https://www.nsf.gov/awardsearch/showAward?AWD_ID=1565862) to SM, UM, and ES. Computations were performed at the San Diego Supercomputer Center (SDSC) through XSEDE allocations; XSEDE is supported by National Science Foundation grant ACI-1053575 (https://www.nsf.gov/awardsearch/showAward?AWD_ID=1053575).
                Categories
                Research Article
                Biology and Life Sciences > Evolutionary Biology > Evolutionary Systematics > Phylogenetics > Phylogenetic Analysis
                Biology and Life Sciences > Taxonomy > Evolutionary Systematics > Phylogenetics > Phylogenetic Analysis
                Computer and Information Sciences > Data Management > Taxonomy > Evolutionary Systematics > Phylogenetics > Phylogenetic Analysis
                Biology and Life Sciences > Plant Science > Plant Anatomy > Leaves
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms
                Biology and Life Sciences > Evolutionary Biology > Evolutionary Systematics > Phylogenetics
                Biology and Life Sciences > Taxonomy > Evolutionary Systematics > Phylogenetics
                Computer and Information Sciences > Data Management > Taxonomy > Evolutionary Systematics > Phylogenetics
                Biology and Life Sciences > Evolutionary Biology > Evolutionary Genetics
                Research and Analysis Methods > Simulation and Modeling
                Engineering and Technology > Measurement > Distance Measurement
                Physical Sciences > Mathematics > Probability Theory > Random Variables
                Custom metadata
                The code, datasets, and scripts used are all available at: https://uym2.github.io/MinVar-Rooting/.
