      ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes

      Bioinformatics

      Oxford University Press

          Abstract

          Motivation: The estimation of species phylogenies requires multiple loci, since different loci can have different trees due to incomplete lineage sorting, modeled by the multi-species coalescent model. We recently developed a coalescent-based method, ASTRAL, which is statistically consistent under the multi-species coalescent model and which is more accurate than other coalescent-based methods on the datasets we examined. ASTRAL runs in polynomial time, by constraining the search space using a set of allowed ‘bipartitions’. Despite the limitation to allowed bipartitions, ASTRAL is statistically consistent.
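To make the notion of an allowed set of bipartitions concrete, here is a minimal sketch. It assumes, purely for illustration, that the allowed set X is collected from the bipartitions found in the input gene trees; the abstract itself does not say how X is built. The nested-tuple tree encoding and the helper names `leaves`, `bipartitions`, and `allowed_set` are hypothetical and are not part of the ASTRAL code base.

```python
from typing import FrozenSet, Iterable, Set, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name, an internal node is a tuple of children.
Tree = Union[str, Tuple["Tree", ...]]
Bipartition = FrozenSet[FrozenSet[str]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of taxon names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree: Tree) -> Set[Bipartition]:
    """Collect the non-trivial bipartitions (splits) induced by the edges of a tree."""
    all_taxa = leaves(tree)
    found: Set[Bipartition] = set()

    def visit(node: Tree) -> None:
        below = leaves(node)
        other = all_taxa - below
        # Skip trivial splits that separate fewer than two taxa from the rest.
        if len(below) >= 2 and len(other) >= 2:
            found.add(frozenset([below, other]))
        if not isinstance(node, str):
            for child in node:
                visit(child)

    visit(tree)
    return found


def allowed_set(gene_trees: Iterable[Tree]) -> Set[Bipartition]:
    """Assumed construction of X: the union of bipartitions over all gene trees."""
    allowed: Set[Bipartition] = set()
    for gene_tree in gene_trees:
        allowed |= bipartitions(gene_tree)
    return allowed


# Two toy gene trees on five taxa that disagree about the placement of B and C.
g1: Tree = ((("A", "B"), "C"), ("D", "E"))
g2: Tree = ((("A", "C"), "B"), ("D", "E"))
X = allowed_set([g1, g2])
print(len(X))
```

In this toy case the two gene trees share the split ABC|DE and each contributes one split of its own, so a search restricted to X considers only those candidate bipartitions rather than every possible split of the five taxa.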

          Results: We present a new version of ASTRAL, which we call ASTRAL-II. We show that ASTRAL-II has substantial advantages over ASTRAL: it is faster, can analyze much larger datasets (up to 1000 species and 1000 genes) and has substantially better accuracy under some conditions. ASTRAL’s running time is $O(n^2 k |X|^2)$, and ASTRAL-II’s running time is $O(n k |X|^2)$, where $n$ is the number of species, $k$ is the number of loci and $X$ is the set of allowed bipartitions for the search space.
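The two running-time bounds differ by a factor of n. The following sketch simply evaluates both expressions for the largest dataset size mentioned in the abstract. The value chosen for |X| is a made-up placeholder, and because big-O notation hides constant factors, the ratio is only indicative of scaling behaviour, not a measured speedup.

```python
def astral_bound(n: int, k: int, x_size: int) -> int:
    """Evaluate n^2 * k * |X|^2, the expression inside ASTRAL's big-O bound."""
    return n ** 2 * k * x_size ** 2


def astral2_bound(n: int, k: int, x_size: int) -> int:
    """Evaluate n * k * |X|^2, the expression inside ASTRAL-II's big-O bound."""
    return n * k * x_size ** 2


n = 1000        # number of species (upper end quoted in the abstract)
k = 1000        # number of loci (upper end quoted in the abstract)
x_size = 5000   # hypothetical |X|; the abstract gives no value

ratio = astral_bound(n, k, x_size) // astral2_bound(n, k, x_size)
print(ratio)  # equals n, independent of k and |X|
```

Since k and |X| cancel, the asymptotic gap between the two bounds grows linearly with the number of species.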

          Availability and implementation: ASTRAL-II is available in open source at https://github.com/smirarab/ASTRAL and datasets used are available at http://www.cs.utexas.edu/~phylo/datasets/astral2/.

          Contact: smirarab@gmail.com

          Supplementary information: Supplementary data are available at Bioinformatics online.

          Most cited references 16

          ASTRAL: genome-scale coalescent-based species tree estimation

          Motivation: Species trees provide insight into basic biology, including the mechanisms of evolution and how it modifies biomolecular function and structure, biodiversity and co-evolution between genes and species. Yet, gene trees often differ from species trees, creating challenges to species tree estimation. One of the most frequent causes for conflicting topologies between gene trees and species trees is incomplete lineage sorting (ILS), which is modelled by the multi-species coalescent. While many methods have been developed to estimate species trees from multiple genes, some of which have statistical guarantees under the multi-species coalescent model, existing methods are too computationally intensive for use with genome-scale analyses or have been shown to have poor accuracy under some realistic conditions. Results: We present ASTRAL, a fast method for estimating species trees from multiple genes. ASTRAL is statistically consistent, can run on datasets with thousands of genes and has outstanding accuracy, improving on MP-EST and the population tree from BUCKy, two statistically consistent leading coalescent-based methods. ASTRAL is often more accurate than concatenation using maximum likelihood, except when ILS levels are low or there are too few gene trees. Availability and implementation: ASTRAL is available in open source form at https://github.com/smirarab/ASTRAL/. Datasets studied in this article are available at http://www.cs.utexas.edu/users/phylo/datasets/astral. Contact: warnow@illinois.edu Supplementary information: Supplementary data are available at Bioinformatics online.
            Is a new and general theory of molecular systematics emerging?

            The advent and maturation of algorithms for estimating species trees (phylogenetic trees that allow gene tree heterogeneity and whose tips represent lineages, populations and species, as opposed to genes) represent an exciting confluence of phylogenetics, phylogeography, and population genetics, and usher in a new generation of concepts and challenges for the molecular systematist. In this essay I argue that to better deal with the large multilocus datasets brought on by phylogenomics, and to better align the fields of phylogeography and phylogenetics, we should embrace the primacy of species trees, not only as a new and useful practical tool for systematics, but also as a long-standing conceptual goal of systematics that, largely due to the lack of appropriate computational tools, has been eclipsed in the past few decades. I suggest that phylogenies as gene trees are a "local optimum" for systematics, and review recent advances that will bring us to the broader optimum inherent in species trees. In addition to adopting new methods of phylogenetic analysis (and ideally reserving the term "phylogeny" for species trees rather than gene trees), the new paradigm suggests shifts in a number of practices, such as sampling data to maximize not only the number of accumulated sites but also the number of independently segregating genes; routinely using coalescent or other models in computer simulations to allow gene tree heterogeneity; and understanding better the role of concatenation in influencing topologies and confidence in phylogenies. By building on the foundation laid by concepts of gene trees and coalescent theory, and by taking cues from recent trends in multilocus phylogeography, molecular systematics stands to be enriched. Many of the challenges and lessons learned for estimating gene trees will carry over to the challenge of estimating species trees, although adopting the species tree paradigm will clarify many issues (such as the nature of polytomies and the star tree paradox), raise conceptually new challenges, or provide new answers to old questions.
              Inconsistency of phylogenetic estimates from concatenated data under coalescence.

              Although multiple gene sequences are becoming increasingly available for molecular phylogenetic inference, the analysis of such data has largely relied on inference methods designed for single genes. One of the common approaches to analyzing data from multiple genes is concatenation of the individual gene data to form a single supergene to which traditional phylogenetic inference procedures - e.g., maximum parsimony (MP) or maximum likelihood (ML) - are applied. Recent empirical studies have demonstrated that concatenation of sequences from multiple genes prior to phylogenetic analysis often results in inference of a single, well-supported phylogeny. Theoretical work, however, has shown that the coalescent can produce substantial variation in single-gene histories. Using simulation, we combine these ideas to examine the performance of the concatenation approach under conditions in which the coalescent produces a high level of discord among individual gene trees and show that it leads to statistically inconsistent estimation in this setting. Furthermore, use of the bootstrap to measure support for the inferred phylogeny can result in moderate to strong support for an incorrect tree under these conditions. These results highlight the importance of incorporating variation in gene histories into multilocus phylogenetics.

                Author and article information

                Journal: Bioinformatics
                Publisher: Oxford University Press
                ISSN: 1367-4803, 1367-4811
                Dates: 15 June 2015; 10 June 2015
                Volume: 31
                Issue: 12
                Pages: i44-i52
                Affiliations
                1 Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA and 2 Departments of Computer Science and Bioengineering, The University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
                Author notes
                *To whom correspondence should be addressed.
                Article
                Article ID: btv234
                DOI: 10.1093/bioinformatics/btv234
                PMC ID: 4765870
                PubMed ID: 26072508
                © The Author 2015. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                Page count
                Pages: 9
                Categories
                ISMB/ECCB 2015 Proceedings Papers Committee, July 10 to July 14, 2015, Dublin, Ireland
                Genes

                Bioinformatics & Computational biology
