      ASTRID: Accurate Species TRees from Internode Distances

      research-article
      BMC Genomics
      BioMed Central
      13th Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics
      4-7 October 2015
      incomplete lineage sorting, phylogenomics, species trees, ASTRAL, NJst, MP-EST, FastME, PhyD*, neighbor joining

          Abstract

          Background

          Incomplete lineage sorting (ILS), modelled by the multi-species coalescent (MSC), is known to create discordance between gene trees and species trees, and to lead to inaccurate species tree estimates unless appropriate estimation methods are used. While many statistically consistent methods have been developed to estimate the species tree in the presence of ILS, only ASTRAL-2 and NJst have been shown to have good accuracy on large datasets. Yet NJst is generally slower and less accurate than ASTRAL-2, and cannot run on some datasets.

          Results

          We have redesigned NJst to enable it to run on all datasets, and we have expanded its design space so that it can be used with different distance-based tree estimation methods. The resultant method, ASTRID, is statistically consistent under the MSC model, and has accuracy that is competitive with ASTRAL-2. Furthermore, ASTRID is much faster than ASTRAL-2, completing in minutes on some datasets for which ASTRAL-2 used hours.
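The abstract does not spell out the distance computation, so the following is only a minimal sketch of the idea suggested by the method's name. It assumes, as in NJst, that the internode distance between two taxa in a gene tree is the number of internal nodes on the path between them, and that each matrix entry is averaged over the gene trees that contain both taxa. The function names and the nested-tuple tree encoding (with a trifurcating top level standing in for an unrooted tree) are hypothetical and are not taken from the ASTRID code.

```python
from collections import defaultdict
from itertools import combinations


def leaf_paths(tree, path=()):
    """Yield (leaf_name, path) pairs; path is the tuple of child indices from the top node."""
    if isinstance(tree, str):
        yield tree, path
    else:
        for i, child in enumerate(tree):
            yield from leaf_paths(child, path + (i,))


def internode_distances(tree):
    """Count internal nodes on the path between every pair of leaves.

    Assumption: the tree is a nested tuple whose top level is a multifurcation
    (e.g. ("A", "B", ("C", "D"))), so the top node is a genuine internal node.
    """
    leaves = dict(leaf_paths(tree))
    dist = {}
    for a, b in combinations(sorted(leaves), 2):
        pa, pb = leaves[a], leaves[b]
        common = 0
        while common < min(len(pa), len(pb)) and pa[common] == pb[common]:
            common += 1
        edges = (len(pa) - common) + (len(pb) - common)
        dist[(a, b)] = edges - 1  # internal nodes on a path = edges - 1
    return dist


def average_internode_matrix(gene_trees):
    """Average each pairwise distance over the gene trees containing both taxa."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tree in gene_trees:
        for pair, d in internode_distances(tree).items():
            totals[pair] += d
            counts[pair] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


if __name__ == "__main__":
    # Three toy gene trees; two agree on AB|CD and one supports AC|BD.
    trees = [
        ("A", "B", ("C", "D")),
        ("A", "C", ("B", "D")),
        ("A", "B", ("C", "D")),
    ]
    for pair, d in sorted(average_internode_matrix(trees).items()):
        print(pair, round(d, 3))
```

A matrix of this kind would then be passed to a distance-based tree estimation method such as neighbor joining or FastME, both of which appear in the article's keywords. That step is left out of the sketch. Pairs of taxa that never co-occur in any gene tree simply have no entry here; how such missing entries are handled is not stated in the abstract.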

          Conclusions

          ASTRID is a new coalescent-based method for species tree estimation that is competitive with the best current method in terms of accuracy, while being much faster. ASTRID is available in open source form on GitHub.

                Author and article information

                BMC Genomics (BioMed Central), ISSN 1471-2164
                2 October 2015; 16(Suppl 10): S3
                Affiliations
                [1 ]Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Avenue, Urbana, IL, 61801 USA
                Article
                1471-2164-16-S10-S3
                DOI: 10.1186/1471-2164-16-S10-S3
                PMCID: PMC4602181
                PMID: 26449326
                Copyright © 2015 Vachaspati and Warnow

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                13th Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics
                Frankfurt, Germany
                4-7 October 2015
                Categories
                Research

                Genetics
