
      BBCA: Improving the scalability of *BEAST using random binning

      research-article
      BMC Genomics
      BioMed Central
      Twelfth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics
      19-22 October 2014
      multi-species coalescent, phylogenomics, incomplete lineage sorting, binning


          Abstract

          Background

          Species tree estimation can be challenging in the presence of gene tree conflict due to incomplete lineage sorting (ILS), which can occur when the time between speciation events is short relative to the population size. Of the many methods that have been developed to estimate species trees in the presence of ILS, *BEAST, a Bayesian method that co-estimates the species tree and gene trees given sequence alignments on multiple loci, has generally been shown to have the best accuracy. However, *BEAST is so computationally intensive that it cannot be used with large numbers of loci and is therefore not suitable for genome-scale analyses.

          Results

          We present BBCA (boosted binned coalescent-based analysis), a method that can be used with *BEAST (and other such co-estimation methods) to improve scalability. BBCA partitions the loci randomly into subsets, uses *BEAST on each subset to co-estimate the gene trees and species tree for the subset, and then combines the newly estimated gene trees using MP-EST, a popular coalescent-based summary method. We compare time-restricted versions of BBCA and *BEAST on simulated datasets, and show that BBCA is at least as accurate as *BEAST and achieves better convergence rates for large numbers of loci.
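          The three steps described above (random binning, per-bin co-estimation, and a final summary step over the pooled gene trees) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `random_bins` and `bbca_sketch`, the `bin_size` and `seed` parameters, and the `co_estimate` and `summarize` callables are all hypothetical. The callables stand in for a *BEAST run and for MP-EST respectively; neither tool is actually invoked, and the abstract does not specify a bin size.

```python
import random
from typing import Callable, Dict, List, Sequence


def random_bins(loci: Sequence[str], bin_size: int, seed: int = 0) -> List[List[str]]:
    """Shuffle the loci and cut them into consecutive bins of at most bin_size."""
    if bin_size < 1:
        raise ValueError("bin_size must be positive")
    shuffled = list(loci)
    random.Random(seed).shuffle(shuffled)  # seeded so the partition is reproducible
    return [shuffled[i:i + bin_size] for i in range(0, len(shuffled), bin_size)]


def bbca_sketch(
    loci: Sequence[str],
    bin_size: int,
    co_estimate: Callable[[List[str]], Dict[str, str]],
    summarize: Callable[[List[str]], str],
    seed: int = 0,
) -> str:
    """Co-estimate gene trees per bin, pool them, then summarize the pool."""
    gene_trees: Dict[str, str] = {}
    for bin_loci in random_bins(loci, bin_size, seed):
        # Hypothetical stand-in for one co-estimation run (e.g. *BEAST) on a bin.
        # It must return one gene tree (e.g. a Newick string) per locus in the bin.
        gene_trees.update(co_estimate(bin_loci))
    # Hypothetical stand-in for a summary method (e.g. MP-EST) applied to the
    # gene trees from all bins, listed in the original locus order.
    return summarize([gene_trees[locus] for locus in loci])
```

          Because each bin is processed independently, the per-bin runs could in principle be executed in parallel; the loop above keeps them sequential for clarity. Only the gene trees are carried forward to the summary step, matching the description in the abstract.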

          Conclusions

          Phylogenomic analysis using *BEAST is currently limited to datasets with a small number of loci, and analyses with even just 100 loci can be computationally challenging. BBCA uses a very simple divide-and-conquer approach that makes it possible to use *BEAST on datasets containing hundreds of loci. This study shows that BBCA provides excellent accuracy and is highly scalable.


                Author and article information

                BMC Genomics
                BioMed Central
                1471-2164
                17 October 2014
                15(Suppl 6):S11
                Affiliations
                [1] Department of Computer Science, University of Texas, Austin, TX, 78712, USA
                [2] Département d'Informatique, École Normale Supérieure, Paris, 75005, France
                [3] Department of Bioengineering, The University of Illinois, Urbana, IL, 61801, USA
                Article
                1471-2164-15-S6-S11
                10.1186/1471-2164-15-S6-S11
                4239591
                Copyright © 2014 Zimmermann et al.; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                Twelfth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics
                Cold Spring Harbor, NY, USA
                19-22 October 2014
                Categories
                Research

                Genetics
