Is Open Access

ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees

research-article
BMC Bioinformatics
BioMed Central
RECOMB-CG - 2017 : The Fifteenth RECOMB Comparative Genomics Satellite Conference (RECOMB-CG 2017)
04-06 October 2017


Abstract

Background

Evolutionary histories can be discordant across the genome, and such discordances need to be considered in reconstructing the species phylogeny. ASTRAL is one of the leading methods for inferring species trees from gene trees while accounting for gene tree discordance. ASTRAL uses dynamic programming to search for the tree that shares the maximum number of quartet topologies with input gene trees, restricting itself to a predefined set of bipartitions.
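As a rough illustration of the optimization criterion described above, the following brute-force Python sketch counts how many quartet topologies a candidate species tree shares with a set of gene trees. It is not ASTRAL's dynamic programming and does not use a bipartition constraint set; it only evaluates the score of one given tree. The nested-tuple tree encoding and the function names (`leaf_set`, `clusters`, `quartet_topology`, `index_tree`, `quartet_score`) are assumptions made for this example and are not taken from the ASTRAL code base.

```python
from itertools import combinations

def leaf_set(tree):
    """Return the set of leaf labels below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def clusters(tree):
    """Yield the leaf set below every internal node of a nested-tuple tree."""
    if isinstance(tree, str):
        return
    yield leaf_set(tree)
    for child in tree:
        yield from clusters(child)

def index_tree(tree):
    """Precompute the leaf set and the list of clusters of a tree."""
    return leaf_set(tree), list(clusters(tree))

def quartet_topology(leaves, tree_clusters, quartet):
    """Return the split (e.g. {A,B}|{C,D}) the tree induces on four taxa.

    Returns None if the tree lacks one of the taxa or leaves the quartet
    unresolved (which can happen when the tree contains polytomies).
    """
    q = frozenset(quartet)
    if not q <= leaves:
        return None
    for cluster in tree_clusters:
        side = cluster & q
        if len(side) == 2:  # this branch separates the quartet two against two
            return frozenset([side, q - side])
    return None

def quartet_score(species_tree, gene_trees):
    """Count quartet topologies shared by the species tree and the gene trees."""
    sp_leaves, sp_clusters = index_tree(species_tree)
    indexed_genes = [index_tree(g) for g in gene_trees]
    score = 0
    for quartet in combinations(sorted(sp_leaves), 4):
        sp_top = quartet_topology(sp_leaves, sp_clusters, quartet)
        if sp_top is None:
            continue
        for g_leaves, g_clusters in indexed_genes:
            if quartet_topology(g_leaves, g_clusters, quartet) == sp_top:
                score += 1
    return score

# Hypothetical toy input: four gene trees on the taxa A, B, C, D.
species = (("A", "B"), ("C", "D"))
genes = [
    (("A", "B"), ("C", "D")),    # agrees with the species tree
    (("A", "C"), ("B", "D")),    # conflicts
    ((("A", "B"), "C"), "D"),    # agrees (same unrooted quartet)
    ("A", "B", "C", "D"),        # polytomy: quartet unresolved
]
print(quartet_score(species, genes))  # 2
```

Enumerating all quartets takes time proportional to n^4 per gene tree, so this sketch is only practical for very small inputs; it is meant to make the objective concrete, not to suggest how the search is carried out.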

Results

We introduce ASTRAL-III, which substantially improves the running time of ASTRAL-II and guarantees polynomial running time as a function of both the number of species (n) and the number of genes (k). ASTRAL-III limits the bipartition constraint set (X) to grow at most linearly with n and k. Moreover, it handles polytomies more efficiently than ASTRAL-II, exploits similarities between gene trees better, and uses several techniques to avoid searching parts of the search space that are mathematically guaranteed not to include the optimal tree. The asymptotic running time of ASTRAL-III in the presence of polytomies is $O\left((nk)^{1.726} D\right)$, where D = O(nk) is the sum of degrees of all unique nodes in input trees. The running time improvements enable us to test whether contracting low support branches in gene trees improves the accuracy by reducing noise. In extensive simulations, we show that removing branches with very low support (e.g., below 10%) improves accuracy while overly aggressive filtering is harmful. We observe on a biological avian phylogenomic dataset of 14K genes that contracting low support branches greatly improves results.
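The polynomial guarantee follows directly from the two facts stated above. Substituting the bound D = O(nk) into the running time gives a worst-case bound in n and k alone; the short derivation below is a worked restatement of the abstract's formula, not an additional result.

```latex
\[
O\left((nk)^{1.726}\, D\right)
\;\subseteq\;
O\left((nk)^{1.726} \cdot nk\right)
\;=\;
O\left((nk)^{2.726}\right),
\qquad \text{since } D = O(nk).
\]
```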

Conclusions

ASTRAL-III is a faster version of the ASTRAL method for phylogenetic reconstruction and can scale up to 10,000 species. With ASTRAL-III, low support branches can be removed, resulting in improved accuracy.
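To make the branch-removal step concrete, here is a minimal Python sketch of contracting low support branches in a single gene tree, which turns the affected nodes into polytomies. The `Node` class, the `contract_low_support` and `to_newick` helpers, and the toy tree are all hypothetical and written for this example; they are not part of ASTRAL, and the 10% threshold simply mirrors the example value mentioned in the abstract.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: Optional[str] = None        # leaf label; None for internal nodes
    support: Optional[float] = None   # support (in percent) of the branch above this node
    children: List["Node"] = field(default_factory=list)

def contract_low_support(node, threshold):
    """Return a copy of the tree in which every internal branch whose
    support is below `threshold` has been contracted into its parent."""
    new_children = []
    for child in node.children:
        child = contract_low_support(child, threshold)
        is_internal = bool(child.children)
        if is_internal and child.support is not None and child.support < threshold:
            # Remove the branch: attach the grandchildren directly to this node.
            new_children.extend(child.children)
        else:
            new_children.append(child)
    return Node(node.name, node.support, new_children)

def to_newick(node):
    """Serialize a tree to a Newick-like string (without the trailing ';')."""
    if not node.children:
        return node.name
    inner = ",".join(to_newick(c) for c in node.children)
    label = "" if node.support is None else str(node.support)
    return f"({inner}){label}"

# Hypothetical gene tree ((A,B)5,(C,D)90,E) with support values in percent.
tree = Node(children=[
    Node(support=5, children=[Node("A"), Node("B")]),
    Node(support=90, children=[Node("C"), Node("D")]),
    Node("E"),
])
print(to_newick(contract_low_support(tree, 10)))  # (A,B,(C,D)90,E)
```

Branches without a support value are left untouched in this sketch, which is one possible design choice; raising the threshold contracts more branches and therefore produces larger polytomies.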

Electronic supplementary material

The online version of this article (10.1186/s12859-018-2129-y) contains supplementary material, which is available to authorized users.


Author and article information

Contributors
chz069@ucsd.edu
mrabieeh@ucsd.edu
esayyari@ucsd.edu
smirarab@ucsd.edu
Conference
BMC Bioinformatics
BioMed Central (London)
1471-2105
8 May 2018
2018
Volume : 19
Issue : Suppl 6
Issue sponsor : Publication of this supplement has not been supported by sponsorship. Information about the source of funding for publication charges can be found in the individual articles. The articles have undergone the journal's standard peer review process for supplements. JM is a co-author of one of the papers published in this supplement; review of his paper was organised by LN.
Affiliations
[1] ISNI 0000 0001 2107 4242, GRID grid.266100.3, Department of Electrical and Computer Engineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0021, USA
[2] ISNI 0000 0001 2107 4242, GRID grid.266100.3, Department of Computer Science and Engineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0021, USA
[3] ISNI 0000 0001 2107 4242, GRID grid.266100.3, Bioinformatics and Systems Biology, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0021, USA
Article
2129
10.1186/s12859-018-2129-y
5998893
29745866

RECOMB-CG 2017: The Fifteenth RECOMB Comparative Genomics Satellite Conference, Barcelona, Spain, 04-06 October 2017
Categories
Research