24
views
0
recommends
+1 Recommend
0 collections
0
shares
• Record: found
• Abstract: found
• Article: found
Is Open Access

# ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes

1 , * , 2

Bioinformatics

Oxford University Press

Bookmark
There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

### Abstract

Motivation: The estimation of species phylogenies requires multiple loci, since different loci can have different trees due to incomplete lineage sorting, modeled by the multi-species coalescent model. We recently developed a coalescent-based method, ASTRAL, which is statistically consistent under the multi-species coalescent model and which is more accurate than other coalescent-based methods on the datasets we examined. ASTRAL runs in polynomial time, by constraining the search space using a set of allowed ‘bipartitions’. Despite the limitation to allowed bipartitions, ASTRAL is statistically consistent.

Results: We present a new version of ASTRAL, which we call ASTRAL-II. We show that ASTRAL-II has substantial advantages over ASTRAL: it is faster, can analyze much larger datasets (up to 1000 species and 1000 genes) and has substantially better accuracy under some conditions. ASTRAL’s running time is $O ( n 2 k | X | 2 )$ , and ASTRAL-II’s running time is $O ( n k | X | 2 )$ , where n is the number of species, k is the number of loci and X is the set of allowed bipartitions for the search space.

Availability and implementation: ASTRAL-II is available in open source at https://github.com/smirarab/ASTRAL and datasets used are available at http://www.cs.utexas.edu/~phylo/datasets/astral2/.

Supplementary information: Supplementary data are available at Bioinformatics online.

### Most cited references16

• Record: found
• Abstract: found
• Article: found
Is Open Access

### ASTRAL: genome-scale coalescent-based species tree estimation

(2014)
Motivation: Species trees provide insight into basic biology, including the mechanisms of evolution and how it modifies biomolecular function and structure, biodiversity and co-evolution between genes and species. Yet, gene trees often differ from species trees, creating challenges to species tree estimation. One of the most frequent causes for conflicting topologies between gene trees and species trees is incomplete lineage sorting (ILS), which is modelled by the multi-species coalescent. While many methods have been developed to estimate species trees from multiple genes, some which have statistical guarantees under the multi-species coalescent model, existing methods are too computationally intensive for use with genome-scale analyses or have been shown to have poor accuracy under some realistic conditions. Results: We present ASTRAL, a fast method for estimating species trees from multiple genes. ASTRAL is statistically consistent, can run on datasets with thousands of genes and has outstanding accuracy—improving on MP-EST and the population tree from BUCKy, two statistically consistent leading coalescent-based methods. ASTRAL is often more accurate than concatenation using maximum likelihood, except when ILS levels are low or there are too few gene trees. Availability and implementation: ASTRAL is available in open source form at https://github.com/smirarab/ASTRAL/. Datasets studied in this article are available at http://www.cs.utexas.edu/users/phylo/datasets/astral. Contact: warnow@illinois.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Bookmark
• Record: found
• Abstract: found

### Is a new and general theory of molecular systematics emerging?

(2008)
Bookmark
• Record: found
• Abstract: found

### Inconsistency of phylogenetic estimates from concatenated data under coalescence.

(2007)
Although multiple gene sequences are becoming increasingly available for molecular phylogenetic inference, the analysis of such data has largely relied on inference methods designed for single genes. One of the common approaches to analyzing data from multiple genes is concatenation of the individual gene data to form a single supergene to which traditional phylogenetic inference procedures - e.g., maximum parsimony (MP) or maximum likelihood (ML) - are applied. Recent empirical studies have demonstrated that concatenation of sequences from multiple genes prior to phylogenetic analysis often results in inference of a single, well-supported phylogeny. Theoretical work, however, has shown that the coalescent can produce substantial variation in single-gene histories. Using simulation, we combine these ideas to examine the performance of the concatenation approach under conditions in which the coalescent produces a high level of discord among individual gene trees and show that it leads to statistically inconsistent estimation in this setting. Furthermore, use of the bootstrap to measure support for the inferred phylogeny can result in moderate to strong support for an incorrect tree under these conditions. These results highlight the importance of incorporating variation in gene histories into multilocus phylogenetics.
Bookmark

### Author and article information

###### Journal
Bioinformatics
Bioinformatics
bioinformatics
bioinfo
Bioinformatics
Oxford University Press
1367-4803
1367-4811
15 June 2015
10 June 2015
10 June 2015
: 31
: 12
: i44-i52
###### Affiliations
1Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA and 2Departments of Computer Science and Bioengineering, The University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
###### Author notes
*To whom correspondence should be addressed.
###### Article
btv234
10.1093/bioinformatics/btv234
4765870
26072508