Phylogenomic analysis of Calyptratae: resolving the phylogenetic relationships within a major radiation of Diptera

Most cited references 72

ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes

Siavash Mirarab, Tandy Warnow (2015)

Motivation: The estimation of species phylogenies requires multiple loci, since different loci can have different trees due to incomplete lineage sorting, modeled by the multi-species coalescent model. We recently developed a coalescent-based method, ASTRAL, which is statistically consistent under the multi-species coalescent model and which is more accurate than other coalescent-based methods on the datasets we examined. ASTRAL runs in polynomial time, by constraining the search space using a set of allowed ‘bipartitions’. Despite the limitation to allowed bipartitions, ASTRAL is statistically consistent. Results: We present a new version of ASTRAL, which we call ASTRAL-II. We show that ASTRAL-II has substantial advantages over ASTRAL: it is faster, can analyze much larger datasets (up to 1000 species and 1000 genes) and has substantially better accuracy under some conditions. ASTRAL’s running time is $O(n^2 k |X|^2)$, and ASTRAL-II’s running time is $O(n k |X|^2)$, where $n$ is the number of species, $k$ is the number of loci and $X$ is the set of allowed bipartitions for the search space. Availability and implementation: ASTRAL-II is available in open source at https://github.com/smirarab/ASTRAL and datasets used are available at http://www.cs.utexas.edu/~phylo/datasets/astral2/. Contact: smirarab@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
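As a rough reading of the two running-time bounds quoted in this abstract, their ratio can be worked out directly. This assumes the hidden constant factors are comparable and that the same bipartition set X is used in both cases, neither of which the abstract states:

```latex
\[
\frac{n^{2}\,k\,|X|^{2}}{n\,k\,|X|^{2}} = n
\]
```

Under those assumptions the ASTRAL-II bound is smaller by a factor of n, so at the upper end of the dataset sizes mentioned (n = 1000 species) it would be roughly 1000 times smaller.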

Cited 389 times

How many bootstrap replicates are necessary?

Nicholas D Pattengale, Masoud Alipour, Olaf Bininda-Emonds … (2010)

Phylogenetic bootstrapping (BS) is a standard technique for inferring confidence values on phylogenetic trees that is based on reconstructing many trees from minor variations of the input data, trees called replicates. BS is used with all phylogenetic reconstruction approaches, but we focus here on one of the most popular, maximum likelihood (ML). Because ML inference is so computationally demanding, it has proved too expensive to date to assess the impact of the number of replicates used in BS on the relative accuracy of the support values. For the same reason, a rather small number (typically 100) of BS replicates are computed in real-world studies. Stamatakis et al. recently introduced a BS algorithm that is 1 to 2 orders of magnitude faster than previous techniques, while yielding qualitatively comparable support values, making an experimental study possible. In this article, we propose stopping criteria (that is, thresholds computed at runtime to determine when enough replicates have been generated) and we report on the first large-scale experimental study to assess the effect of the number of replicates on the quality of support values, including the performance of our proposed criteria. We run our tests on 17 diverse real-world DNA datasets (single-gene as well as multi-gene), which include 125-2,554 taxa. We find that our stopping criteria typically stop computations after 100-500 replicates (although the most conservative criterion may continue for several thousand replicates) while producing support values that correlate at better than 99.5% with the reference values on the best ML trees. Significantly, we also find that the stopping criteria can recommend very different numbers of replicates for different datasets of comparable sizes. Our results are thus twofold: (i) they give the first experimental assessment of the effect of the number of BS replicates on the quality of support values returned through BS, and (ii) they validate our proposals for stopping criteria. Practitioners will no longer have to enter a guess nor worry about the quality of support values; moreover, with most counts of replicates in the 100-500 range, robust BS under ML inference becomes computationally practical for most datasets. The complete test suite is available at http://lcbb.epfl.ch/BS.tar.bz2, and BS with our stopping criteria is included in the latest release of RAxML v7.2.5, available at http://wwwkramer.in.tum.de/exelixis/software.html.
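The idea of replicates built from "minor variations of the input data" can be illustrated with a minimal sketch of the standard nonparametric bootstrap, which resamples alignment columns with replacement. This is not the RAxML implementation and does not include the stopping criteria proposed in the abstract. The function names and the toy data structures (an alignment as a dict of equal-length strings, a tree as a set of bipartitions) are assumptions made for illustration only.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate: columns resampled with replacement.

    `alignment` is a hypothetical dict mapping taxon name -> sequence,
    with all sequences of equal length.
    """
    n_sites = len(next(iter(alignment.values())))
    # Draw n_sites column indices with replacement.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in cols)
            for taxon, seq in alignment.items()}


def bipartition_support(replicate_trees, split):
    """Percentage of replicate trees that contain the given bipartition.

    Each tree is represented (as an assumption) by a set of frozensets,
    one frozenset of taxon names per bipartition.
    """
    hits = sum(split in tree for tree in replicate_trees)
    return 100.0 * hits / len(replicate_trees)
```

In a real analysis, a tree would be inferred from each pseudo-replicate (for example by ML) before support values are counted; that inference step is deliberately left out of this sketch.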

Cited 304 times

Selecting optimal partitioning schemes for phylogenomic datasets

Robert Lanfear, Brett Calcott, David Kainer … (2014)

Background: Partitioning involves estimating independent models of molecular evolution for different subsets of sites in a sequence alignment, and has been shown to improve phylogenetic inference. Current methods for estimating best-fit partitioning schemes, however, are only computationally feasible with datasets of fewer than 100 loci. This is a problem because datasets with thousands of loci are increasingly common in phylogenetics. Methods: We develop two novel methods for estimating best-fit partitioning schemes on large phylogenomic datasets: strict and relaxed hierarchical clustering. These methods use information from the underlying data to cluster together similar subsets of sites in an alignment, and build on clustering approaches that have been proposed elsewhere. Results: We compare the performance of our methods to each other, and to existing methods for selecting partitioning schemes. We demonstrate that while strict hierarchical clustering has the best computational efficiency on very large datasets, relaxed hierarchical clustering provides scalable efficiency and returns dramatically better partitioning schemes as assessed by common criteria such as AICc and BIC scores. Conclusions: These two methods provide the best current approaches to inferring partitioning schemes for very large datasets. We provide free open-source implementations of the methods in the PartitionFinder software. We hope that the use of these methods will help to improve the inferences made from large phylogenomic datasets.
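The AICc and BIC scores mentioned in this abstract follow standard textbook definitions, which can be sketched as below. This is not PartitionFinder's code. The function names are hypothetical, and what counts as the sample size n (commonly the number of alignment sites) is an assumption of this sketch. Lower scores indicate a preferred scheme.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion for log-likelihood and k free parameters."""
    return 2 * k - 2 * log_lik


def aicc(log_lik, k, n):
    """AIC with the small-sample correction; n is the sample size."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * log_lik


def best_scheme(schemes, n, criterion=bic):
    """Pick the scheme name with the lowest score.

    `schemes` is a hypothetical dict: name -> (log_likelihood, n_parameters).
    """
    return min(schemes, key=lambda name: criterion(*schemes[name], n))
```

With this layout, `best_scheme(schemes, n, criterion=aicc)` ranks the same candidate schemes under AICc instead of BIC, which makes it easy to see when the two criteria disagree.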

Cited 271 times

Author and article information

Journal

Title: Cladistics

Abbreviated Title: Cladistics

Publisher: Wiley

ISSN (Print): 0748-3007

ISSN (Electronic): 1096-0031

Publication date Created: March 22 2019

Publication date Created: December 2019

Publication date (Electronic): February 22 2019

Publication date (Print): December 2019

Volume: 35

Issue: 6

Pages: 605-622

Affiliations

[1] Department of Biological Sciences, National University of Singapore, 14 Science Dr 4, Singapore 117543, Singapore

[2] Biology I, Evolutionary Biology & Ecology, University of Freiburg, Hauptstraße 1, Freiburg (Brsg.), Germany

[3] Zoologisches Forschungsmuseum Alexander Koenig (ZFMK)/Zentrum für Molekulare Biodiversitätsforschung (ZMB), Bonn, Germany

[4] Australian National Insect Collection, CSIRO National Research Collections Australia (NRCA), Acton, ACT, Canberra, Australia

[5] Department of Entomology, California Academy of Sciences, San Francisco, CA, USA

[6] Department of Entomology, North Carolina State University, Raleigh, NC 27695, USA

[7] Departamento de Ecologia, Zoologia e Genética, Instituto de Biologia, Universidade Federal de Pelotas, Pelotas, RS, Brazil

[8] Oxford University Museum of Natural History, Parks Road, Oxford OX1 3PW, UK

[9] Beijing Advanced Innovation Center for Food Nutrition and Human Health, China Agricultural University, Beijing 100193, China

[10] Department of Entomology, China Agricultural University, Beijing 100193, China

[11] Dipartimento di Biologia e Biotecnologie ‘Charles Darwin’, Sapienza Università di Roma, Rome, Italy

[12] Natural History Museum of Denmark, University of Copenhagen, Universitetsparken 15, Copenhagen DK-2100, Denmark

Article

DOI: 10.1111/cla.12375

PubMed ID: 34618931

SO-VID: 39e5cd1e-d372-4f94-a2a3-7b89cf9191e0

License:

http://onlinelibrary.wiley.com/termsAndConditions#am

http://onlinelibrary.wiley.com/termsAndConditions#vor

http://doi.wiley.com/10.1002/tdm_license_1.1

Cited by 29
