
      An investigation of irreproducibility in maximum likelihood phylogenetic inference

      research-article


          Abstract

          Phylogenetic trees are essential for studying biology, but their reproducibility under identical parameter settings remains unexplored. Here, we find that 3515 (18.11%) IQ-TREE-inferred and 1813 (9.34%) RAxML-NG-inferred maximum likelihood (ML) gene trees are topologically irreproducible when executing two replicates (Run1 and Run2) for each of 19,414 gene alignments in 15 animal, plant, and fungal phylogenomic datasets. Notably, coalescent-based ASTRAL species phylogenies inferred from Run1 and Run2 sets of individual gene trees are topologically irreproducible for 9/15 phylogenomic datasets, whereas concatenation-based phylogenies inferred twice from the same supermatrix are reproducible. Our simulations further show that irreproducible phylogenies are more likely to be incorrect than reproducible phylogenies. These results suggest that a considerable fraction of single-gene ML trees may be irreproducible. Increasing reproducibility in ML inference will benefit from providing analyses’ log files, which contain typically reported parameters (e.g., program, substitution model, number of tree searches) but also typically unreported ones (e.g., random starting seed number, number of threads, processor type).

          Abstract

          Replicate runs of maximum likelihood phylogenetic analyses can generate different tree topologies due to differences in parameters, such as random seeds. Here, Shen et al. demonstrate that replicate runs can generate substantially different tree topologies even with identical data and parameters.
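The notion of two replicate trees being "topologically irreproducible" can be made concrete with a small comparison routine. The sketch below is not the authors' pipeline: it assumes each tree is available as a Newick string without quoted labels, ignores branch lengths and support values, and treats two trees as topologically identical when they have the same taxa and the same set of non-trivial bipartitions (equivalently, an unrooted Robinson-Foulds distance of zero). All function names are hypothetical.

```python
import re

DELIMS = {"(", ")", ",", ";"}

def parse_newick(text):
    """Parse a Newick string into nested tuples of leaf names.

    Assumption: labels are unquoted; branch lengths and internal labels are dropped.
    """
    tokens = re.findall(r"[(),;]|[^(),;]+", re.sub(r"\s+", "", text))
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                          # consume "("
            children = [node()]
            while tokens[pos] == ",":
                pos += 1                      # consume ","
                children.append(node())
            pos += 1                          # consume ")"
            if pos < len(tokens) and tokens[pos] not in DELIMS:
                pos += 1                      # skip support value / branch length
            return tuple(children)
        label = tokens[pos]
        pos += 1
        return label.split(":")[0]            # drop the leaf's branch length

    return node()

def leaf_set(tree):
    """Return the set of taxon names under a parsed (sub)tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def bipartitions(tree):
    """Return the non-trivial splits of a parsed tree, treated as unrooted."""
    taxa = leaf_set(tree)
    anchor = min(taxa)                        # fixed taxon used to pick a canonical side
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = taxa - below if anchor in below else below
        if 1 < len(side) < len(taxa) - 1:     # keep only informative splits
            splits.add(side)
        return below

    visit(tree)
    return splits

def same_topology(newick_a, newick_b):
    """True if both trees share the same taxa and the same set of splits."""
    a, b = parse_newick(newick_a), parse_newick(newick_b)
    return leaf_set(a) == leaf_set(b) and bipartitions(a) == bipartitions(b)

def percent_irreproducible(run1_trees, run2_trees):
    """Percentage of paired replicate trees whose topologies differ."""
    pairs = list(zip(run1_trees, run2_trees))
    differing = sum(not same_topology(t1, t2) for t1, t2 in pairs)
    return 100.0 * differing / len(pairs)

if __name__ == "__main__":
    run1 = ["((A,B),(C,D));", "((A,B),(C,D));"]
    run2 = ["((B,A):0.1,(D,C):0.2);", "((A,C),(B,D));"]
    print(percent_irreproducible(run1, run2))
```

Applied to the counts reported in the abstract, the same ratio gives 3515 / 19,414 = 18.11% for IQ-TREE and 1813 / 19,414 = 9.34% for RAxML-NG. For real analyses, an established phylogenetics library would be a safer choice than this hand-written parser.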


                Author and article information

                Contributors
                xingxingshen@zju.edu.cn
                antonis.rokas@vanderbilt.edu
                Journal
                Nature Communications (Nat Commun)
                Publisher: Nature Publishing Group UK (London)
                ISSN: 2041-1723
                Published: 30 November 2020
                Volume: 11
                Article number: 6096
                Affiliations
                [1] State Key Laboratory of Rice Biology, Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, 310058 Hangzhou, China (GRID grid.13402.34; ISNI 0000 0004 1759 700X)
                [2] Institute of Insect Sciences, Zhejiang University, 310058 Hangzhou, China (GRID grid.13402.34; ISNI 0000 0004 1759 700X)
                [3] Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA (GRID grid.152326.1; ISNI 0000 0001 2264 7217)
                [4] Laboratory of Genetics, J. F. Crow Institute for the Study of Evolution, Wisconsin Energy Institute, Center for Genomic Science Innovation, University of Wisconsin-Madison, Madison, WI 53706, USA (GRID grid.14003.36; ISNI 0000 0001 2167 3675)
                [5] DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI 53706, USA (GRID grid.14003.36; ISNI 0000 0001 2167 3675)
                Author information
                http://orcid.org/0000-0001-5765-1419
                http://orcid.org/0000-0002-2206-5804
                http://orcid.org/0000-0001-5088-7461
                http://orcid.org/0000-0002-9109-8853
                http://orcid.org/0000-0002-7248-6551
                Article
                Publisher ID: 20005
                DOI: 10.1038/s41467-020-20005-6
                PMCID: PMC7705714
                PMID: 33257660
                880f1f74-18e4-4098-abc2-fe1e08c2d582
                © The Author(s) 2020

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 24 June 2020
                Accepted: 5 November 2020
                Funding
                Funded by: FundRef https://doi.org/10.13039/501100001809, National Natural Science Foundation of China (National Science Foundation of China)
                Award ID: 32071665
                Funded by: FundRef https://doi.org/10.13039/100000001, National Science Foundation (NSF)
                Award ID: DEB-1442113
                Funded by: FundRef https://doi.org/10.13039/100006492, Division of Intramural Research, National Institute of Allergy and Infectious Diseases (Division of Intramural Research of the NIAID)
                Award ID: 1R56AI146096-01A1
                Categories
                Article

                Uncategorized
                phylogeny, phylogenetics
