
      Maximum likelihood pandemic-scale phylogenetics


          Abstract

          Phylogenetics has a crucial role in genomic epidemiology. Enabled by unparalleled volumes of genome sequence data generated to study and help contain the COVID-19 pandemic, phylogenetic analyses of SARS-CoV-2 genomes have shed light on the virus’s origins, spread, and the emergence and reproductive success of new variants. However, most phylogenetic approaches, including maximum likelihood and Bayesian methods, cannot scale to the size of the datasets from the current pandemic. We present ‘MAximum Parsimonious Likelihood Estimation’ (MAPLE), an approach for likelihood-based phylogenetic analysis of epidemiological genomic datasets at unprecedented scales. MAPLE infers SARS-CoV-2 phylogenies more accurately than existing maximum likelihood approaches while running up to thousands of times faster, and requiring at least 100 times less memory on large datasets. This extends the reach of genomic epidemiology, allowing the continued use of accurate phylogenetic, phylogeographic and phylodynamic analyses on datasets of millions of genomes.

          Abstract

          ‘MAximum Parsimonious Likelihood Estimation’ (MAPLE) is a maximum likelihood-based approach for inference of phylogenetic trees from very large datasets of similar sequences incorporating a sparse alignment representation and parsimony-based approximations, offering higher accuracy and reduced computational requirements.
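          To make the idea of a sparse alignment representation concrete, the sketch below stores a genome as the list of positions where it differs from a reference, which is compact when sequences are very similar. This is only an illustration under stated assumptions: the function names `encode_sparse` and `decode_sparse`, the `(position, base)` tuple layout and the toy sequences are hypothetical and are not taken from MAPLE's code or file format.

```python
from typing import List, Tuple

# Hypothetical layout: (1-based position, observed character at that position).
Diff = Tuple[int, str]


def encode_sparse(reference: str, sequence: str) -> List[Diff]:
    """Return only the positions where `sequence` differs from `reference`."""
    if len(reference) != len(sequence):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, s)
        for i, (r, s) in enumerate(zip(reference, sequence))
        if r != s
    ]


def decode_sparse(reference: str, diffs: List[Diff]) -> str:
    """Rebuild the full sequence by applying the differences to the reference."""
    chars = list(reference)
    for pos, base in diffs:
        chars[pos - 1] = base
    return "".join(chars)


# Toy example: the sample differs from the reference at one site only.
reference = "ACGTACGTAC"
sample = "ACGTACCTAC"
diffs = encode_sparse(reference, sample)
restored = decode_sparse(reference, diffs)
```

          With closely related genomes, the number of stored differences grows with the number of mutations rather than with genome length, which is the intuition behind the reduced memory use described above.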

          Most cited references (57)


          RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies

          Motivation: Phylogenies are increasingly used in all fields of medical and biological research. Moreover, because of the next-generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under maximum likelihood. Since the last RAxML paper in 2006, it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community. Results: I present some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting post-analyses on sets of trees. In addition, an up-to-date 50-page user manual covering all new RAxML options is available. Availability and implementation: The code is available under GNU GPL at https://github.com/stamatak/standard-RAxML. Contact: alexandros.stamatakis@h-its.org Supplementary information: Supplementary data are available at Bioinformatics online.

            A new coronavirus associated with human respiratory disease in China

            Emerging infectious diseases, such as severe acute respiratory syndrome (SARS) and Zika virus disease, present a major threat to public health [1–3]. Despite intense research efforts, how, when and where new diseases appear are still a source of considerable uncertainty. A severe respiratory disease was recently reported in Wuhan, Hubei province, China. As of 25 January 2020, at least 1,975 cases had been reported since the first patient was hospitalized on 12 December 2019. Epidemiological investigations have suggested that the outbreak was associated with a seafood market in Wuhan. Here we study a single patient who was a worker at the market and who was admitted to the Central Hospital of Wuhan on 26 December 2019 while experiencing a severe respiratory syndrome that included fever, dizziness and a cough. Metagenomic RNA sequencing [4] of a sample of bronchoalveolar lavage fluid from the patient identified a new RNA virus strain from the family Coronaviridae, which is designated here ‘WH-Human 1’ coronavirus (and has also been referred to as ‘2019-nCoV’). Phylogenetic analysis of the complete viral genome (29,903 nucleotides) revealed that the virus was most closely related (89.1% nucleotide similarity) to a group of SARS-like coronaviruses (genus Betacoronavirus, subgenus Sarbecovirus) that had previously been found in bats in China [5]. This outbreak highlights the ongoing ability of viral spill-over from animals to cause severe disease in humans.

              FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments

              Background: We recently described FastTree, a tool for inferring phylogenies for alignments with up to hundreds of thousands of sequences. Here, we describe improvements to FastTree that improve its accuracy without sacrificing scalability. Methodology/Principal Findings: Where FastTree 1 used nearest-neighbor interchanges (NNIs) and the minimum-evolution criterion to improve the tree, FastTree 2 adds minimum-evolution subtree-pruning-regrafting (SPRs) and maximum-likelihood NNIs. FastTree 2 uses heuristics to restrict the search for better trees and estimates a rate of evolution for each site (the “CAT” approximation). Nevertheless, for both simulated and genuine alignments, FastTree 2 is slightly more accurate than a standard implementation of maximum-likelihood NNIs (PhyML 3 with default settings). Although FastTree 2 is not quite as accurate as methods that use maximum-likelihood SPRs, most of the splits that disagree are poorly supported, and for large alignments, FastTree 2 is 100–1,000 times faster. FastTree 2 inferred a topology and likelihood-based local support values for 237,882 distinct 16S ribosomal RNAs on a desktop computer in 22 hours and 5.8 gigabytes of memory. Conclusions/Significance: FastTree 2 allows the inference of maximum-likelihood phylogenies for huge alignments. FastTree 2 is freely available at http://www.microbesonline.org/fasttree.

                Author and article information

                Contributors
                demaio@ebi.ac.uk
                Journal
                Nature Genetics (Nat Genet)
                Publisher: Nature Publishing Group US (New York)
                ISSN (print): 1061-4036
                ISSN (electronic): 1546-1718
                Published: 10 April 2023
                Volume: 55
                Issue: 5
                Pages: 746-752
                Affiliations
                [1] GRID grid.225360.0, ISNI 0000 0000 9709 7726, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
                [2] GRID grid.419538.2, ISNI 0000 0000 9071 0620, Max Planck Institute for Molecular Genetics, Berlin, Germany
                [3] GRID grid.266100.3, ISNI 0000 0001 2107 4242, Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA, USA
                [4] GRID grid.205975.c, ISNI 0000 0001 0740 6917, Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
                [5] GRID grid.205975.c, ISNI 0000 0001 0740 6917, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
                [6] GRID grid.1001.0, ISNI 0000 0001 2180 7477, School of Computing, College of Engineering, Computing and Cybernetics, Australian National University, Canberra, Australian Capital Territory, Australia
                Author information
                http://orcid.org/0000-0002-1776-8564
                http://orcid.org/0000-0001-8486-2211
                Article
                Publisher ID: 1368
                DOI: 10.1038/s41588-023-01368-0
                PMCID: PMC10181937
                PMID: 37038003
                3e097244-ad0c-4b0e-92f2-8edd8ea6e6c0
                © The Author(s) 2023

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 19 August 2022
                Accepted: 7 March 2023
                Funding
                Funded by: FundRef https://doi.org/10.13039/100013060, European Molecular Biology Laboratory (EMBL Heidelberg);
                Funded by: FundRef https://doi.org/10.13039/100000030, U.S. Department of Health & Human Services | Centers for Disease Control and Prevention (CDC);
                Award ID: BAA 200-2021-11554
                Funded by: FundRef https://doi.org/10.13039/100000879, Alfred P. Sloan Foundation;
                Categories
                Article
                Custom metadata
                © Springer Nature America, Inc. 2023

                Genetics
                genome informatics, microbial genetics, genomics, molecular biology, software
