Maximum likelihood pandemic-scale phylogenetics

Summary

Phylogenetics plays a crucial role in the interpretation of genomic data¹. Phylogenetic analyses of SARS-CoV-2 genomes have allowed the detailed study of the virus's origins², of its international³,⁴ and local⁴⁻⁹ spread, and of the emergence¹⁰ and reproductive success¹¹ of new variants, among many applications. These analyses have been enabled by the unparalleled volumes of genome sequence data generated and employed to study and help contain the pandemic¹². However, preferred model-based phylogenetic approaches, including maximum likelihood and Bayesian methods, mostly based on Felsenstein's 'pruning' algorithm¹³,¹⁴, cannot scale to the size of the datasets from the current pandemic⁴,¹⁵, hampering our understanding of the virus's evolution and transmission¹⁶. We present new approaches, based on reworking Felsenstein's algorithm, for likelihood-based phylogenetic analysis of epidemiological genomic datasets at unprecedented scales. We exploit near-certainty regarding ancestral genomes, and the similarities between closely related and densely sampled genomes, to greatly reduce computational demands for memory and time. Combined with new methods for searching amongst candidate evolutionary trees, this results in our MAPLE ('MAximum Parsimonious Likelihood Estimation') software giving better results than popular approaches such as FastTree 2¹⁷, IQ-TREE 2¹⁸, RAxML-NG¹⁹ and UShER¹⁵. Our approach therefore allows complex and accurate probabilistic phylogenetic analyses of millions of microbial genomes, extending the reach of genomic epidemiology. Future epidemiological datasets are likely to be even larger than those currently associated with COVID-19, and other disciplines such as metagenomics and biodiversity science are also generating huge numbers of genome sequences²⁰⁻²². Our methods will permit continued use of preferred likelihood-based phylogenetic analyses.

Related collections

Novel Coronavirus Disease COVID-19

Most cited references (3 of 58)

RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies

Alexandros Stamatakis (2014)

Motivation: Phylogenies are increasingly used in all fields of medical and biological research. Moreover, because of the next-generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under maximum likelihood. Since the last RAxML paper in 2006, it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community. Results: I present some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting post-analyses on sets of trees. In addition, an up-to-date 50-page user manual covering all new RAxML options is available. Availability and implementation: The code is available under GNU GPL at https://github.com/stamatak/standard-RAxML. Contact: alexandros.stamatakis@h-its.org Supplementary information: Supplementary data are available at Bioinformatics online.

A new coronavirus associated with human respiratory disease in China

Fan Wu, Su Zhao, Bin Yu … (2020)

Emerging infectious diseases, such as severe acute respiratory syndrome (SARS) and Zika virus disease, present a major threat to public health¹⁻³. Despite intense research efforts, how, when and where new diseases appear are still a source of considerable uncertainty. A severe respiratory disease was recently reported in Wuhan, Hubei province, China. As of 25 January 2020, at least 1,975 cases had been reported since the first patient was hospitalized on 12 December 2019. Epidemiological investigations have suggested that the outbreak was associated with a seafood market in Wuhan. Here we study a single patient who was a worker at the market and who was admitted to the Central Hospital of Wuhan on 26 December 2019 while experiencing a severe respiratory syndrome that included fever, dizziness and a cough. Metagenomic RNA sequencing⁴ of a sample of bronchoalveolar lavage fluid from the patient identified a new RNA virus strain from the family Coronaviridae, which is designated here ‘WH-Human 1’ coronavirus (and has also been referred to as ‘2019-nCoV’). Phylogenetic analysis of the complete viral genome (29,903 nucleotides) revealed that the virus was most closely related (89.1% nucleotide similarity) to a group of SARS-like coronaviruses (genus Betacoronavirus, subgenus Sarbecovirus) that had previously been found in bats in China⁵. This outbreak highlights the ongoing ability of viral spill-over from animals to cause severe disease in humans.

FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments

Morgan N. Price, Paramvir S. Dehal, Adam Arkin (2010)

Background: We recently described FastTree, a tool for inferring phylogenies for alignments with up to hundreds of thousands of sequences. Here, we describe improvements to FastTree that improve its accuracy without sacrificing scalability. Methodology/Principal Findings: Where FastTree 1 used nearest-neighbor interchanges (NNIs) and the minimum-evolution criterion to improve the tree, FastTree 2 adds minimum-evolution subtree-pruning-regrafting (SPRs) and maximum-likelihood NNIs. FastTree 2 uses heuristics to restrict the search for better trees and estimates a rate of evolution for each site (the “CAT” approximation). Nevertheless, for both simulated and genuine alignments, FastTree 2 is slightly more accurate than a standard implementation of maximum-likelihood NNIs (PhyML 3 with default settings). Although FastTree 2 is not quite as accurate as methods that use maximum-likelihood SPRs, most of the splits that disagree are poorly supported, and for large alignments, FastTree 2 is 100–1,000 times faster. FastTree 2 inferred a topology and likelihood-based local support values for 237,882 distinct 16S ribosomal RNAs on a desktop computer in 22 hours and 5.8 gigabytes of memory. Conclusions/Significance: FastTree 2 allows the inference of maximum-likelihood phylogenies for huge alignments. FastTree 2 is freely available at http://www.microbesonline.org/fasttree.

Author and article information

Journal

Journal ID (nlm-ta): bioRxiv

Journal ID (publisher-id): BIORXIV

Title: bioRxiv

Publisher: Cold Spring Harbor Laboratory

Publication date (Electronic): 18 July 2022

Electronic Location Identifier: 2022.03.22.485312

Affiliations

[1] European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK

[2] Max Planck Institute for Molecular Genetics, Ihnestraße 63–73, 14195 Berlin, Germany

[3] Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA 92093, USA

[4] Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA

[5] Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA

[6] School of Computing, College of Engineering and Computer Science, Australian National University, Canberra, ACT 2600, Australia

Author notes

Author contributions

N.D.M. conceived and implemented the methods, performed the simulations and real data analyses, and wrote the first draft of the manuscript.

N.G. supervised the work and finalized the manuscript.

B.Q.M., R.C.-D., Y.T. and P.K. provided support during the analyses, method implementation and the drafting of the manuscript.

Correspondence and requests for materials should be addressed to Nicola De Maio.

Article

DOI: 10.1101/2022.03.22.485312

PMC ID: 8963701

PubMed ID: 35350209

SO-VID: 961c4f77-6310-482e-aa39-0c44f4d05621

License:

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
