Blog
About

  • Record: found
  • Abstract: found
  • Article: found
Is Open Access

FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments

1 , 2 , * , 1 , 2 , 1 , 2 , 3

PLoS ONE

Public Library of Science

Read this article at

Bookmark
      There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

      Abstract

      Background

      We recently described FastTree, a tool for inferring phylogenies for alignments with up to hundreds of thousands of sequences. Here, we describe improvements to FastTree that improve its accuracy without sacrificing scalability.

      Methodology/Principal Findings

      Where FastTree 1 used nearest-neighbor interchanges (NNIs) and the minimum-evolution criterion to improve the tree, FastTree 2 adds minimum-evolution subtree-pruning-regrafting (SPRs) and maximum-likelihood NNIs. FastTree 2 uses heuristics to restrict the search for better trees and estimates a rate of evolution for each site (the “CAT” approximation). Nevertheless, for both simulated and genuine alignments, FastTree 2 is slightly more accurate than a standard implementation of maximum-likelihood NNIs (PhyML 3 with default settings). Although FastTree 2 is not quite as accurate as methods that use maximum-likelihood SPRs, most of the splits that disagree are poorly supported, and for large alignments, FastTree 2 is 100–1,000 times faster. FastTree 2 inferred a topology and likelihood-based local support values for 237,882 distinct 16S ribosomal RNAs on a desktop computer in 22 hours and 5.8 gigabytes of memory.

      Conclusions/Significance

      FastTree 2 allows the inference of maximum-likelihood phylogenies for huge alignments. FastTree 2 is freely available at http://www.microbesonline.org/fasttree.

      Related collections

      Most cited references 34

      • Record: found
      • Abstract: found
      • Article: not found

      The neighbor-joining method: a new method for reconstructing phylogenetic trees.

       N Saitou,  M Nei (1987)
      A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.
        Bookmark
        • Record: found
        • Abstract: found
        • Article: not found

        A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood.

        The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum- likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page: http://www.lirmm.fr/w3ifa/MAAS/.
          Bookmark
          • Record: found
          • Abstract: found
          • Article: not found

          RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models.

          RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML). Low-level technical optimizations, a modification of the search algorithm, and the use of the GTR+CAT approximation as replacement for GTR+Gamma yield a program that is between 2.7 and 52 times faster than the previous version of RAxML. A large-scale performance comparison with GARLI, PHYML, IQPNNI and MrBayes on real data containing 1000 up to 6722 taxa shows that RAxML requires at least 5.6 times less main memory and yields better trees in similar times than the best competing program (GARLI) on datasets up to 2500 taxa. On datasets > or =4000 taxa it also runs 2-3 times faster than GARLI. RAxML has been parallelized with MPI to conduct parallel multiple bootstraps and inferences on distinct starting trees. The program has been used to compute ML trees on two of the largest alignments to date containing 25,057 (1463 bp) and 2182 (51,089 bp) taxa, respectively. icwww.epfl.ch/~stamatak
            Bookmark

            Author and article information

            Affiliations
            [1 ]Physical Biosciences Division, Lawrence Berkeley National Lab, Berkeley, California, United States of America
            [2 ]Virtual Institute of Microbial Stress and Survival, Lawrence Berkeley National Lab, Berkeley, California, United States of America
            [3 ]Department of Bioengineering, University of California, Berkeley, California, United States of America
            Providence Health Care, Canada
            Author notes

            Conceived and designed the experiments: MNP PD. Performed the experiments: MNP. Analyzed the data: MNP. Wrote the paper: MNP PD APA.

            Contributors
            Role: Editor
            Journal
            PLoS One
            plos
            plosone
            PLoS ONE
            Public Library of Science (San Francisco, USA )
            1932-6203
            2010
            10 March 2010
            : 5
            : 3
            2835736
            20224823
            09-PONE-RA-14469R1
            10.1371/journal.pone.0009490
            (Editor)
            Price et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
            Counts
            Pages: 10
            Categories
            Research Article
            Computational Biology/Macromolecular Sequence Analysis
            Evolutionary Biology/Bioinformatics
            Molecular Biology/Molecular Evolution

            Uncategorized

            Comments

            Comment on this article