      RAxML-Light: a tool for computing terabyte phylogenies

      research-article

          Abstract

          Motivation: Due to advances in molecular sequencing and the increasingly rapid collection of molecular data, the field of phyloinformatics is transforming into a computational science. Therefore, new tools are required that can be deployed in supercomputing environments and that scale to hundreds or thousands of cores.

          Results: We describe RAxML-Light, a tool for large-scale phylogenetic inference on supercomputers under maximum likelihood. It implements a light-weight checkpointing mechanism, deploys 128-bit (SSE3) and 256-bit (AVX) vector intrinsics, offers two orthogonal memory saving techniques and provides a fine-grain production-level message passing interface parallelization of the likelihood function. To demonstrate scalability and robustness of the code, we inferred a phylogeny on a simulated DNA alignment (1481 taxa, 20 000 000 bp) using 672 cores. This dataset requires one terabyte of RAM to compute the likelihood score on a single tree.
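The one-terabyte figure can be sanity-checked with a back-of-the-envelope estimate. The sketch below is not taken from RAxML-Light. It assumes that memory is dominated by the conditional likelihood vectors stored at the inner nodes of an unrooted binary tree (taxa minus 2 nodes), with one double-precision value per site, DNA state and rate category. The function name `clv_memory_bytes` and its parameters are hypothetical. The abstract does not say which rate heterogeneity model was used, so both a single rate per site and a four-category Γ model are shown.

```python
def clv_memory_bytes(taxa, sites, states=4, rate_categories=1, bytes_per_value=8):
    """Rough memory estimate for conditional likelihood vectors (CLVs).

    Assumption: one CLV per inner node of an unrooted binary tree
    (taxa - 2 inner nodes), each holding `states * rate_categories`
    double-precision values per alignment site. Tip vectors and all
    other data structures are ignored.
    """
    inner_nodes = taxa - 2
    return inner_nodes * sites * states * rate_categories * bytes_per_value


# Dataset from the abstract: 1481 taxa, 20 000 000 bp of DNA.
single_rate = clv_memory_bytes(1481, 20_000_000)                  # one rate per site
gamma4 = clv_memory_bytes(1481, 20_000_000, rate_categories=4)    # 4 discrete rates

print(f"single rate per site: {single_rate / 1e12:.2f} TB")  # 0.95 TB
print(f"four rate categories: {gamma4 / 1e12:.2f} TB")       # 3.79 TB
```

Under these assumptions the single-rate estimate lands close to the terabyte quoted in the abstract, while four rate categories would need roughly four times as much memory.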

          Code Availability: https://github.com/stamatak/RAxML-Light-1.0.5

          Data Availability: http://www.exelixis-lab.org/onLineMaterial.tar.bz2

          Contact: alexandros.stamatakis@h-its.org

          Supplementary Information: Supplementary data are available at Bioinformatics online.

          Most cited references

          Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees.

          Seq-Gen is a program that will simulate the evolution of nucleotide sequences along a phylogeny, using common models of the substitution process. A range of models of molecular evolution are implemented, including the general reversible model. Nucleotide frequencies and other parameters of the model may be given and site-specific rate heterogeneity can also be incorporated in a number of ways. Any number of trees may be read in and the program will produce any number of data sets for each tree. Thus, large sets of replicate simulations can be easily created. This can be used to test phylogenetic hypotheses using the parametric bootstrap. Seq-Gen can be obtained by WWW from http://evolve.zoo.ox.ac.uk/Seq-Gen/seq-gen.html or by FTP from ftp://evolve.zoo.ox.ac.uk/packages/Seq-Gen/. The package includes the source code, manual and example files. An Apple Macintosh version is available from the same sites.

            Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion

              Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees

              Background: The rapid accumulation of molecular sequence data, driven by novel wet-lab sequencing technologies, poses new challenges for large-scale maximum likelihood-based phylogenetic analyses on trees with more than 30,000 taxa and several genes. The three main computational challenges are: numerical stability, the scalability of search algorithms, and the high memory requirements for computing the likelihood.

              Results: We introduce methods for solving these three key problems and provide respective proof-of-concept implementations in RAxML. The mechanisms presented here are not RAxML-specific and can thus be applied to any likelihood-based (Bayesian or maximum likelihood) tree inference program. We develop a new search strategy that can reduce the time required for tree inferences by more than 50% while yielding equally good trees (in the statistical sense) for well-chosen starting trees. We present an adaptation of the Subtree Equality Vector technique for phylogenomic datasets with missing data (already available in RAxML v728) that can reduce execution times and memory requirements by up to 50%. Finally, we discuss issues pertaining to the numerical stability of the Γ model of rate heterogeneity on very large trees and argue in favor of rate heterogeneity models that use a single rate or rate category for each site to resolve these problems.

              Conclusions: We address three major issues pertaining to large scale tree reconstruction under maximum likelihood and propose respective solutions. Respective proof-of-concept/production-level implementations of our ideas are made available as open-source code.

                Author and article information

                Journal: Bioinformatics
                Publisher: Oxford University Press
                ISSN: 1367-4803 (print), 1367-4811 (electronic)
                Issue date: 1 August 2012
                Published online: 24 May 2012
                Volume: 28
                Issue: 15
                Pages: 2064-2066
                Affiliations
                1 The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, D-68159 Heidelberg, Germany
                2 Blackrim Lab, Department of Ecology and Evolutionary Biology, University of Michigan, 2071A Kraus Natural Science Building, 830 North University, Ann Arbor, MI 48109-1048, USA
                Author notes
                * To whom correspondence should be addressed.

                Associate Editor: Jonathan Wren

                Article
                Publisher ID: bts309
                DOI: 10.1093/bioinformatics/bts309
                PMC ID: 3400957
                PubMed ID: 22628519
                ScienceOpen ID: 50985ba5-bb32-40c9-8011-51d65746d889
                © The Author(s) 2012. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 7 March 2012
                Revised: 15 May 2012
                Accepted: 18 May 2012
                Page count
                Pages: 3
                Categories
                Applications Note
                Phylogenetics

                Bioinformatics & Computational biology
