RAxML-Light: a tool for computing terabyte phylogenies

Abstract

Motivation: Due to advances in molecular sequencing and the increasingly rapid collection of molecular data, the field of phyloinformatics is transforming into a computational science. Therefore, new tools are required that can be deployed in supercomputing environments and that scale to hundreds or thousands of cores.

Results: We describe RAxML-Light, a tool for large-scale phylogenetic inference on supercomputers under maximum likelihood. It implements a light-weight checkpointing mechanism, deploys 128-bit (SSE3) and 256-bit (AVX) vector intrinsics, offers two orthogonal memory-saving techniques and provides a fine-grained, production-level Message Passing Interface (MPI) parallelization of the likelihood function. To demonstrate the scalability and robustness of the code, we inferred a phylogeny on a simulated DNA alignment (1481 taxa, 20 000 000 bp) using 672 cores. This dataset requires one terabyte of RAM to compute the likelihood score on a single tree.
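The one-terabyte figure can be checked with a back-of-the-envelope estimate. The sketch below is not taken from the RAxML-Light source; it assumes that memory is dominated by one conditional likelihood vector per inner node of an unrooted binary tree, stored in double precision with four DNA states per site. The function name `clv_memory_bytes` and its parameters are hypothetical. The abstract does not state which rate heterogeneity model was used, so both a single rate per site and four discrete rate categories are shown.

```python
def clv_memory_bytes(taxa, sites, states=4, rate_categories=1, bytes_per_value=8):
    """Rough size in bytes of all conditional likelihood vectors (an estimate only).

    Assumes one vector per inner node, `states` values per site and rate
    category, and `bytes_per_value` bytes per value (8 for a double).
    """
    inner_nodes = taxa - 2  # an unrooted binary tree with n tips has n - 2 inner nodes
    return inner_nodes * sites * states * rate_categories * bytes_per_value


# Dataset from the abstract: 1481 taxa, 20 000 000 bp.
single_rate = clv_memory_bytes(1481, 20_000_000)                 # one rate per site (assumed)
four_rates = clv_memory_bytes(1481, 20_000_000, rate_categories=4)  # four categories (assumed)

print(f"one rate per site:    {single_rate / 1e12:.2f} TB")
print(f"four rate categories: {four_rates / 1e12:.2f} TB")
```

Under these assumptions, the single-rate case comes to roughly 0.95 TB, in line with the one terabyte quoted above, while four rate categories would need about four times as much.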

Code Availability: https://github.com/stamatak/RAxML-Light-1.0.5

Data Availability: http://www.exelixis-lab.org/onLineMaterial.tar.bz2

Contact: alexandros.stamatakis@h-its.org

Supplementary Information: Supplementary data are available at Bioinformatics online.

Author and article information

Journal

Journal ID (nlm-ta): Bioinformatics

Journal ID (iso-abbrev): Bioinformatics

Journal ID (publisher-id): bioinformatics

Journal ID (hwp): bioinfo

Title: Bioinformatics

Publisher: Oxford University Press

ISSN (Print): 1367-4803

ISSN (Electronic): 1367-4811

Publication date (Print): 1 August 2012

Publication date (Electronic): 24 May 2012

Publication date PMC-release: 24 May 2012

Volume: 28

Issue: 15

Pages: 2064-2066

Affiliations

¹The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, D-68159 Heidelberg, Germany and ²Blackrim Lab, Department of Ecology and Evolutionary Biology, University of Michigan, 2071A Kraus Natural Science Building, 830 North University, Ann Arbor, MI 48109-1048, USA

Author notes

* To whom correspondence should be addressed.

Associate Editor: Jonathan Wren

Article

Publisher ID: bts309

DOI: 10.1093/bioinformatics/bts309

PMC ID: 3400957

PubMed ID: 22628519

SO-VID: 50985ba5-bb32-40c9-8011-51d65746d889

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received : 7 March 2012

Date revision received : 15 May 2012

Date accepted : 18 May 2012

Page count

Pages: 3
