114
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Time and memory efficient likelihood-based tree searches on phylogenomic alignments with missing data

      research-article
      * ,
      Bioinformatics
      Oxford University Press

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Motivation: The current molecular data explosion poses new challenges for large-scale phylogenomic analyses that can comprise hundreds or even thousands of genes. A property that characterizes phylogenomic datasets is that they tend to be gappy, i.e. can contain taxa with (many and disparate) missing genes. In current phylogenomic analyses, this type of alignment gappyness that is induced by missing data frequently exceeds 90%. We present and implement a generally applicable mechanism that allows for reducing memory footprints of likelihood-based [maximum likelihood (ML) or Bayesian] phylogenomic analyses proportional to the amount of missing data in the alignment. We also introduce a set of algorithmic rules to efficiently conduct tree searches via subtree pruning and re-grafting moves using this mechanism.

          Results: On a large phylogenomic DNA dataset with 2177 taxa, 68 genes and a gappyness of 90%, we achieve a memory footprint reduction from 9 GB down to 1 GB, a speedup for optimizing ML model parameters of 11, and accelerate the Subtree Pruning Regrafting tree search phase by factor 16. Thus, our approach can be deployed to improve efficiency for the two most important resources, CPU time and memory, by up to one order of magnitude.

          Availability: Current open-source version of RAxML v7.2.6 available at http://wwwkramer.in.tum.de/exelixis/software.html.

          Contact: stamatak@ 123456cs.tum.edu

          Related collections

          Most cited references14

          • Record: found
          • Abstract: found
          • Article: not found

          Assessing the root of bilaterian animals with scalable phylogenomic methods.

          A clear picture of animal relationships is a prerequisite to understand how the morphological and ecological diversity of animals evolved over time. Among others, the placement of the acoelomorph flatworms, Acoela and Nemertodermatida, has fundamental implications for the origin and evolution of various animal organ systems. Their position, however, has been inconsistent in phylogenetic studies using one or several genes. Furthermore, Acoela has been among the least stable taxa in recent animal phylogenomic analyses, which simultaneously examine many genes from many species, while Nemertodermatida has not been sampled in any phylogenomic study. New sequence data are presented here from organisms targeted for their instability or lack of representation in prior analyses, and are analysed in combination with other publicly available data. We also designed new automated explicit methods for identifying and selecting common genes across different species, and developed highly optimized supercomputing tools to reconstruct relationships from gene sequences. The results of the work corroborate several recently established findings about animal relationships and provide new support for the placement of other groups. These new data and methods strongly uphold previous suggestions that Acoelomorpha is sister clade to all other bilaterian animals, find diminishing evidence for the placement of the enigmatic Xenoturbella within Deuterostomia, and place Cycliophora with Entoprocta and Ectoprocta. The work highlights the implications that these arrangements have for metazoan evolution and permits a clearer picture of ancestral morphologies and life histories in the deep past.
            Bookmark
            • Record: found
            • Abstract: not found
            • Article: not found

            Some probabilistic and statistical problems on the analysis of DNA sequence

              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Many-core algorithms for statistical phylogenetics.

              Statistical phylogenetics is computationally intensive, resulting in considerable attention meted on techniques for parallelization. Codon-based models allow for independent rates of synonymous and replacement substitutions and have the potential to more adequately model the process of protein-coding sequence evolution with a resulting increase in phylogenetic accuracy. Unfortunately, due to the high number of codon states, computational burden has largely thwarted phylogenetic reconstruction under codon models, particularly at the genomic-scale. Here, we describe novel algorithms and methods for evaluating phylogenies under arbitrary molecular evolutionary models on graphics processing units (GPUs), making use of the large number of processing cores to efficiently parallelize calculations even for large state-size models. We implement the approach in an existing Bayesian framework and apply the algorithms to estimating the phylogeny of 62 complete mitochondrial genomes of carnivores under a 60-state codon model. We see a near 90-fold speed increase over an optimized CPU-based computation and a >140-fold increase over the currently available implementation, making this the first practical use of codon models for phylogenetic inference over whole mitochondrial or microorganism genomes. Source code provided in BEAGLE: Broad-platform Evolutionary Analysis General Likelihood Evaluator, a cross-platform/processor library for phylogenetic likelihood computation (http://beagle-lib.googlecode.com/). We employ a BEAGLE-implementation using the Bayesian phylogenetics framework BEAST (http://beast.bio.ed.ac.uk/).
                Bookmark

                Author and article information

                Journal
                Bioinformatics
                bioinformatics
                bioinfo
                Bioinformatics
                Oxford University Press
                1367-4803
                1367-4811
                15 June 2010
                1 June 2010
                1 June 2010
                : 26
                : 12
                : i132-i139
                Affiliations
                The Exelixis Lab (I12), Department of Computer Science, Technische Universität München, Boltzmannstr. 3, D-85748, Garching b. München, Germany
                Author notes
                * To whom correspondence should be addressed.
                Article
                btq205
                10.1093/bioinformatics/btq205
                2881390
                20529898
                2eefbc01-095b-4544-a77d-3e2507b3fe08
                © The Author(s) 2010. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Categories
                Ismb 2010 Conference Proceedings July 11 to July 13, 2010, Boston, Ma, Usa
                Original Papers
                Evolution and Comparative Genomics

                Bioinformatics & Computational biology
                Bioinformatics & Computational biology

                Comments

                Comment on this article