      A scalable method for identifying frequent subtrees in sets of large phylogenetic trees

      BMC Bioinformatics
      BioMed Central
      Phylogenetic trees, Frequent subtree

          Abstract

          Background

          We consider the problem of finding the maximum frequent agreement subtrees (MFASTs) in a collection of phylogenetic trees. Existing methods for this problem often do not scale beyond datasets with around 100 taxa. Our goal is to address this problem for datasets with over a thousand taxa and hundreds of trees.

          Results

          We develop a heuristic that aims to find MFASTs in large collections of large phylogenetic trees. The method works in three phases. In the first phase, it identifies small candidate subtrees in the input trees; these serve as seeds for larger subtrees. In the second phase, it combines these seeds to build larger candidate MFASTs. In the final phase, a post-processing step ensures that the reported frequent agreement subtree is not contained in a larger frequent agreement subtree. We demonstrate that this heuristic can easily handle datasets with 1000 taxa, greatly extending the estimation of MFASTs beyond current methods.
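          The idea of a frequent agreement subtree and of the final "not contained in a larger one" check can be illustrated with a small sketch. This is not the authors' implementation. It assumes rooted trees written as nested tuples of taxon names, and the helper names (`leaves`, `restrict`, `frequency`, `grow`) and the example trees are hypothetical. A candidate counts as frequent when at least a fraction `min_freq` of the input trees, restricted to the candidate's taxa, have the same topology as the candidate.

```python
def leaves(tree):
    """Return the set of taxon labels in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def restrict(tree, taxa):
    """Subtree induced by `taxa`, in a canonical child order.

    Returns None when the tree contains none of the requested taxa.
    """
    if isinstance(tree, str):
        return tree if tree in taxa else None
    kept = [r for r in (restrict(child, taxa) for child in tree) if r is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # suppress a node left with a single child
    return tuple(sorted(kept, key=repr))  # order-independent representation


def frequency(candidate, trees):
    """Fraction of trees whose restriction to the candidate's taxa matches it."""
    taxa = leaves(candidate)
    canon = restrict(candidate, taxa)
    return sum(1 for t in trees if restrict(t, taxa) == canon) / len(trees)


def grow(candidate, trees, min_freq):
    """Add one taxon at a time while the enlarged subtree stays frequent.

    Illustrative only: the result cannot be extended by any single taxon
    without dropping below `min_freq`.
    """
    current = restrict(candidate, leaves(candidate))
    all_taxa = set().union(*(leaves(t) for t in trees))
    extended = True
    while extended:
        extended = False
        have = leaves(current)
        for x in sorted(all_taxa - have):
            for t in trees:
                ext = restrict(t, have | {x})
                if ext is None or leaves(ext) != have | {x}:
                    continue  # this tree lacks some of the needed taxa
                if restrict(ext, have) == current and frequency(ext, trees) >= min_freq:
                    current, extended = ext, True
                    break
            if extended:
                break
    return current


# Hypothetical toy input: three rooted trees on five taxa.
trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "C"), ("E", "D")),
    ((("A", "C"), "B"), ("D", "E")),
]
seed = ("D", "E")  # a small seed on which all three trees agree
print(frequency(seed, trees))
print(grow(seed, trees, min_freq=1.0))
```

          In this toy input the three trees disagree only on how A, B and C are resolved, so lowering `min_freq` lets the seed grow into a larger subtree than requiring agreement across all trees. The exhaustive single-taxon extension used here is chosen for clarity and would not scale to the dataset sizes discussed in the abstract.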

          Conclusions

          Although this heuristic is not guaranteed to find all MFASTs or the largest MFAST, it found the MFAST in every synthetic dataset for which we could verify the correctness of the result. It also performed well on large empirical datasets. Its performance is robust to the number and size of the input trees. Overall, this method provides a simple and fast way to identify strongly supported subtrees within large phylogenetic hypotheses.

                Author and article information

                Journal
                BMC Bioinformatics
                BioMed Central
                ISSN: 1471-2105
                Published: 3 October 2012
                Volume: 13
                Article number: 256
                Affiliations
                [1 ]Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA
                [2 ]Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
                [3 ]Department of Biology, University of Florida, Gainesville, FL, USA
                Article
                Publisher ID: 1471-2105-13-256
                DOI: 10.1186/1471-2105-13-256
                PMCID: PMC3543182
                PMID: 23033843
                040277ef-1462-4c5c-b310-91ca7d6f4264
                Copyright ©2012 Ramu et al.; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 5 February 2012
                Accepted: 5 September 2012
                Categories
                Research Article

                Bioinformatics & Computational biology
                frequent subtree,phylogenetic trees
