
      A general species delimitation method with applications to phylogenetic placements

      research-article


          Abstract

          Motivation: Sequence-based methods to delimit species are central to DNA taxonomy, microbial community surveys and DNA metabarcoding studies. Current approaches either rely on simple sequence similarity thresholds (OTU-picking) or on complex and compute-intensive evolutionary models. The OTU-picking methods scale well on large datasets, but the results are highly sensitive to the similarity threshold. Coalescent-based species delimitation approaches often rely on Bayesian statistics and Markov Chain Monte Carlo sampling, and can therefore only be applied to small datasets.
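The threshold sensitivity of OTU-picking mentioned above can be illustrated with a minimal sketch. It assumes a toy greedy clustering on pre-aligned, equal-length sequences; the function names `identity` and `greedy_otus` and the toy data are hypothetical and do not correspond to any of the OTU-picking tools evaluated in the article.

```python
def identity(a, b):
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold):
    """Put each sequence into the first OTU whose seed it matches at >= threshold.

    Illustrative only: real OTU pickers use more elaborate heuristics.
    """
    seeds = []  # one representative (seed) sequence per OTU
    otus = []   # member indices of each OTU, parallel to `seeds`
    for i, seq in enumerate(seqs):
        for k, seed in enumerate(seeds):
            if identity(seq, seed) >= threshold:
                otus[k].append(i)
                break
        else:
            # No existing seed is similar enough: open a new OTU.
            seeds.append(seq)
            otus.append([i])
    return otus


# Hypothetical toy alignment (10 columns).
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGAAT", "TTGTACGAAT"]

for t in (0.95, 0.90, 0.80):
    print(t, len(greedy_otus(toy, t)))
```

On this toy input the number of OTUs changes with every threshold tried, which is the kind of sensitivity the abstract refers to.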

          Results: We introduce the Poisson tree processes (PTP) model to infer putative species boundaries on a given phylogenetic input tree. We also integrate PTP with our evolutionary placement algorithm (EPA-PTP) to count the number of species in phylogenetic placements. We compare our approaches with popular OTU-picking methods and the General Mixed Yule Coalescent (GMYC) model. For de novo species delimitation, the stand-alone PTP model generally outperforms GMYC as well as OTU-picking methods when evolutionary distances between species are small. PTP requires neither an ultrametric input tree nor a sequence similarity threshold as input. In the open reference species delimitation approach, EPA-PTP yields more accurate results than de novo species delimitation methods. Finally, EPA-PTP scales to large datasets because it relies on the parallel implementations of the EPA and RAxML, which makes it possible to delimit species in high-throughput sequencing data.
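The abstract does not spell out the PTP likelihood. As a rough sketch only, assume that branch lengths (in expected substitutions) fall into two classes, between-species and within-species branches, each modelled by its own exponential distribution. A candidate delimitation can then be scored by the summed log-likelihood of the two classes and compared with a single-class null model. The function names and branch lengths below are hypothetical, and the search over delimitations on the tree is not shown; this is not the authors' implementation.

```python
import math


def exp_loglik(lengths):
    """Maximised log-likelihood of branch lengths under one exponential distribution.

    The rate is set to its maximum-likelihood estimate, n / sum(lengths).
    """
    n = len(lengths)
    if n == 0:
        return 0.0
    total = sum(lengths)
    if total <= 0:
        raise ValueError("branch lengths must have a positive sum")
    rate = n / total
    return n * math.log(rate) - rate * total


def two_class_loglik(between, within):
    """Score a delimitation: separate rates for between- and within-species branches."""
    return exp_loglik(between) + exp_loglik(within)


# Hypothetical branch lengths for one candidate delimitation.
between = [0.12, 0.15, 0.10]           # branches separating putative species
within = [0.002, 0.001, 0.003, 0.002]  # branches inside putative species

alt = two_class_loglik(between, within)
null = exp_loglik(between + within)    # single class: everything is one species
print(f"two-class: {alt:.3f}  single-class: {null:.3f}")
```

Under these assumptions, a delimitation whose two branch classes differ clearly in rate scores well above the single-class null. Because the score uses branch lengths directly, it needs neither an ultrametric tree nor a similarity threshold.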

          Availability and implementation: The code is freely available at www.exelixis-lab.org/software.html.

          Contact: Alexandros.Stamatakis@h-its.org

          Supplementary information: Supplementary data are available at Bioinformatics online.


                Author and article information

                Journal
                Bioinformatics
                Oxford University Press
                1367-4803
                1367-4811
                15 November 2013
                29 August 2013
                Volume: 29
                Issue: 22
                Pages: 2869-2876
                Affiliations
                1 The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, D-68159 Heidelberg, Germany
                2 Graduate School for Computing in Medicine and Life Sciences, University of Lübeck
                3 Institut für Neuro- und Bioinformatik, University of Lübeck, 23538 Lübeck, Germany
                4 Natural History Museum of Crete, University of Crete, GR-71409 Irakleio, Crete, Greece
                5 Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas-FORTH, GR-70013 Heraklion, Crete, Greece
                Author notes
                *To whom correspondence should be addressed.

                Associate Editor: David Posada

                Article
                Publisher ID: btt499
                DOI: 10.1093/bioinformatics/btt499
                PMC ID: 3810850
                PubMed ID: 23990417
                © The Author 2013. Published by Oxford University Press. All rights reserved.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 9 April 2013
                Revised: 20 August 2013
                Accepted: 21 August 2013
                Page count
                Pages: 8
                Categories
                Original Papers
                Phylogenetics

                Bioinformatics & Computational biology
