
      EPA-ng: Massively Parallel Evolutionary Placement of Genetic Sequences

      research-article


          Abstract

          Next generation sequencing (NGS) technologies have led to a ubiquity of molecular sequence data. This data avalanche is particularly challenging in metagenetics, which focuses on taxonomic identification of sequences obtained from diverse microbial environments. Phylogenetic placement methods determine how these sequences fit into an evolutionary context. Previous implementations of phylogenetic placement algorithms, such as the evolutionary placement algorithm (EPA) included in RAxML, or PPLACER, are being increasingly used for this purpose. However, due to the steady progress in NGS technologies, the current implementations face substantial scalability limitations. Herein, we present EPA-NG, a complete reimplementation of the EPA that is substantially faster, offers a distributed memory parallelization, and integrates concepts from both RAxML-EPA and PPLACER. EPA-NG can be executed on standard shared memory, as well as on distributed memory systems (e.g., computing clusters). To demonstrate the scalability of EPA-NG, we placed 1 billion metagenetic reads from the Tara Oceans Project onto a reference tree with 3748 taxa in just under 7 h, using 2048 cores. Our performance assessment shows that EPA-NG outperforms RAxML-EPA and PPLACER by up to a factor of 30 in sequential execution mode, while attaining comparable parallel efficiency on shared memory systems. We further show that the distributed memory parallelization of EPA-NG scales well up to 2048 cores. EPA-NG is available under the AGPLv3 license: https://github.com/Pbdas/epa-ng.
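
          To put the scalability figures quoted in the abstract in perspective, the following minimal sketch converts them into an aggregate and a per-core placement rate. It uses only the three numbers stated above (1 billion reads, just under 7 h, 2048 cores); the function name `throughput` is hypothetical, and because the runtime is rounded up to a full 7 h the results are lower bounds, not measured values from the paper.

          ```python
          # Back-of-the-envelope throughput from the figures quoted in the abstract.
          READS = 1_000_000_000  # 1 billion metagenetic reads
          HOURS = 7              # "just under 7 h", treated here as an upper bound
          CORES = 2048


          def throughput(reads: int, hours: float, cores: int) -> tuple[float, float]:
              """Return (reads per second overall, reads per second per core)."""
              seconds = hours * 3600
              overall = reads / seconds
              return overall, overall / cores


          overall, per_core = throughput(READS, HOURS, CORES)
          print(f"overall:  at least {overall:,.0f} reads/s")
          print(f"per core: at least {per_core:.1f} reads/s")
          ```

          Under these assumptions the run corresponds to roughly 40,000 placed reads per second overall, or about 19 reads per second per core.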


                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Title: Systematic Biology (Syst. Biol.)
                Publisher: Oxford University Press
                ISSN: 1063-5157, 1076-836X
                Issue date: March 2019
                Published online: 21 September 2018
                Volume: 68
                Issue: 2
                Pages: 365-369
                Affiliations
                [1] Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany
                [2] Department of Computer Engineering, University of A Coruña, 15071 A Coruña, Spain
                [3] Department of Genetics, Evolution and Environment, University College London, Gower St., Bloomsbury, London WC1E 6BT, UK
                [4] Karlsruhe Institute of Technology, Department of Informatics, Institute of Theoretical Informatics, Postfach 6980, 76128 Karlsruhe, Germany
                Author notes
                Correspondence to be sent to: Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany; E-mail: pierre.barbera@h-its.org.
                Article
                Publisher ID: syy054
                DOI: 10.1093/sysbio/syy054
                PMC ID: 6368480
                PubMed ID: 30165689
                ScienceOpen ID: efa56025-8c02-469f-ac64-393c9415b589
                © The Author(s) 2018. Published by Oxford University Press on behalf of the Society of Systematic Biologists.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.

                History
                : 30 March 2018
                : 21 August 2018
                : 21 August 2018
                Page count
                Pages: 5
                Funding
                Funded by: Klaus Tschira Stiftung gGmbH in Heidelberg, Germany
                Categories
                Software for Systematics and Evolution

                Animal science & Zoology
                Keywords: metabarcoding, metagenomics, microbiome, phylogenetics, phylogenetic placement
