      Piphillin predicts metagenomic composition and dynamics from DADA2-corrected 16S rDNA sequences

          Abstract

          Background

          Shotgun metagenomic sequencing reveals the functional potential of microbial communities. However, lower-cost 16S ribosomal RNA (rRNA) gene sequencing provides taxonomic, not functional, observations. To remedy this, we previously introduced Piphillin, a software package that predicts functional metagenomic content based on the frequency of detected 16S rRNA gene sequences corresponding to genomes in regularly updated, functionally annotated genome databases. Piphillin and similar tools have previously been evaluated on 16S rRNA data processed by clustering sequences into operational taxonomic units (OTUs). Newer techniques such as amplicon sequence variant error correction are increasingly used, but it is unknown whether they perform better in metagenomic content prediction pipelines, or whether they should be treated the same as OTU data with respect to optimal pipeline parameters.

          Results

          To evaluate how the 16S rRNA sequence analysis method (clustering sequences into OTUs vs. error correction into amplicon sequence variants (ASVs)) affects the ability of Piphillin to predict functional metagenomic content, we compared Piphillin-predicted functional content from 16S rRNA sequence data processed by each method against corresponding shotgun metagenomic data. Both 16S rRNA sequence analysis methods yielded a strong correlation between metagenomic data and Piphillin-predicted functional content. Differential abundance testing with Piphillin-predicted functional content exhibited a low false positive rate (< 0.05) while capturing a large fraction of the differentially abundant features found in the corresponding metagenomic data. However, Piphillin prediction performance was optimal at different cutoff parameters depending on the 16S rRNA sequence analysis method. On data processed with amplicon sequence variant error correction, Piphillin outperformed comparable tools, for instance exhibiting 19% greater balanced accuracy and 54% greater precision than PICRUSt2.
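
The metrics named in these results (false positive rate, balanced accuracy, precision) can be illustrated with a minimal sketch. It assumes that the differentially abundant features found in shotgun metagenomic data are treated as the reference and the features found in predicted data as the calls. The function name `evaluate_calls` and the toy feature labels are hypothetical and are not taken from the article or from Piphillin.

```python
def evaluate_calls(reference, predicted, all_features):
    """Score predicted differential-abundance calls against reference calls.

    reference    -- features called differentially abundant in metagenomic data
    predicted    -- features called differentially abundant in predicted data
    all_features -- every feature that was tested (hypothetical universe)
    """
    universe = set(all_features)
    # Ignore any feature that was not part of the tested universe.
    reference = set(reference) & universe
    predicted = set(predicted) & universe

    tp = len(reference & predicted)      # called in both
    fp = len(predicted - reference)      # called only in predicted data
    fn = len(reference - predicted)      # missed by predicted data
    tn = len(universe) - tp - fp - fn    # called in neither

    def ratio(numerator, denominator):
        # Guard against empty classes instead of dividing by zero.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "false_positive_rate": ratio(fp, fp + tn),
        "precision": ratio(tp, tp + fp),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


if __name__ == "__main__":
    # Toy data: ten tested features, four truly differentially abundant.
    features = {f"F{i}" for i in range(1, 11)}
    truth = {"F1", "F2", "F3", "F4"}
    calls = {"F1", "F2", "F3", "F5"}
    for name, value in evaluate_calls(truth, calls, features).items():
        print(f"{name}: {value:.3f}")
```

Balanced accuracy averages sensitivity and specificity, so it is not inflated when most tested features are not differentially abundant.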

          Conclusions

          Our results demonstrate that raw Illumina sequences should be processed for subsequent Piphillin analysis using amplicon sequence variant error correction (with DADA2 or similar methods) and run using a 99% ID cutoff for Piphillin, while sequences generated on platforms other than Illumina should be processed via OTU clustering (e.g., UPARSE) and run using a 96% ID cutoff for Piphillin. Piphillin is publicly available for academic users at http://piphillin.secondgenome.com/.
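
The recommendation above amounts to a simple decision rule, sketched below. The `Recommendation` class and `recommend_settings` function are hypothetical illustrations and are not part of Piphillin or its server interface. Only the two cutoffs and the example tools come from the conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    processing: str         # how to process the raw 16S rRNA sequences
    example_tool: str       # example tool named in the conclusions
    identity_cutoff: float  # Piphillin ID cutoff, as a fraction


def recommend_settings(platform: str) -> Recommendation:
    """Return the processing method and ID cutoff suggested for a platform.

    Assumption: the platform is passed as a free-text name, and anything
    other than "illumina" (case-insensitive) counts as a non-Illumina platform.
    """
    if platform.strip().lower() == "illumina":
        return Recommendation("ASV error correction", "DADA2", 0.99)
    return Recommendation("OTU clustering", "UPARSE", 0.96)


if __name__ == "__main__":
    for name in ("Illumina", "other"):
        rec = recommend_settings(name)
        print(f"{name}: {rec.processing} (e.g., {rec.example_tool}), "
              f"ID cutoff {rec.identity_cutoff:.0%}")
```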

          Most cited references (13)

          EPA-ng: Massively Parallel Evolutionary Placement of Genetic Sequences

          Next generation sequencing (NGS) technologies have led to a ubiquity of molecular sequence data. This data avalanche is particularly challenging in metagenetics, which focuses on taxonomic identification of sequences obtained from diverse microbial environments. Phylogenetic placement methods determine how these sequences fit into an evolutionary context. Previous implementations of phylogenetic placement algorithms, such as the evolutionary placement algorithm (EPA) included in RAxML, or PPLACER, are being increasingly used for this purpose. However, due to the steady progress in NGS technologies, the current implementations face substantial scalability limitations. Herein, we present EPA-NG, a complete reimplementation of the EPA that is substantially faster, offers a distributed memory parallelization, and integrates concepts from both, RAxML-EPA and PPLACER. EPA-NG can be executed on standard shared memory, as well as on distributed memory systems (e.g., computing clusters). To demonstrate the scalability of EPA-NG, we placed 1 billion metagenetic reads from the Tara Oceans Project onto a reference tree with 3748 taxa in just under 7 h, using 2048 cores. Our performance assessment shows that EPA-NG outperforms RAxML-EPA and PPLACER by up to a factor of 30 in sequential execution mode, while attaining comparable parallel efficiency on shared memory systems. We further show that the distributed memory parallelization of EPA-NG scales well up to 2048 cores. EPA-NG is available under the AGPLv3 license: https://github.com/Pbdas/epa-ng .
            Efficient comparative phylogenetics on large trees

            Biodiversity databases now comprise hundreds of thousands of sequences and trait records. For example, the Open Tree of Life includes over 1 491 000 metazoan and over 300 000 bacterial taxa. These data provide unique opportunities for analysis of phylogenetic trait distribution and reconstruction of ancestral biodiversity. However, existing tools for comparative phylogenetics scale poorly to such large trees, to the point of being almost unusable.
              Changes in Abundance of Oral Microbiota Associated with Oral Cancer

              Individual bacteria and shifts in the composition of the microbiome have been associated with human diseases including cancer. To investigate changes in the microbiome associated with oral cancers, we profiled cancers and anatomically matched contralateral normal tissue from the same patient by sequencing 16S rDNA hypervariable region amplicons. In cancer samples from both a discovery and a subsequent confirmation cohort, abundance of Firmicutes (especially Streptococcus) and Actinobacteria (especially Rothia) was significantly decreased relative to contralateral normal samples from the same patient. Significant decreases in abundance of these phyla were observed for pre-cancers, but not when comparing samples from contralateral sites (tongue and floor of mouth) from healthy individuals. Weighted UniFrac principal coordinates analysis based on 12 taxa separated most cancers from other samples with greatest separation of node positive cases. These studies begin to develop a framework for exploiting the oral microbiome for monitoring oral cancer development, progression and recurrence.

                Author and article information

                Contributors
                nicole@secondgenome.com
                thomas@secondgenome.com
                ejlaserna@gmail.com
                M.Claesson@ucc.ie
                f.shanahan@ucc.ie
                karim@secondgenome.com
                shoko@secondgenome.com
                todd@secondgenome.com
                Journal
                BMC Genomics
                BioMed Central (London)
                ISSN: 1471-2164
                Published: 17 January 2020
                Volume 21, Article 56
                Affiliations
                [1] Informatics Department, Second Genome Inc., South San Francisco, California, USA (GRID grid.452682.f)
                [2] APC Microbiome Ireland, University College Cork, Co. Cork, Ireland (ISNI 0000000123318773, GRID grid.7872.a)
                [3] School of Microbiology, University College Cork, Co. Cork, Ireland (ISNI 0000000123318773, GRID grid.7872.a)
                [4] Department of Medicine, University College Cork, Co. Cork, Ireland (ISNI 0000000123318773, GRID grid.7872.a)
                Article
                6427
                10.1186/s12864-019-6427-1
                6967091
                31952477
                147fde1d-52b9-4b4d-b2b3-c1143134da8d
                © The Author(s). 2020

                Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 12 June 2019
                Accepted: 24 December 2019
                Categories
                Research Article
                Genetics
                metagenomics, phylogenetic analysis, sequence alignment, shotgun sequencing, genomic databases
