Piphillin predicts metagenomic composition and dynamics from DADA2-corrected 16S rDNA sequences

Abstract

Background

Shotgun metagenomic sequencing reveals the functional potential of microbial communities. However, lower-cost 16S ribosomal RNA (rRNA) gene sequencing provides taxonomic, not functional, observations. To remedy this, we previously introduced Piphillin, a software package that predicts functional metagenomic content based on the frequency of detected 16S rRNA gene sequences corresponding to genomes in regularly updated, functionally annotated genome databases. Piphillin and similar tools have previously been evaluated on 16S rRNA data processed by clustering sequences into operational taxonomic units (OTUs). Newer techniques such as amplicon sequence variant error correction are increasingly used, but it is unknown whether these techniques perform better in metagenomic content prediction pipelines, or whether they should be treated the same as OTU data with respect to optimal pipeline parameters.
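The prediction idea described above can be illustrated with a minimal sketch. It is not Piphillin's implementation: the function name, the dictionary layout, and the normalization by 16S rRNA gene copy number are assumptions made for illustration, and the genome and function identifiers are placeholders.

```python
def predict_function_counts(seq_counts, copy_number_16s, gene_content):
    """Estimate functional gene counts for one sample (illustrative only).

    seq_counts:       16S sequence counts assigned to each reference genome
    copy_number_16s:  assumed 16S rRNA gene copies per reference genome
    gene_content:     per-genome copy numbers of annotated functions
    """
    # Step 1: convert 16S counts into an estimated genome abundance
    # (assumption: divide by the genome's 16S copy number).
    genome_abundance = {
        genome: seq_counts[genome] / copy_number_16s[genome]
        for genome in seq_counts
    }
    # Step 2: sum each function's copy number, weighted by genome abundance.
    predicted = {}
    for genome, abundance in genome_abundance.items():
        for function, copies in gene_content[genome].items():
            predicted[function] = predicted.get(function, 0.0) + abundance * copies
    return predicted


# Hypothetical toy input: two reference genomes, two annotated functions.
example = predict_function_counts(
    seq_counts={"genome_A": 10, "genome_B": 4},
    copy_number_16s={"genome_A": 5, "genome_B": 2},
    gene_content={
        "genome_A": {"K00001": 1},
        "genome_B": {"K00001": 2, "K00002": 3},
    },
)
print(example)
```

In this toy case both genomes have an estimated abundance of 2, so each function receives a predicted count of 6.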

Results

To evaluate the effect of the 16S rRNA sequence analysis method (clustering sequences into OTUs vs. error correction into amplicon sequence variants (ASVs)) on the ability of Piphillin to predict functional metagenomic content, we compared Piphillin-predicted functional content from 16S rRNA sequence data processed by each method with corresponding shotgun metagenomic data. We show a strong correlation between metagenomic data and Piphillin-predicted functional content for both 16S rRNA sequence analysis methods. Differential abundance testing with Piphillin-predicted functional content exhibited a low false positive rate (< 0.05) while capturing a large fraction of the differentially abundant features identified in the corresponding metagenomic data. However, Piphillin prediction performance was optimal at different cutoff parameters depending on the 16S rRNA sequence analysis method. Using data analyzed with amplicon sequence variant error correction, Piphillin outperformed comparable tools, for instance exhibiting 19% greater balanced accuracy and 54% greater precision compared to PICRUSt2.
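For readers unfamiliar with the metrics named above, the following sketch shows how precision, balanced accuracy, and false positive rate could be computed when the features called differentially abundant from predicted content are scored against those called from shotgun metagenomic data. The function name and the toy feature sets are hypothetical, and this is not the evaluation code used in the study.

```python
def evaluate_calls(predicted, reference, all_features):
    """Score predicted differential-abundance calls against reference calls.

    predicted, reference: sets of feature IDs called differentially abundant
    all_features:         set of every feature tested (superset of both)
    """
    tp = len(predicted & reference)            # called in both
    fp = len(predicted - reference)            # called only from predictions
    fn = len(reference - predicted)            # missed by predictions
    tn = len(all_features) - tp - fp - fn      # called in neither

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "precision": precision,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "false_positive_rate": 1.0 - specificity,
    }


# Hypothetical toy data: 10 tested features, 4 reference calls, 4 predicted calls.
features = {f"f{i}" for i in range(1, 11)}
metrics = evaluate_calls(
    predicted={"f1", "f2", "f3", "f5"},
    reference={"f1", "f2", "f3", "f4"},
    all_features=features,
)
print(metrics)
```

Balanced accuracy averages sensitivity and specificity, so it is less affected than plain accuracy when most tested features are not differentially abundant.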

Conclusions

Our results demonstrate that raw Illumina sequences should be processed for subsequent Piphillin analysis using amplicon sequence variant error correction (with DADA2 or similar methods) and run using a 99% ID cutoff for Piphillin, while sequences generated on platforms other than Illumina should be processed via OTU clustering (e.g., UPARSE) and run using a 96% ID cutoff for Piphillin. Piphillin is publicly available for academic users via the Piphillin server (http://piphillin.secondgenome.com/).

Related collections

Genomic Prediction

Most cited references 13

Is Open Access

EPA-ng: Massively Parallel Evolutionary Placement of Genetic Sequences

Pierre Barbera, Alexey M. Kozlov, Lucas Czech … (2018)

Next generation sequencing (NGS) technologies have led to a ubiquity of molecular sequence data. This data avalanche is particularly challenging in metagenetics, which focuses on taxonomic identification of sequences obtained from diverse microbial environments. Phylogenetic placement methods determine how these sequences fit into an evolutionary context. Previous implementations of phylogenetic placement algorithms, such as the evolutionary placement algorithm (EPA) included in RAxML, or PPLACER, are being increasingly used for this purpose. However, due to the steady progress in NGS technologies, the current implementations face substantial scalability limitations. Herein, we present EPA-NG, a complete reimplementation of the EPA that is substantially faster, offers a distributed memory parallelization, and integrates concepts from both, RAxML-EPA and PPLACER. EPA-NG can be executed on standard shared memory, as well as on distributed memory systems (e.g., computing clusters). To demonstrate the scalability of EPA-NG, we placed 1 billion metagenetic reads from the Tara Oceans Project onto a reference tree with 3748 taxa in just under 7 h, using 2048 cores. Our performance assessment shows that EPA-NG outperforms RAxML-EPA and PPLACER by up to a factor of 30 in sequential execution mode, while attaining comparable parallel efficiency on shared memory systems. We further show that the distributed memory parallelization of EPA-NG scales well up to 2048 cores. EPA-NG is available under the AGPLv3 license: https://github.com/Pbdas/epa-ng.

Cited 171 times

Efficient comparative phylogenetics on large trees

Alfonso Valencia, Michael Doebeli, Stilianos Louca (2018)

Biodiversity databases now comprise hundreds of thousands of sequences and trait records. For example, the Open Tree of Life includes over 1 491 000 metazoan and over 300 000 bacterial taxa. These data provide unique opportunities for analysis of phylogenetic trait distribution and reconstruction of ancestral biodiversity. However, existing tools for comparative phylogenetics scale poorly to such large trees, to the point of being almost unusable.

Cited 165 times

Is Open Access

Changes in Abundance of Oral Microbiota Associated with Oral Cancer

Brian L. Schmidt, Justin Kuczynski, Aditi Bhattacharya … (2014)

Individual bacteria and shifts in the composition of the microbiome have been associated with human diseases including cancer. To investigate changes in the microbiome associated with oral cancers, we profiled cancers and anatomically matched contralateral normal tissue from the same patient by sequencing 16S rDNA hypervariable region amplicons. In cancer samples from both a discovery and a subsequent confirmation cohort, abundance of Firmicutes (especially Streptococcus) and Actinobacteria (especially Rothia) was significantly decreased relative to contralateral normal samples from the same patient. Significant decreases in abundance of these phyla were observed for pre-cancers, but not when comparing samples from contralateral sites (tongue and floor of mouth) from healthy individuals. Weighted UniFrac principal coordinates analysis based on 12 taxa separated most cancers from other samples with greatest separation of node positive cases. These studies begin to develop a framework for exploiting the oral microbiome for monitoring oral cancer development, progression and recurrence.

Cited 156 times

Author and article information

Contributors

Nicole R. Narayan: nicole@secondgenome.com

Thomas Weinmaier: thomas@secondgenome.com

Emilio J. Laserna-Mendieta: ejlaserna@gmail.com

Marcus J. Claesson: M.Claesson@ucc.ie

Fergus Shanahan: f.shanahan@ucc.ie

Karim Dabbagh: karim@secondgenome.com

Shoko Iwai: shoko@secondgenome.com

Todd Z. DeSantis: todd@secondgenome.com

Journal

Journal ID (nlm-ta): BMC Genomics

Journal ID (iso-abbrev): BMC Genomics

Title: BMC Genomics

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2164

Publication date (Electronic): 17 January 2020

Publication date PMC-release: 17 January 2020

Publication date Collection: 2020

Volume: 21

Electronic Location Identifier: 56

Affiliations

[1] Informatics Department, Second Genome Inc., South San Francisco, California, USA (GRID grid.452682.f)

[2] APC Microbiome Ireland, University College Cork, Co. Cork, Ireland (ISNI 0000000123318773, GRID grid.7872.a)

[3] School of Microbiology, University College Cork, Co. Cork, Ireland (ISNI 0000000123318773, GRID grid.7872.a)

[4] Department of Medicine, University College Cork, Co. Cork, Ireland (ISNI 0000000123318773, GRID grid.7872.a)

Article

Publisher ID: 6427

DOI: 10.1186/s12864-019-6427-1

PMC ID: 6967091

PubMed ID: 31952477

SO-VID: 147fde1d-52b9-4b4d-b2b3-c1143134da8d

License:

Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received: 12 June 2019

Date accepted: 24 December 2019

Custom metadata

ScienceOpen disciplines: Genetics

Keywords: metagenomics, phylogenetic analysis, sequence alignment, shotgun sequencing, genomic databases
