Bioconductor workflow for microbiome data analysis: from raw reads to community analyses


Abstract

High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or microbial composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches cluster reads into operational taxonomic units (OTUs) at 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to perform sophisticated downstream statistical analyses, including both parametric and nonparametric methods. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests, partial least squares and linear models, as well as nonparametric testing using community networks and the ggnetwork package.
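The subsampling normalization mentioned in the abstract (often called rarefying) can be sketched as follows. This is a minimal Python illustration under our own assumptions, not code from the workflow, which is written in R (there, phyloseq's `rarefy_even_depth` provides a comparable operation). The function name `rarefy`, the layout of the count matrix (one row per sample, one column per taxon) and the choice to drop samples whose library is smaller than the target depth are all illustrative assumptions.

```python
import numpy as np


def rarefy(counts, depth, seed=0):
    """Hypothetical helper: subsample each sample to exactly `depth` reads.

    counts: 2-D array-like of non-negative integer counts,
            one row per sample and one column per taxon (assumed layout).
    depth:  target library size shared by all returned samples.
    Samples with fewer than `depth` reads are dropped (an assumed convention).
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    kept = []
    for row in counts:
        if row.sum() < depth:
            continue  # cannot draw more reads than the library contains
        # Expand the counts into one entry per read, labelled by taxon index.
        reads = np.repeat(np.arange(row.size), row)
        # Draw `depth` reads without replacement, then re-tally per taxon.
        picked = rng.choice(reads, size=depth, replace=False)
        kept.append(np.bincount(picked, minlength=row.size))
    # reshape keeps a (0, n_taxa) shape when every sample was dropped.
    return np.array(kept, dtype=np.int64).reshape(-1, counts.shape[1])


otu = [[10, 5, 0, 5],     # library size 20
       [40, 30, 20, 10],  # library size 100
       [3, 1, 0, 0]]      # library size 4: dropped at depth 10
print(rarefy(otu, depth=10).sum(axis=1))  # -> [10 10]
```

Each kept sample ends up with exactly `depth` reads; the individual counts vary with the random seed, and reads beyond `depth` in the larger libraries are discarded.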


Author and article information

Journal

Journal ID (nlm-ta): F1000Res

Journal ID (iso-abbrev): F1000Res

Journal ID (pmc): F1000Research

Title: F1000Research

Publisher: F1000Research (London, UK)

ISSN (Electronic): 2046-1402

Publication date (Electronic): 24 June 2016

Publication date Collection: 2016

Volume: 5

Electronic Location Identifier: 1492

Affiliations

[1] Statistics Department, Stanford University, Stanford, CA, 94305, USA

[2] Whole Biome Inc., San Francisco, CA, 94107, USA

[1] Gladstone Institutes, University of California, San Francisco, San Francisco, CA, USA

[1] Laboratory of Genetically Encoded Small Molecules, The Rockefeller University, New York, NY, USA

[1] Department of Mathematics and Statistics, University of Turku, Turku, Finland

Author notes

[a] susan@stat.stanford.edu

BJC, KS, JAF, PJM and SPH developed the software tools and statistical methods, tested the workflow on the data sets, and wrote the article.

Competing interests: No competing interests were disclosed.

Article

DOI: 10.12688/f1000research.8986.1

PMC ID: 4955027

PubMed ID: 27508062

SO-VID: 47174ac6-bf19-433a-98d9-f5235b72e854

License:

This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date accepted: 14 June 2016

Funding

Funded by: National Science Foundation

Award ID: DMS-1162538

Funded by: National Institutes of Health

Award ID: R01AI112401

Award ID: TR32

This work was partially supported by the NSF (DMS-1162538 to SPH) and the NIH (R01AI112401 to SPH). JAF received support from a Stanford Interdisciplinary Graduate Fellowship, and KS was supported by an NIH TR32 training grant.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Related collections

Tick microbiome
