      Bioconductor workflow for microbiome data analysis: from raw reads to community analyses

      research-article

          Abstract

          High-throughput sequencing of PCR-amplified taxonomic markers (such as the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or microbial composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches cluster reads at 97% sequence similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to perform sophisticated downstream statistical analyses, including both parametric and nonparametric methods. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests, partial least squares and linear models, as well as nonparametric testing using community networks and the ggnetwork package.
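          The abstract describes a common normalization approach, "subsampling to equalize library sizes" (often called rarefying). The paper argues that statistical models improve on it. The Python sketch below illustrates what that subsampling does. It is not code from the workflow, which is written in R. It assumes that each sample is a plain list of per-taxon read counts in a shared taxon order, and that reads are drawn without replacement. The table and sample names are hypothetical.

```python
import random


def rarefy(counts, depth, seed=None):
    """Subsample one sample's taxon counts to exactly `depth` reads.

    Assumption: reads are drawn without replacement, so each original
    read is kept at most once. Raises ValueError if the sample has fewer
    reads than `depth`.
    """
    total = sum(counts)
    if depth > total:
        raise ValueError(f"sample has {total} reads, fewer than depth={depth}")
    rng = random.Random(seed)
    # Expand the count vector into one taxon index per read.
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    # Keep a random subset of `depth` reads and re-tally them per taxon.
    rarefied = [0] * len(counts)
    for taxon in rng.sample(reads, depth):
        rarefied[taxon] += 1
    return rarefied


# Hypothetical count table: sample name -> reads per taxon (same taxon order).
table = {
    "sample_A": [50, 30, 20],
    "sample_B": [500, 300, 200],
}

# Equalize library sizes by subsampling every sample to the smallest one.
depth = min(sum(counts) for counts in table.values())
rarefied_table = {name: rarefy(counts, depth, seed=1) for name, counts in table.items()}
print(depth, rarefied_table)
```

          The sketch shows the cost of this approach. Here sample_A already has the minimum depth of 100 reads and passes through unchanged. sample_B keeps only 100 of its 1,000 reads, and the other 900 are discarded before any analysis.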

                Author and article information

                Journal: F1000Research (F1000Res), London, UK
                ISSN: 2046-1402
                Published: 24 June 2016
                Volume 5, article 1492
                Affiliations
                [1] Statistics Department, Stanford University, Stanford, CA, 94305, USA
                [2] Whole Biome Inc., San Francisco, CA, 94107, USA
                [1] Gladstone Institutes, University of California, San Francisco, San Francisco, CA, USA
                [1] Laboratory of Genetically Encoded Small Molecules, The Rockefeller University, New York, NY, USA
                [1] Department of Mathematics and Statistics, University of Turku, Turku, Finland
                Author notes

                BJC, KS, JAF, PJM and SPH developed the software tools and statistical methods, tested the workflow on the data sets, and wrote the article.

                Competing interests: No competing interests were disclosed.

                Article DOI: 10.12688/f1000research.8986.1
                PMCID: PMC4955027
                PMID: 27508062
                Copyright: © 2016 Callahan BJ et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History: 14 June 2016
                Funding
                Funded by: National Science Foundation
                Award ID: DMS-1162538
                Funded by: National Institutes of Health
                Award ID: R01AI112401
                Award ID: TR32
                This work was partially supported by the NSF (DMS-1162538 to SPH) and the NIH (R01AI112401 to SPH). JAF received support from a Stanford Interdisciplinary Graduate Fellowship, and KS was supported by an NIH TR32 training grant.
                The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Articles
                Bioinformatics
                Microbial Evolution & Genomics
                Protein Chemistry & Proteomics
                Statistical Methodologies & Health Informatics

                Keywords: microbiome, taxonomy, community analysis