      Nonpareil 3: Fast Estimation of Metagenomic Coverage and Sequence Diversity

          ABSTRACT

          Estimations of microbial community diversity based on metagenomic data sets are affected, often to an unknown degree, by biases derived from insufficient coverage and reference database-dependent estimations of diversity. For instance, the completeness of reference databases cannot be generally estimated since it depends on the extant diversity sampled to date, which, with the exception of a few habitats such as the human gut, remains severely undersampled. Further, estimation of the degree of coverage of a microbial community by a metagenomic data set is prohibitively time-consuming for large data sets, and coverage values may not be directly comparable between data sets obtained with different sequencing technologies. Here, we extend Nonpareil, a database-independent tool for the estimation of coverage in metagenomic data sets, to a high-performance computing implementation that scales up to hundreds of cores and includes, in addition, a k-mer-based estimation as sensitive as the original alignment-based version but about three hundred times as fast. Further, we propose a metric of sequence diversity (Nd) derived directly from Nonpareil curves that correlates well with alpha diversity assessed by traditional metrics. We use this metric in different experiments demonstrating the correlation with the Shannon index estimated on 16S rRNA gene profiles and show that Nd additionally reveals seasonal patterns in marine samples that are not captured by the Shannon index and more precise rankings of the magnitude of diversity of microbial communities in different habitats. Therefore, the new version of Nonpareil, called Nonpareil 3, advances the toolbox for metagenomic analyses of microbiomes.
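The Shannon index that the abstract uses as its traditional point of comparison can be computed from a taxon count profile as H' = -sum(p_i * ln p_i), where p_i is the relative abundance of taxon i. The following is a minimal sketch of that standard formula only. The function name and the example count profiles are hypothetical and are not taken from the article, and the sketch does not compute Nd, whose definition is not given in this record.

```python
import math


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with nonzero counts.

    `counts` is a sequence of per-taxon abundances (for example, read counts
    per 16S rRNA gene OTU). Natural logarithm is used, so H' is in nats.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Taxa with zero counts contribute nothing (p * ln p -> 0 as p -> 0).
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical profiles, invented for illustration only.
even_profile = [25, 25, 25, 25]    # four equally abundant taxa
skewed_profile = [97, 1, 1, 1]     # one dominant taxon

print(shannon_index(even_profile))    # equals ln(4) for four even taxa
print(shannon_index(skewed_profile))  # lower: the community is dominated by one taxon
```

For a fixed number of taxa the index is highest when abundances are even, which is why the skewed profile scores lower than the even one.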

          IMPORTANCE Estimation of the coverage provided by a metagenomic data set, i.e., what fraction of the microbial community was sampled by DNA sequencing, represents an essential first step of every culture-independent genomic study that aims to robustly assess the sequence diversity present in a sample. However, estimation of coverage remains elusive because of several technical limitations associated with high computational requirements and limiting statistical approaches to quantify diversity. Here we describe Nonpareil 3, a new bioinformatics algorithm that circumvents several of these limitations and thus can facilitate culture-independent studies in clinical or environmental settings, independent of the sequencing platform employed. In addition, we present a new metric of sequence diversity based on rarefied coverage and demonstrate its use in communities from diverse ecosystems.
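To make the idea of database-independent, k-mer-based analysis more concrete, the toy sketch below measures how redundant a set of reads is: the fraction of reads that share at least one k-mer with some other read in the same set. This is an illustration under stated assumptions, not Nonpareil's implementation. This record does not describe the algorithm, so the function names, the default k, the matching rule, and the example reads are all hypothetical. The sketch also ignores reverse complements and sequencing errors.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers in a read (empty if the read is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def redundant_fraction(reads, k=8):
    """Fraction of reads sharing at least one k-mer with another read.

    Toy measure only: a read counts as redundant if any of its k-mers also
    occurs in a different read of the same set.
    """
    if not reads:
        return 0.0
    kmer_sets = [kmers(r, k) for r in reads]
    # Count, for each k-mer, how many reads contain it (each read counted once).
    owners = Counter()
    for s in kmer_sets:
        owners.update(s)
    redundant = sum(1 for s in kmer_sets if any(owners[km] > 1 for km in s))
    return redundant / len(reads)


# Hypothetical reads: the first two overlap, the third is unrelated.
reads = ["ACGTACGTAC", "CGTACGTACG", "TTTTGGGGCC"]
print(redundant_fraction(reads, k=8))  # two of the three reads share a k-mer
```

Intuitively, a data set in which most reads overlap other reads has sampled its community more thoroughly than one in which almost every read is unique.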

                Author and article information

                Contributors
                Role: Editor
                Journal
                mSystems (msys)
                Publisher: American Society for Microbiology (1752 N St., N.W., Washington, DC)
                ISSN: 2379-5077
                Dates: 10 April 2018; May-Jun 2018
                Volume: 3
                Issue: 3
                Article ID: e00039-18
                Affiliations
                [a] School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
                [b] School of the Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
                [c] Center for Microbial Ecology, Michigan State University, East Lansing, Michigan, USA
                [d] Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, USA
                [e] Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
                University of North Carolina at Charlotte
                Author notes
                Address correspondence to Konstantinos T. Konstantinidis, kostas@ce.gatech.edu.

                L.M.R. and S.G. contributed equally to this work.

                Citation Rodriguez-R LM, Gunturu S, Tiedje JM, Cole JR, Konstantinidis KT. 2018. Nonpareil 3: fast estimation of metagenomic coverage and sequence diversity. mSystems 3:e00039-18. https://doi.org/10.1128/mSystems.00039-18.

                Author information
                https://orcid.org/0000-0001-7603-3093
                Article
                Publisher ID: mSystems00039-18
                DOI: 10.1128/mSystems.00039-18
                PMC ID: 5893860
                PMID: 29657970
                Copyright © 2018 Rodriguez-R et al.

                This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.

                History
                : 22 March 2018
                : 23 March 2018
                Page count
                supplementary-material: 9, Figures: 2, Tables: 1, Equations: 0, References: 26, Pages: 9, Words: 7088
                Funding
                Funded by: National Science Foundation (NSF), https://doi.org/10.13039/100000001;
                Award ID: 1356288
                Funded by: Department of Energy, Labor and Economic Growth (DELEG), https://doi.org/10.13039/100004944;
                Award ID: DE-SC0006662
                Funded by: Department of Energy, Labor and Economic Growth (DELEG);
                Award ID: DE-FC02-07ER64494
                Categories
                Research Article
                Novel Systems Biology Techniques
                Keywords
                bioinformatics, coverage, metagenomics, microbial ecology
