      To denoise or to cluster, that is not the question: optimizing pipelines for COI metabarcoding and metaphylogeography

      research-article

          Abstract

          Background

          The recent blooming of metabarcoding applications to biodiversity studies comes with some relevant methodological debates. One such issue concerns the treatment of reads by denoising or by clustering methods, which have been wrongly presented as alternatives. It has also been suggested that denoised sequence variants should replace clusters as the basic unit of metabarcoding analyses, missing the fact that sequence clusters are a proxy for species-level entities, the basic unit in biodiversity studies. We argue here that methods developed and tested for ribosomal markers have been uncritically applied to highly variable markers such as cytochrome oxidase I (COI) without conceptual or operational (e.g., parameter setting) adjustment. COI has a naturally high intraspecies variability that should be assessed and reported, as it is a source of highly valuable information. We contend that denoising and clustering are not alternatives. Rather, they are complementary and both should be used together in COI metabarcoding pipelines.

          Results

          Using a COI dataset from benthic marine communities, we compared two denoising procedures (based on the UNOISE3 and the DADA2 algorithms), set suitable parameters for denoising and clustering, and applied these steps in different orders. Our results indicated that the UNOISE3 algorithm preserved a higher intra-cluster variability. We introduce the program DnoisE to implement the UNOISE3 algorithm taking into account the natural variability (measured as entropy) of each codon position in protein-coding genes. This correction increased the number of sequences retained by 88%. The order of the steps (denoising and clustering) had little influence on the final outcome.
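
          The idea of measuring natural variability as entropy per codon position can be illustrated with a minimal sketch. This is not the DnoisE implementation: the function name `codon_position_entropy` is hypothetical, and the sketch assumes that the sequences are already aligned to equal length, that they start at the first position of a codon, and that the per-position value is the mean Shannon entropy (in bits) over all sites sharing that codon position.

```python
from collections import Counter
from math import log2


def codon_position_entropy(seqs, counts=None):
    """Mean Shannon entropy (bits) per codon position (1st, 2nd, 3rd).

    Assumptions (illustrative only, not taken from DnoisE):
    - all sequences are aligned and of equal length;
    - index 0 of every sequence is the first position of a codon;
    - `counts` holds the read abundance of each sequence (default: 1 each).
    """
    if not seqs:
        raise ValueError("at least one sequence is required")
    if counts is None:
        counts = [1] * len(seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    entropy_sum = [0.0, 0.0, 0.0]  # summed site entropies per codon position
    n_sites = [0, 0, 0]            # number of sites per codon position

    for i in range(length):
        # Abundance-weighted nucleotide frequencies in this alignment column.
        column = Counter()
        for seq, count in zip(seqs, counts):
            column[seq[i]] += count
        total = sum(column.values())
        site_entropy = -sum(
            (n / total) * log2(n / total) for n in column.values()
        )
        entropy_sum[i % 3] += site_entropy
        n_sites[i % 3] += 1

    return [s / n if n else 0.0 for s, n in zip(entropy_sum, n_sites)]


if __name__ == "__main__":
    # Two toy sequences that differ only at third codon positions.
    print(codon_position_entropy(["AAAAAA", "AAGAAG"]))
```

          In protein-coding markers such as COI, the third codon position is expected to show the highest entropy, which is the kind of signal a codon-aware denoiser can use to weight differences between sequences by position.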

          Conclusions

          We highlight the need for combining denoising and clustering, with adequate choice of stringency parameters, in COI metabarcoding. We present a program that uses the coding properties of this marker to improve the denoising step. We recommend that researchers report their results in terms of both denoised sequences (a proxy for haplotypes) and clusters formed (a proxy for species), and avoid collapsing the sequences of each cluster into a single representative. This will allow studies at the cluster level (ideally equating to species-level diversity) and at the intra-cluster level, and will ease additivity and comparability between studies.

          Supplementary information

          The online version contains supplementary material available at 10.1186/s12859-021-04115-6.

                Author and article information

                Contributors
                owen.wangensteen@uit.no
                xturon@ceab.csic.es
                Journal
                BMC Bioinformatics
                BioMed Central (London)
                1471-2105
                5 April 2021
                2021
                Volume: 22
                Article number: 177
                Affiliations
                [1] GRID grid.423563.5, ISNI 0000 0001 0159 2034, Department of Marine Ecology, Centre for Advanced Studies of Blanes (CEAB-CSIC), Blanes (Girona), Catalonia, Spain
                [2] GRID grid.5841.8, ISNI 0000 0004 1937 0247, Department of Evolutionary Biology, Ecology and Environmental Sciences, University of Barcelona and Research Institute of Biodiversity (IRBIO), Barcelona, Catalonia, Spain
                [3] GRID grid.10919.30, ISNI 0000 0001 2259 5234, Norwegian College of Fishery Science, UiT The Arctic University of Norway, Tromsö, Norway
                Article
                4115
                10.1186/s12859-021-04115-6
                8020537
                33820526
                432fffbd-fbc8-400d-9441-97473962498f
                © The Author(s) 2021

                Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                Received: 9 January 2021
                Accepted: 30 March 2021
                Categories
                Research Article

                Bioinformatics & Computational biology
                metabarcoding, metaphylogeography, COI, denoising, clustering, operational taxonomic units
