
      Bacteria are everywhere, even in your COI data: The art of getting to know the unknown unknowns and shine light on the dark matter!


      ARPHA Conference Abstracts

      Pensoft Publishers



          Environmental DNA (eDNA) metabarcoding has been commonly used in recent years (Jeunen et al. 2019) to identify the species composition of environmental samples. By making use of genetic markers anchored in conserved gene regions that are universally present across the species of large taxonomic groups, eDNA metabarcoding exploits both extra- and intra-cellular DNA fragments for biodiversity assessment. However, there is no truly “universal” marker gene capable of amplifying all species across different taxa (Kress et al. 2015). The mitochondrial cytochrome C oxidase subunit I gene (COI) has many of the desirable properties of a “universal” marker and has been widely used for assessing species identity in Eukaryotes, especially metazoans (Andújar et al. 2018). However, a great number of COI Operational Taxonomic Units (OTUs) and/or Amplicon Sequence Variants (ASVs) retrieved from such studies do not match reference sequences and are often referred to as “dark matter” (Deagle et al. 2014). The aim of this study was to discover the origins and identities of these COI dark matter sequences. We built a reference phylogenetic tree that included as much COI sequence information from across the tree of life as possible. An overview of the steps followed is presented in Fig. 1a. Briefly, the Midori reference 2 database was used to retrieve eukaryote sequences (183,330 species). In addition, the API of the BOLD database was used as the source for the corresponding Bacteria (559 genera) and Archaea (41 genera) sequences. Consensus sequences at the family level were constructed from each of these three initial COI datasets. The COI-oriented reference phylogenetic tree of life was then built using 1,240 consensus sequences, more than 80% of which came from eukaryotic taxa. Phylogeny-based taxonomic assignment was then used to place query sequences.
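The family-level consensus step can be illustrated with a minimal sketch. The abstract does not state how the consensus sequences were computed, so the column-wise majority rule used here is an assumption, as are the function names, the `min_fraction` threshold, the use of `N` for unresolved columns, and the toy records. The sketch also assumes the sequences within each family have already been aligned to equal length.

```python
from collections import Counter, defaultdict


def majority_consensus(aligned_seqs, min_fraction=0.5, ambiguous="N"):
    """Column-wise majority-rule consensus of equal-length aligned sequences.

    Assumption: a column is resolved only if one symbol is strictly more
    frequent than min_fraction; otherwise the ambiguous symbol is emitted.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_seqs):
        symbol, count = Counter(column).most_common(1)[0]
        consensus.append(symbol if count / len(column) > min_fraction else ambiguous)
    return "".join(consensus)


def consensus_by_family(records):
    """Group (family, aligned_sequence) pairs and build one consensus per family."""
    groups = defaultdict(list)
    for family, seq in records:
        groups[family].append(seq.upper())
    return {family: majority_consensus(seqs) for family, seqs in groups.items()}


# Hypothetical toy records, not data from the study.
records = [
    ("FamilyA", "ACGT-A"),
    ("FamilyA", "ACGTTA"),
    ("FamilyA", "ACCTTA"),
    ("FamilyB", "TTGACA"),
]
family_consensus = consensus_by_family(records)
print(family_consensus)
```

Because the threshold is a strict majority, tied columns always fall back to the ambiguous symbol, which keeps the result independent of input order.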
Three sets of OTUs from marine and freshwater samples, retrieved during in-house metabarcoding experiments, were placed in the reference tree: a) all sequences, b) sequences assigned to Eukaryotes and c) unassigned sequences (Fig. 1b). A large proportion of the sequences obtained when targeting the COI region of Eukaryotes were placed on bacterial branches of the phylogenetic tree (Fig. 1b). We conclude that COI metabarcoding studies targeting Eukaryotes may carry a substantial bias from amplification and sequencing of bacterial taxa, depending on the primer pair used. However, for the time being, publicly available bacterial COI sequences are far too few to represent bacterial variability; thus, reliable taxonomic identification of these sequences is not yet possible. We suggest that bacterial COI sequences be included in the reference databases used for the taxonomic assignment of OTUs/ASVs in COI-based eukaryote metabarcoding studies, so that researchers can identify and exclude amplified non-target bacterial sequences. Further, the approach presented here allows researchers to better understand the unknown unknowns and shed light on the dark matter of their metabarcoding sequence data.
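As a rough illustration of how placement results might be summarised, the sketch below tallies the share of query sequences landing on branches of each domain. The input format (a mapping from OTU identifier to the domain of the branch it was placed on), the function name, and the toy values are all hypothetical; real placement tools produce richer output that would first need to be reduced to this form.

```python
from collections import Counter


def domain_fractions(placements):
    """Return the fraction of placed sequences per domain label.

    placements: dict mapping an OTU/ASV id to the domain of the reference
    branch it was placed on (an assumed, simplified input format).
    """
    counts = Counter(placements.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {domain: n / total for domain, n in counts.items()}


# Made-up example placements, not results from the study.
placements = {
    "otu1": "Bacteria",
    "otu2": "Eukaryota",
    "otu3": "Bacteria",
    "otu4": "Archaea",
}
fractions = domain_fractions(placements)
print(fractions)
```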

          Author and article information

          March 04 2021
          : 4
          © 2021

