      Is Open Access

      Estimating intraspecific genetic diversity from community DNA metabarcoding data

      research-article


          Abstract

          Background

          DNA metabarcoding is used to generate species composition data for entire communities. However, sequencing errors in high-throughput sequencing instruments are fairly common, so reads usually have to be clustered into operational taxonomic units (OTUs), which discards information on intraspecific diversity. While Cytochrome c oxidase subunit I (COI) haplotype information is limited in resolving intraspecific diversity, it is nevertheless often useful, e.g. in a phylogeographic context, where it helps to formulate hypotheses on taxon distribution and dispersal.

          Methods

          This study combines sequence denoising strategies, normally applied in microbial research, with additional abundance-based filtering to extract haplotype information from freshwater macroinvertebrate metabarcoding datasets. This novel approach was added to the R package “JAMP” and can be applied to COI amplicon datasets. We tested our haplotyping method by sequencing (i) a single-species mock community composed of 31 individuals with 15 different haplotypes spanning three orders of magnitude in biomass and (ii) 18 monitoring samples each amplified with four different primer sets and two PCR replicates.
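The abundance-based filtering step described above can be illustrated with a minimal sketch. It is not the JAMP implementation: the function name, the input layout (read counts per sample and a variant-to-OTU mapping) and both thresholds are assumptions chosen for illustration only. The idea shown is that a denoised sequence variant is kept as a haplotype only if it reaches a minimum share of the sample's reads and a minimum share of its OTU's reads in that sample.

```python
from collections import defaultdict


def filter_haplotypes(counts, otu_of, min_sample_frac=0.001, min_otu_frac=0.05):
    """Keep sequence variants that pass two relative-abundance thresholds.

    counts: {sample: {variant: read_count}} after denoising (hypothetical layout)
    otu_of: {variant: otu_id}
    Both thresholds are illustrative assumptions, not values from the study.
    """
    kept = {}
    for sample, variants in counts.items():
        total = sum(variants.values())
        # Reads assigned to each OTU within this sample.
        otu_total = defaultdict(int)
        for variant, n in variants.items():
            otu_total[otu_of[variant]] += n
        kept[sample] = {
            variant: n
            for variant, n in variants.items()
            if n > 0
            and n / total >= min_sample_frac
            and n / otu_total[otu_of[variant]] >= min_otu_frac
        }
    return kept


if __name__ == "__main__":
    counts = {"S1": {"h1": 900, "h2": 90, "h3": 10}}
    otu_of = {"h1": "OTU_1", "h2": "OTU_1", "h3": "OTU_1"}
    print(filter_haplotypes(counts, otu_of))
```

Raising either threshold removes more low-abundance variants, which mirrors the trade-off reported in the Results: stricter filtering discards more unexpected haplotypes but can also drop genuine haplotypes from small specimens.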

          Results

          We detected all 15 haplotypes of the single specimens in the mock community with relaxed filtering and denoising settings. However, up to 480 additional unexpected haplotypes remained in both replicates. Rigorous filtering removes most unexpected haplotypes but can also discard expected haplotypes, mainly those from the small specimens. In the monitoring samples, the different primer sets detected 177–200 OTUs, with an average of 2.40–3.30 haplotypes per OTU. The derived intraspecific diversity data showed population structures that were consistent between replicates and similar between primer pairs, but resolution depended on the primer length. A closer look at abundant taxa in the dataset revealed various population genetic patterns: the stonefly Taeniopteryx nebulosa and the caddisfly Hydropsyche pellucidula showed a distinct north–south cline in haplotype distribution, while the beetle Oulimnius tuberculatus and the isopod Asellus aquaticus displayed no clear population pattern but differed in genetic diversity.
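As a small worked illustration of the "haplotypes per OTU" summary reported above, the following sketch computes the mean number of haplotypes per OTU from a haplotype-to-OTU mapping. The function name and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def mean_haplotypes_per_otu(otu_of):
    """Mean number of haplotypes per OTU, given {haplotype: otu_id}."""
    per_otu = Counter(otu_of.values())  # number of haplotypes in each OTU
    if not per_otu:
        return 0.0
    return sum(per_otu.values()) / len(per_otu)


if __name__ == "__main__":
    # Toy mapping: three OTUs holding 2, 1 and 3 haplotypes respectively.
    toy = {"h1": "A", "h2": "A", "h3": "B", "h4": "C", "h5": "C", "h6": "C"}
    print(mean_haplotypes_per_otu(toy))
```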

          Discussion

          We developed a strategy to infer intraspecific genetic diversity from bulk invertebrate metabarcoding data. At this point, however, metabarcoding-informed haplotyping cannot capture the full diversity present in such samples, owing to variation in specimen size, primer bias and the loss of low-abundance sequence variants. Nevertheless, intraspecific diversity was recovered for a high number of species, identifying potentially isolated populations and taxa for further, more detailed phylogeographic investigation. While we currently lack large-scale metabarcoding datasets to take full advantage of our new approach, metabarcoding-informed haplotyping holds great promise for biomonitoring efforts that seek information not only about species diversity but also about the underlying genetic diversity.


                Author and article information

                Contributors
                Journal
                PeerJ
                PeerJ Inc. (San Francisco, USA)
                2167-8359
                9 April 2018
                2018
                Volume: 6
                Article: e4644
                Affiliations
                [1] Aquatic Ecosystem Research, University of Duisburg-Essen, Essen, North Rhine-Westphalia, Germany
                [2] Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
                [3] Centre for Water and Environmental Research (ZWU) Essen, University of Duisburg-Essen, Essen, North Rhine-Westphalia, Germany
                Author information
                http://orcid.org/0000-0003-4672-7099
                http://orcid.org/0000-0002-8992-575X
                http://orcid.org/0000-0002-5465-913X
                Article
                4644
                DOI: 10.7717/peerj.4644
                PMCID: PMC5896493
                PMID: 29666773
                b28d186b-77eb-43f9-85e0-5c779c4b799f
                © 2018 Elbrecht et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

                History
                Received: 10 February 2018
                Accepted: 28 March 2018
                Funding
                Funded by: European Cooperation in Science and Technology (COST) Action DNAqua-Net
                Award ID: CA15219
                Funded by: Canada First Research Excellence Fund
                Funded by: Bodnarescu Foundation
                This study is part of the European Cooperation in Science and Technology (COST) Action DNAqua-Net (CA15219). Dirk Steinke and Vasco Elbrecht were supported by the Canada First Research Excellence Fund for the Food from Thought initiative. Ecaterina Edith Vamos was supported by a grant of the Bodnarescu Foundation (Deutsches Stiftungszentrum). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Biogeography
                Bioinformatics
                Molecular Biology
                Freshwater Biology

                metabarcoding, high-throughput sequencing, population genetics, haplotyping, ecosystem assessment, exact sequence variant, co1
