      PEMA: a flexible Pipeline for Environmental DNA Metabarcoding Analysis of the 16S/18S ribosomal RNA, ITS, and COI marker genes

      research-article


          Abstract

          Background

          Environmental DNA and metabarcoding allow the identification of a mixture of species and have launched a new era in bio- and eco-assessment. Many steps are required to obtain taxonomically assigned matrices from raw data. For most of these steps, a plethora of tools are available, and each tool's execution parameters need to be tailored to reflect each experiment's idiosyncrasy. Adding to this complexity, the computation capacity of high-performance computing systems is frequently required for such analyses. To address these difficulties, bioinformatic pipelines need to combine state-of-the-art technologies and algorithms with an easy-to-get-set-use framework, allowing researchers to tune each study. Software containerization technologies ease the sharing and running of software packages across operating systems; thus, they strongly facilitate pipeline development and usage. Likewise, programming languages specialized for big data pipelines incorporate features like roll-back checkpoints and on-demand partial pipeline execution.

          Findings

          PEMA is a containerized assembly of key metabarcoding analysis tools that requires little effort to set up, run, and customize to researchers’ needs. Based on third-party tools, PEMA performs read pre-processing, (molecular) operational taxonomic unit clustering, amplicon sequence variant inference, and taxonomy assignment for 16S and 18S ribosomal RNA, as well as ITS and COI marker gene data. Owing to its simplified parameterization and checkpoint support, PEMA allows users to explore alternative algorithms for specific steps of the pipeline without the need for a complete re-execution. PEMA was evaluated against both mock communities and previously published datasets and achieved results of comparable quality.
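
          The checkpoint idea mentioned above can be illustrated with a minimal Python sketch. It is not PEMA's code: the function `run_pipeline`, the toy steps, and the parameter names `min_length` and `prefix` are all hypothetical, and each step's output is simply cached as a JSON file so that a later run can resume from a chosen step.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(steps, workdir, params, start_from=None):
    """Run named steps in order, saving each step's output as a checkpoint.

    steps: list of (name, function); each function takes (previous_output,
    params) and returns JSON-serializable data.
    start_from: name of the first step to re-execute; earlier steps are
    restored from their checkpoints when these exist.
    Returns (final_output, names_of_steps_actually_executed).
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    names = [name for name, _ in steps]
    force_index = names.index(start_from) if start_from is not None else len(names)

    data, executed, stale = None, [], False
    for index, (name, func) in enumerate(steps):
        checkpoint = workdir / f"{index:02d}_{name}.json"
        if not stale and index < force_index and checkpoint.exists():
            # Restore the cached result instead of recomputing it.
            data = json.loads(checkpoint.read_text())
        else:
            # Once one step is recomputed, every later step must be as well.
            stale = True
            data = func(data, params)
            checkpoint.write_text(json.dumps(data))
            executed.append(name)
    return data, executed


# Toy stand-ins for real pipeline stages (hypothetical, for illustration only).
def quality_filter(_, params):
    reads = ["ACGT", "ACGA", "AC", "TTTT"]
    return [r for r in reads if len(r) >= params["min_length"]]


def cluster(reads, params):
    groups = {}
    for read in reads:
        groups.setdefault(read[:params["prefix"]], []).append(read)
    return groups


def assign(groups, params):
    return {key: len(members) for key, members in groups.items()}


STEPS = [("filter", quality_filter), ("cluster", cluster), ("assign", assign)]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(STEPS, tmp, {"min_length": 3, "prefix": 3}))
        # Change only the clustering parameter and resume from that step.
        print(run_pipeline(STEPS, tmp, {"min_length": 3, "prefix": 2},
                           start_from="cluster"))
```

          In the second call only the clustering and assignment steps run again; the filtering result is read back from disk, which is the kind of partial re-execution the abstract describes.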

          Conclusions

          A high-performance computing-based approach was used to develop PEMA; however, it can be used on personal computers as well. PEMA's time-efficient performance and good results will allow it to be used for accurate environmental DNA metabarcoding analysis, thus enhancing the applicability of next-generation biodiversity assessment studies.

          Most cited references (26)

          obitools: a unix-inspired software package for DNA metabarcoding.

          DNA metabarcoding offers new perspectives in biodiversity research. This recently developed approach to ecosystem study relies heavily on the use of next-generation sequencing (NGS) and thus calls upon the ability to deal with huge sequence data sets. The obitools package satisfies this requirement thanks to a set of programs specifically designed for analysing NGS data in a DNA metabarcoding context. Their capacity to filter and edit sequences while taking into account taxonomic annotation helps to set up tailor-made analysis pipelines for a broad range of DNA metabarcoding applications, including biodiversity surveys or diet analyses. The obitools package is distributed as an open source software available on the following website: http://metabarcoding.org/obitools. A Galaxy wrapper is available on the GenOuest core facility toolshed: http://toolshed.genouest.org.

            Swarm: robust and fast clustering method for amplicon-based studies

            Popular de novo amplicon clustering methods suffer from two fundamental flaws: arbitrary global clustering thresholds, and input-order dependency induced by centroid selection. Swarm was developed to address these issues by first clustering nearly identical amplicons iteratively using a local threshold, and then by using clusters’ internal structure and amplicon abundances to refine its results. This fast, scalable, and input-order independent approach reduces the influence of clustering parameters and produces robust operational taxonomic units.
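
            The first phase described in this abstract, growing clusters through a local threshold and not a global one, can be sketched in Python as follows. This is an illustrative toy and not Swarm's implementation: the function names are hypothetical, a plain Levenshtein distance with `d=1` stands in for "nearly identical", seeds are taken in decreasing abundance order as an assumed way to avoid input-order effects, and the abundance-based refinement phase is omitted.

```python
from collections import deque


def edit_distance(a, b):
    """Levenshtein distance computed with a two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def local_threshold_clusters(abundances, d=1):
    """Cluster sequences by chaining neighbours at most d edits apart.

    abundances: dict mapping sequence -> read count.
    Returns a list of clusters, each a list of sequences.
    """
    # Deterministic seed order: most abundant first, ties broken by sequence,
    # so the result does not depend on the order of the input.
    pool = sorted(abundances, key=lambda s: (-abundances[s], s))
    unassigned = set(pool)
    clusters = []
    for seed in pool:
        if seed not in unassigned:
            continue
        unassigned.remove(seed)
        cluster, queue = [seed], deque([seed])
        while queue:
            current = queue.popleft()
            # The threshold is applied locally, to each newly added member.
            neighbours = sorted(
                s for s in unassigned if edit_distance(current, s) <= d
            )
            for s in neighbours:
                unassigned.remove(s)
                cluster.append(s)
                queue.append(s)
        clusters.append(cluster)
    return clusters


if __name__ == "__main__":
    counts = {"AAAA": 10, "AAAT": 3, "AATT": 1, "CCCC": 5, "CCCG": 1}
    print(local_threshold_clusters(counts, d=1))
```

            In this toy input, AAAA and AATT differ by two edits yet end up in the same cluster because AAAT links them, which is what distinguishes a local threshold from a single global cutoff around a centroid.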

              Systematic improvement of amplicon marker gene methods for increased accuracy in microbiome studies.

              Amplicon-based marker gene surveys form the basis of most microbiome and other microbial community studies. Such PCR-based methods have multiple steps, each of which is susceptible to error and bias. Variance in results has also arisen through the use of multiple methods of next-generation sequencing (NGS) amplicon library preparation. Here we formally characterized errors and biases by comparing different methods of amplicon-based NGS library preparation. Using mock community standards, we analyzed the amplification process to reveal insights into sources of experimental error and bias in amplicon-based microbial community and microbiome experiments. We present a method that improves on the current best practices and enables the detection of taxonomic groups that often go undetected with existing methods.

                Author and article information

                Journal
                GigaScience
                Publisher: Oxford University Press
                ISSN: 2047-217X
                Published: 12 March 2020
                Volume: 9
                Issue: 3
                Article number: giaa022
                Affiliations
                [1] Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
                [2] Charles University, Department of Ecology, Faculty of Science, Viničná 7, CZ-12844, Prague, Czech Republic
                [3] LifeWatch ERIC, Plaza España SN, SECTOR II-III, 41013, Seville, Spain
                [4] Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology – Hellas (FORTH), N. Plastira 100, GR-70013, Heraklion, Crete, Greece
                Author notes
                Correspondence address. Haris Zafeiropoulos, Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece. E-mail: haris-zaf@hcmr.gr
                Author information
                http://orcid.org/0000-0002-4405-6802
                Article
                Publisher ID: giaa022
                DOI: 10.1093/gigascience/giaa022
                PMCID: PMC7066391
                PMID: 32161947
                ScienceOpen ID: 5f516a73-a52a-4f14-91af-3a3e346f2691
                © The Author(s) 2020. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 18 November 2019
                Revised: 05 January 2020
                Accepted: 14 February 2020
                Page count
                Pages: 12
                Funding
                Funded by: Hellenic Foundation for Research and Innovation, DOI 10.13039/501100013209;
                Award ID: 241
                Funded by: General Secretariat for Research and Technology, DOI 10.13039/501100003448;
                Award ID: 241
                Categories
                Technical Note
                AcademicSubjects/SCI00960
                AcademicSubjects/SCI02254

                Keywords: pipeline, container, docker, singularity, high performance computing, hpc, edna, metabarcoding
