
      Comparison of Read Mapping and Variant Calling Tools for the Analysis of Plant NGS Data




          High-throughput sequencing technologies have developed rapidly in recent years and have become an essential tool in plant sciences. However, the analysis of genomic data remains challenging and depends largely on the performance of automated pipelines. Commonly used pipelines align sequence reads to a reference sequence and then identify sequence variants. Most benchmarking studies of bioinformatics tools for this purpose have been conducted on human datasets, so comparable benchmarks in plant sciences are lacking. In this study, we evaluated the performance of 50 variant calling pipelines, combining five read mappers with ten variant callers, on six real plant datasets of the model organism Arabidopsis thaliana. The resulting variant sets were evaluated on several metrics, including sensitivity and specificity. We found that all investigated tools are suitable for the analysis of NGS data in plant research. Across the performance metrics considered, BWA-MEM and Novoalign were the best mappers, and GATK gave the best results in the variant calling step.
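
          The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes each variant is represented as a (chromosome, position, ref, alt) tuple, that a gold-standard set is available, and that true negatives are counted over a fixed number of evaluated sites with at most one variant per site. The mapper and caller names are placeholders, since the abstract names only BWA-MEM, Novoalign and GATK; all function names are hypothetical.

```python
from itertools import product

# Placeholder tool names (assumption): the abstract does not list all tools.
mappers = [f"mapper_{i}" for i in range(1, 6)]    # five read mappers
callers = [f"caller_{j}" for j in range(1, 11)]   # ten variant callers

# Every mapper is paired with every caller, giving 5 x 10 pipelines.
pipelines = list(product(mappers, callers))


def confusion_counts(called, truth, n_sites):
    """Return (TP, FP, FN, TN) for a call set against a gold standard.

    called, truth: sets of (chromosome, position, ref, alt) tuples.
    n_sites: number of evaluated sites (assumed: at most one variant each).
    """
    tp = len(called & truth)   # variants found and present in the gold standard
    fp = len(called - truth)   # variants called but absent from the gold standard
    fn = len(truth - called)   # gold-standard variants that were missed
    tn = n_sites - tp - fp - fn
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Fraction of gold-standard variants that were recovered."""
    return tp / (tp + fn) if (tp + fn) else float("nan")


def specificity(tn, fp):
    """Fraction of non-variant sites that were correctly left uncalled."""
    return tn / (tn + fp) if (tn + fp) else float("nan")


# Toy data, invented purely for illustration.
truth = {("Chr1", 100, "A", "G"), ("Chr1", 250, "C", "T"), ("Chr2", 40, "G", "GA")}
called = {("Chr1", 100, "A", "G"), ("Chr2", 40, "G", "GA"), ("Chr2", 90, "T", "C")}

tp, fp, fn, tn = confusion_counts(called, truth, n_sites=1000)
print(len(pipelines), "pipelines")
print(f"sensitivity={sensitivity(tp, fn):.3f} specificity={specificity(tn, fp):.3f}")
```

          Because variant sites are rare relative to genome size, specificity computed this way tends to be close to 1 for any reasonable caller, which is one reason benchmarks usually report several metrics side by side.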


                Author and article information

                Plants (Basel)
                02 April 2020
                Volume 9, Issue 4, Article 439
                [1 ]Genetics and Genomics of Plants, CeBiTec and Faculty of Biology, Bielefeld University, 33615 Bielefeld, Germany
                [2 ]Graduate School DILS, Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany
                [3 ]Molecular Genetics and Physiology of Plants, Faculty of Biology and Biotechnology, Ruhr-University Bochum, 44801 Bochum, Germany
                Author notes
                [*] Correspondence: hschilbe@cebitec.uni-bielefeld.de; Tel.: +49-521-106-8722
                © 2020 by the authors.

                Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( http://creativecommons.org/licenses/by/4.0/).

                Received: 15 March 2020
                Accepted: 30 March 2020

                Keywords: single nucleotide variants (SNVs), single nucleotide polymorphisms (SNPs), insertions/deletions (InDels), population genomics, re-sequencing, mapper, benchmarking, next generation sequencing (NGS), bioinformatics, plant genomics

