
      Comparison of Burrows-Wheeler Transform-Based Mapping Algorithms Used in High-Throughput Whole-Genome Sequencing: Application to Illumina Data for Livestock Genomes

      research-article


          Abstract

          Ongoing developments and cost decreases in next-generation sequencing (NGS) technologies have led to an increase in their application, which has greatly enhanced the fields of genetics and genomics. Mapping sequence reads onto a reference genome is a fundamental step in the analysis of NGS data. Efficient alignment of the reads onto the reference genome with high accuracy is very important because it determines the global quality of downstream analyses. In this study, we evaluate the performance of three Burrows-Wheeler transform-based mappers, BWA, Bowtie2, and HISAT2, in the context of paired-end Illumina whole-genome sequencing of livestock, using simulated sequence data sets with varying sequence read lengths, insert sizes, and levels of genomic coverage, as well as five real data sets. The mappers were evaluated based on two criteria, computational resource/time requirements and robustness of mapping. Our results show that BWA and Bowtie2 tend to be more robust than HISAT2, while HISAT2 was significantly faster and used less memory than both BWA and Bowtie2. We conclude that there is not a single mapper that is ideal in all scenarios but rather the choice of alignment tool should be driven by the application and sequencing technology.
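The three mappers compared in the abstract all build on the Burrows-Wheeler transform (BWT). As a rough illustration of the underlying idea only, the sketch below builds the BWT of a short reference string and counts exact occurrences of a read by backward search. It is not the implementation used by BWA, Bowtie2, or HISAT2, which rely on compressed indexes and tolerate mismatches and gaps. The function names `bwt` and `count_occurrences` are hypothetical, and the naive rotation sort is an assumption chosen for readability, not efficiency.

```python
def bwt(text):
    """Return the Burrows-Wheeler transform of text, using '$' as end marker."""
    s = text + "$"  # '$' sorts before A, C, G, T and lowercase letters
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str, pattern):
    """Count exact matches of pattern in the original text via backward search."""
    # first_row[c]: number of characters in the text strictly smaller than c
    first_row = {}
    total = 0
    for c in sorted(set(bwt_str)):
        first_row[c] = total
        total += bwt_str.count(c)

    def occ(c, i):
        # Occurrences of c in bwt_str[:i] (naive; real indexes precompute this)
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # current half-open range of matching rotations
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + occ(c, lo)
        hi = first_row[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    reference = "ACGTACGT"  # toy reference, purely illustrative
    index = bwt(reference)
    print(index, count_occurrences(index, "ACG"))
```

The search processes the read from its last base to its first, narrowing a range of sorted rotations at each step, so the cost per read depends on the read length rather than on the size of the reference.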


                Author and article information

                Contributors
                Journal
                Frontiers in Genetics (Front. Genet.)
                Publisher: Frontiers Media S.A.
                ISSN: 1664-8021
                Published: 26 February 2018
                Volume: 9
                Article: 35
                Affiliations
                USDA, Agricultural Research Service, U.S. Meat Animal Research Center, Clay Center, NE, United States
                Author notes

                Edited by: Peng Xu, Xiamen University, China

                Reviewed by: Gregor Gorjanc, University of Edinburgh, United Kingdom; Chao Bian, Beijing Genomics Institute (BGI), China

                *Correspondence: Brittney N. Keel, brittney.keel@ars.usda.gov

                This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics

                Article
                DOI: 10.3389/fgene.2018.00035
                PMC ID: 5834436
                PubMed ID: 29535759
                Copyright © 2018 Keel and Snelling.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                Received: 10 October 2017
                Accepted: 25 January 2018
                Page count
                Figures: 3, Tables: 5, Equations: 0, References: 12, Pages: 6, Words: 3657
                Categories
                Genetics
                Original Research

                Genetics
                whole-genome sequencing, mapping algorithm, mapper comparison, genomic coverage, livestock