      Is Open Access

      AlignerBoost: A Generalized Software Toolkit for Boosting Next-Gen Sequencing Mapping Accuracy Using a Bayesian-Based Mapping Quality Framework

      research-article
      PLoS Computational Biology
      Public Library of Science


          Abstract

          Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Various repetitive elements in human and other higher eukaryotic genomes contribute in large part to ambiguously (non-uniquely) mapped reads. Most available NGS aligners attempt to address this by either removing all non-uniquely mapping reads, or reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical albeit completely lacking at present. Here we developed a generalized software toolkit "AlignerBoost", which utilizes a Bayesian-based framework to accurately estimate mapping quality of ambiguously mapped NGS reads. We tested AlignerBoost with both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, but especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising the sensitivity even without mapping quality filters. When using higher mapping quality cutoffs, AlignerBoost achieves a much lower false mapping rate while exhibiting comparable or higher sensitivity compared to the aligner default modes, therefore significantly boosting the detection power of NGS aligners even using extreme thresholds. AlignerBoost is also SNP-aware, and higher quality alignments can be achieved if provided with known SNPs. AlignerBoost’s algorithm is computationally efficient, and can process one million alignments within 30 seconds on a typical desktop computer. AlignerBoost is implemented as a uniform Java application and is freely available at https://github.com/Grice-Lab/AlignerBoost.
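          As a rough illustration of what a Bayesian mapping-quality estimate involves, the sketch below converts the alignment likelihoods of one read's candidate hits into posterior probabilities and Phred-scaled mapping qualities. It is a generic formulation, not AlignerBoost's actual implementation: the function name `mapping_qualities`, the uniform default prior, and the quality cap of 60 are all assumptions made for this example.

```python
import math


def mapping_qualities(log10_likelihoods, priors=None, max_mapq=60):
    """Return a Phred-scaled mapping quality for each candidate hit of one read.

    log10_likelihoods: log10 P(read | hit) for every candidate locus (hypothetical input).
    priors: optional prior probability per hit; uniform if omitted (an assumption).
    max_mapq: cap applied when the estimated error probability is ~0 (an assumption).
    """
    n = len(log10_likelihoods)
    if n == 0:
        return []
    if priors is None:
        priors = [1.0 / n] * n

    # Shift by the best log-likelihood so the exponentials do not underflow.
    best = max(log10_likelihoods)
    weights = [p * 10 ** (ll - best) for ll, p in zip(log10_likelihoods, priors)]
    total = sum(weights)

    mapqs = []
    for i in range(n):
        # Error probability = posterior mass assigned to all *other* hits.
        # Summing the other weights directly avoids cancellation in 1 - posterior.
        others = sum(w for j, w in enumerate(weights) if j != i)
        err = others / total
        if err <= 0.0:
            mapqs.append(max_mapq)
        else:
            mapqs.append(min(max_mapq, round(-10 * math.log10(err))))
    return mapqs


if __name__ == "__main__":
    # Two equally good hits: each has error probability 0.5.
    print(mapping_qualities([-2.0, -2.0]))
    # One hit is 1000 times more likely than the other.
    print(mapping_qualities([-1.0, -4.0]))
```

          Under this formulation a read with two equally likely hits receives a low quality for both, so a downstream mapping-quality cutoff discards it, while a read with one clearly dominant hit keeps a high quality even though it is not uniquely mapped.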


                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Computational Biology (PLoS Comput Biol)
                Publisher: Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-734X, 1553-7358
                Published: 5 October 2016
                Volume: 12
                Issue: 10
                Article number: e1005096
                Affiliations
                [001] Department of Dermatology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
                Université de Montréal, Canada
                Author notes

                The authors have declared that no competing interests exist.

                • Conceived and designed the experiments: QZ.

                • Performed the experiments: QZ.

                • Analyzed the data: QZ.

                • Contributed reagents/materials/analysis tools: EAG.

                • Wrote the paper: QZ, EAG.

                Article
                Manuscript ID: PCOMPBIOL-D-16-00658
                DOI: 10.1371/journal.pcbi.1005096
                PMC ID: 5051939
                PubMed ID: 27706155
                9da19d06-1efe-4db5-a5bb-2b56a1cf0bd0
                © 2016 Zheng, Grice

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 21 April 2016
                Accepted: 2 August 2016
                Page count
                Figures: 8, Tables: 0, Pages: 20
                Funding
                Funded by: National Institutes of Health (funder ID: http://dx.doi.org/10.13039/100000002)
                Funded by: National Institute of Arthritis and Musculoskeletal and Skin Diseases (funder ID: http://dx.doi.org/10.13039/100000069)
                Award ID: R01-AR066663
                Award recipient: EAG
                Funded by: National Institute of Nursing Research (funder ID: http://dx.doi.org/10.13039/100000056)
                Award ID: R01-NR015639
                Award recipient: EAG
                Funded by: University of Pennsylvania (funder ID: http://dx.doi.org/10.13039/100006920)
                This work was supported by grants from two institutes of the National Institutes of Health, the National Institute of Arthritis and Musculoskeletal and Skin Diseases (Grant R01 AR066663 to EAG) and the National Institute of Nursing Research (Grant R01 NR015639 to EAG), and by the Department of Dermatology at the University of Pennsylvania. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Sequencing Techniques > Sequence Analysis > Sequence Alignment
                Research and Analysis Methods > Molecular Biology Techniques > Sequencing Techniques > Sequence Analysis > Sequence Alignment
                Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Sequencing Techniques > DNA Sequencing > Next-Generation Sequencing
                Research and Analysis Methods > Molecular Biology Techniques > Sequencing Techniques > DNA Sequencing > Next-Generation Sequencing
                Biology and Life Sciences > Computational Biology > Genome Analysis > Transcriptome Analysis > Next-Generation Sequencing
                Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Transcriptome Analysis > Next-Generation Sequencing
                Biology and Life Sciences > Genetics > Gene Types > Pseudogenes
                Biology and Life Sciences > Computational Biology > Genome Complexity > Pseudogenes
                Biology and Life Sciences > Genetics > Genomics > Genome Complexity > Pseudogenes
                Biology and Life Sciences > Computational Biology > Genome Analysis
                Biology and Life Sciences > Genetics > Genomics > Genome Analysis
                Biology and Life Sciences > Genetics > Gene Expression
                Biology and Life Sciences > Molecular Biology > Macromolecular Structure Analysis > RNA Structure > RNA Alignment
                Biology and Life Sciences > Biochemistry > Nucleic Acids > RNA > RNA Structure > RNA Alignment
                Biology and Life Sciences > Computational Biology > Genome Analysis > Genomic Libraries
                Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Genomic Libraries
                Biology and Life Sciences > Computational Biology > Genome Complexity
                Biology and Life Sciences > Genetics > Genomics > Genome Complexity
                Custom metadata
                All relevant data are within the paper and its Supporting Information files.

                Quantitative & Systems biology