
      Magic-BLAST, an accurate RNA-seq aligner for long and short reads

      research-article


          Abstract

          Background

          Next-generation sequencing technologies can produce tens of millions of reads, often paired-end, from transcripts or genomes. But few programs can align RNA on the genome and accurately discover introns, especially with long reads. We introduce Magic-BLAST, a new aligner based on ideas from the Magic pipeline.

          Results

          Magic-BLAST uses innovative techniques that include the optimization of a spliced alignment score and selective masking during seed selection. We evaluate how accurately Magic-BLAST maps short or long sequences and how well it discovers introns on real RNA-seq data sets from PacBio, Roche and Illumina runs and on six benchmarks, and we compare it to other popular aligners. Additionally, we look at alignments of human idealized RefSeq mRNA sequences perfectly matching the genome.

          Conclusions

          We show that Magic-BLAST is the best at intron discovery over a wide range of conditions and the best at mapping reads longer than 250 bases, from any platform. It is versatile and robust to high levels of mismatches or extreme base composition, and reasonably fast. It can align reads to a BLAST database or a FASTA file. It can accept a FASTQ file as input or automatically retrieve an accession from the SRA repository at the NCBI.
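
The input combinations listed in the conclusions (a BLAST database or a FASTA file as the reference, a FASTQ file or an SRA accession as the read source) can be illustrated with a minimal Python sketch that assembles, but does not run, a `magicblast` command line. The helper `build_magicblast_command`, its parameter names, the file names and the accession are hypothetical placeholders. The flags `-db`, `-subject`, `-query`, `-infmt fastq`, `-sra` and `-out` are assumed from the Magic-BLAST command-line documentation and should be verified against `magicblast -help` for the installed version.

```python
from typing import List, Optional


def build_magicblast_command(
    *,
    db: Optional[str] = None,
    subject_fasta: Optional[str] = None,
    query: Optional[str] = None,
    sra_accession: Optional[str] = None,
    out: str = "alignments.sam",
) -> List[str]:
    """Assemble an argument list for one Magic-BLAST run (nothing is executed).

    Flag names are assumptions taken from the Magic-BLAST documentation.
    """
    # The reference is either a BLAST database or a FASTA file, never both.
    if (db is None) == (subject_fasta is None):
        raise ValueError("give exactly one of db or subject_fasta")
    # The reads come either from a FASTQ file or from an SRA accession.
    if (query is None) == (sra_accession is None):
        raise ValueError("give exactly one of query or sra_accession")

    cmd = ["magicblast"]
    if db is not None:
        cmd += ["-db", db]
    else:
        cmd += ["-subject", subject_fasta]

    if sra_accession is not None:
        # Reads are retrieved from the SRA repository by accession.
        cmd += ["-sra", sra_accession]
    else:
        # Reads are taken from a local FASTQ file.
        cmd += ["-query", query, "-infmt", "fastq"]

    cmd += ["-out", out]
    return cmd


if __name__ == "__main__":
    # Placeholder names; replace them with a real database, file, or accession.
    print(" ".join(build_magicblast_command(db="my_reference", sra_accession="SRR1237994")))
    print(" ".join(build_magicblast_command(subject_fasta="genome.fa", query="reads.fastq")))
```

The returned list can be passed to `subprocess.run` once the flags have been checked. Building a list rather than a single string avoids shell quoting problems with file paths.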

          Electronic supplementary material

          The online version of this article (10.1186/s12859-019-2996-x) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                boratyng@ncbi.nlm.nih.gov
                mieg@ncbi.nlm.nih.gov
                Danielle.Thierry-Mieg@nih.gov
                busbybr@ncbi.nlm.nih.gov
                madden@ncbi.nlm.nih.gov
                Journal
                BMC Bioinformatics
                BioMed Central (London)
                ISSN 1471-2105
                25 July 2019
                2019; volume 20, article 405
                Affiliations
                ISNI 0000 0001 2297 5165, GRID grid.94365.3d, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
                Author information
                http://orcid.org/0000-0003-1641-7321
                Article
                Article number: 2996
                DOI: 10.1186/s12859-019-2996-x
                PMCID: PMC6659269
                PMID: 31345161
                c56e0fbe-7011-4060-932a-9e8485f711a4
                © The Author(s). 2019

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 21 March 2019
                Accepted: 16 July 2019
                Funding
                Funded by: Intramural Research Program of the NIH, National Library of Medicine
                Categories
                Software

                Bioinformatics & Computational biology
                RNA-seq, BLAST, alignment
