
      Systematic evaluation of spliced alignment programs for RNA-seq data

      research-article


          Abstract

          High-throughput RNA sequencing is an increasingly accessible method for studying gene structure and activity on a genome-wide scale. A critical step in RNA-seq data analysis is the alignment of partial transcript reads to a reference genome sequence. To assess the performance of current mapping software, we invited developers of RNA-seq aligners to process four large human and mouse RNA-seq data sets. In total, we compared 26 mapping protocols based on 11 programs and pipelines and found major performance differences between methods on numerous benchmarks, including alignment yield, basewise accuracy, mismatch and gap placement, exon junction discovery and suitability of alignments for transcript reconstruction. We observed concordant results on real and simulated RNA-seq data, confirming the relevance of the metrics employed. Future developments in RNA-seq alignment methods would benefit from improved placement of multimapped reads, balanced utilization of existing gene annotation and a reduced false discovery rate for splice junctions.
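One of the benchmarks named in the abstract, exon junction discovery, can be illustrated with a small sketch. The code below is not the evaluation code used in the study; it is a minimal example that assumes junctions are represented as hypothetical `(chromosome, intron_start, intron_end)` tuples and that a set of true junctions is available, as it would be for simulated data. The function name `junction_metrics` and the toy coordinates are illustrative assumptions.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical junction representation: (chromosome, intron_start, intron_end).
Junction = Tuple[str, int, int]


class JunctionMetrics(NamedTuple):
    true_positives: int
    false_positives: int
    recall: float
    false_discovery_rate: float


def junction_metrics(predicted: Set[Junction], truth: Set[Junction]) -> JunctionMetrics:
    """Compare junctions reported by an aligner against a known truth set."""
    tp = len(predicted & truth)   # reported junctions that are real
    fp = len(predicted - truth)   # reported junctions that are not in the truth set
    recall = tp / len(truth) if truth else 0.0
    fdr = fp / len(predicted) if predicted else 0.0
    return JunctionMetrics(tp, fp, recall, fdr)


# Toy data (made up for illustration only).
truth = {("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 80), ("chr2", 120, 180)}
predicted = {("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 80), ("chr3", 10, 20)}

m = junction_metrics(predicted, truth)
print(m)
```

With this toy input, three of the four true junctions are recovered and one of the four reported junctions is false, which is the kind of trade-off the abstract refers to when it calls for a reduced false discovery rate for splice junctions.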


                Author and article information

                Journal
                Title: Nature Methods (Nat. Methods)
                Journal IDs: 101215604, 32338
                ISSN: 1548-7091, 1548-7105
                Dates: 14 April 2014; 03 November 2013; December 2013; 13 May 2014
                Volume: 10
                Issue: 12
                Pages: 1185-1191
                Affiliations
                [1] European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
                [2] Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
                [3] Penn Center for Bioinformatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
                [4] Computational Biology Center, Sloan-Kettering Institute, New York, New York, USA
                [5] Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany
                [7] Wellcome Trust Sanger Institute, Cambridge, UK
                [8] Centre for Genomic Regulation, Barcelona, Spain
                [9] Universitat Pompeu Fabra, Barcelona, Spain
                [10] Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
                [11] Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
                [12] Wellcome Trust–Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
                Author notes
                Correspondence should be addressed to P.B. (bertone@ebi.ac.uk).
                [13] Present address: Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Stockholm, Sweden.

                [6] Full lists of members and affiliations appear at the end of the paper.

                AUTHOR CONTRIBUTIONS: P.B., R.G., J.H., T.J.H. and N.G. conceived of and organized the study. G.R.G. and B.S. created the simulated RNA-seq data. Consortium members provided alignments for evaluation. P.G.E., T.S., B.S. and G.R.G. analyzed the data. P.G.E. and P.B. coordinated the analysis and wrote the paper with input from the aforementioned authors. A.K. and G.R. carried out preliminary analysis and metric development based on earlier RNA-seq and alignment data but did not evaluate the alignments described herein.

                Article
                Manuscript ID: EMS58004
                DOI: 10.1038/nmeth.2722
                PMCID: PMC4018468
                PMID: 24185836
                Record ID: 587745a4-d38d-4b08-bedb-c065eb12c559
                © 2013 Nature America, Inc. All rights reserved.

                This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.

                Categories: Article; Life sciences
