
      Assessment of transcript reconstruction methods for RNA-seq

      research-article

          Abstract

          RNA sequencing (RNA-seq) is transforming genome biology, enabling comprehensive transcriptome profiling with unprecedented accuracy and detail. Because of the technical limitations of current high-throughput sequencing platforms, transcript identity, structure and expression level must be inferred computationally from partial sequence reads of fragmented gene products. We evaluated 24 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression-level quantification from RNA-seq data. Our results show that most algorithms identify discrete transcript components with high success rates, but that assembling complete isoform structures poses a major challenge even when all constituent elements are identified. Expression-level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations on transcript recall and splice-product discrimination, which are likely to remain limiting factors in the analysis of current-generation RNA-seq data.
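The gap described above, between finding individual transcript components and assembling complete isoforms, can be made concrete with a toy calculation. The sketch below is illustrative only: the exact-match criterion, the coordinate convention and the helper names (`sensitivity_precision`, `exon_set`) are assumptions for this example, not the evaluation protocol used in the study.

```python
from typing import List, Set, Tuple

# Hypothetical representation: an exon is (chromosome, start, end, strand),
# with 1-based inclusive coordinates assumed purely for illustration.
Exon = Tuple[str, int, int, str]
# A transcript is an ordered tuple of exons. Two transcripts "match" here only
# if their exon chains are identical (a deliberately strict, assumed criterion).
Transcript = Tuple[Exon, ...]


def sensitivity_precision(reference: Set, predicted: Set) -> Tuple[float, float]:
    """Return (sensitivity, precision) for exact matching of two feature sets."""
    true_pos = len(reference & predicted)
    sensitivity = true_pos / len(reference) if reference else 0.0
    precision = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, precision


def exon_set(transcripts: List[Transcript]) -> Set[Exon]:
    """Collect the distinct exons used by a list of transcripts."""
    return {exon for tx in transcripts for exon in tx}


# Toy reference: one gene with two isoforms; the second skips exon 2.
e1 = ("chr1", 100, 200, "+")
e2 = ("chr1", 300, 400, "+")
e3 = ("chr1", 500, 600, "+")
reference = [(e1, e2, e3), (e1, e3)]

# Toy prediction: every exon is recovered, but the exon-skipping isoform is
# missed and an incorrect chain (e1, e2) is reported instead.
predicted = [(e1, e2, e3), (e1, e2)]

exon_sn, exon_pr = sensitivity_precision(exon_set(reference), exon_set(predicted))
tx_sn, tx_pr = sensitivity_precision(set(reference), set(predicted))
print(f"exon level:       Sn={exon_sn:.2f} Pr={exon_pr:.2f}")
print(f"transcript level: Sn={tx_sn:.2f} Pr={tx_pr:.2f}")
```

With these toy models every exon is found (sensitivity and precision of 1.00 at the exon level), yet only one of the two isoforms is reconstructed correctly (0.50 at the transcript level).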

                Author and article information

                Journal
                101215604
                32338
                Nature Methods (Nat. Methods)
                ISSN 1548-7091 (print); 1548-7105 (electronic)
                18 November 2013
                03 November 2013
                December 2013
                01 June 2014
                Volume 10, Issue 12
                DOI: 10.1038/nmeth.2714
                Affiliations
                [1] European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
                [2] Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
                [3] Wellcome Trust Sanger Institute, Cambridge, UK
                [5] Center for Genomic Regulation, Barcelona, Spain
                [6] Universitat Pompeu Fabra, Barcelona, Spain
                [7] Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
                [8] Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
                [9] Wellcome Trust – Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
                Author notes
                [*] Correspondence: bertone@ebi.ac.uk

                Author contributions: JH, RG and TJH conceived and organised the study. Consortium members provided transcript models for evaluation. JH and PB coordinated the analysis, which was carried out by TS, JFA, PGE and FK. TS, PB and PGE wrote the manuscript with input from the other authors.

                [4]

                RGASP Consortium Josep F Abril 2, Martin Akerman 11, Tyler Alioto 12, Giovanna Ambrosini 13,14, Stylianos E Antonarakis 15, Jonas Behr 16,17, Paul Bertone 1,7,8,9, Regina Bohnert 17, Philipp Bucher 13,14,18, Nicole Cloonan 19, Thomas Derrien 5, Sarah Djebali 6, Jiang Du 20, Sandrine Dudoit 21, Pär G Engström 1, Mark Gerstein 20,22,23, Thomas R Gingeras 11, David Gonzalez 5, Sean M Grimmond 19, Roderic Guigó 5,6, Lukas Habegger 23, Jennifer Harrow 3, Tim J Hubbard 3, Christian Iseli 18,24, Géraldine Jean 17, André Kahles 16,17, Felix Kokocinski 3, Julien Lagarde 5, Jing Leng 23, Gregory Lefebvre 13,18, Suzanna Lewis 25, Ali Mortazavi 26, Peter Niermann 17, Gunnar Rätsch 16,17, Alexandre Reymond 27, Paolo Ribeca 12, Hugues Richard 28, Jacques Rougemont 13,18, Joel Rozowsky 22, Michael Sammeth 5, Andrea Sboner 22, Marcel H Schulz 28, Steven MJ Searle 3, Naryttza Diaz Solorzano 18,24, Victor Solovyev 29, Mario Stanke 30, Tamara Steijger 1, Brian Stevenson 18,24, Heinz Stockinger 18,24, Armand Valsesia 18,24, David Weese 31, Simon White 3, Barbara J Wold 32, Jie Wu 11,33, Thomas D Wu 34, Georg Zeller 17, Daniel Zerbino 1, Michael Q Zhang 11

                11 Cold Spring Harbor Laboratory, New York, USA

                12 Centre Nacional d’Analisi Genomica, Barcelona, Spain

                13 Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland

                14 Swiss Institute for Experimental Cancer Research, Lausanne, Switzerland

                15 Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland

                16 Computational Biology Center, Sloan-Kettering Institute, New York, USA

                17 Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany

                18 Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland

                19 Queensland Centre for Medical Genomics, The University of Queensland, St Lucia, Australia

                20 Department of Computer Science, Yale University, Connecticut, USA

                21 Division of Biostatistics, School of Public Health, University of California, Berkeley, California, USA

                22 Department of Molecular Biophysics and Biochemistry, Yale University, Connecticut, USA

                23 Program in Computational Biology and Bioinformatics, Yale University, Connecticut, USA

                24 Ludwig Institute for Cancer Research, Lausanne, Switzerland

                25 Genomics Division, Lawrence Berkeley National Laboratory, California, USA

                26 Department of Developmental and Cell Biology, University of California Irvine, California, USA

                27 Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland

                28 Max Planck Institute for Molecular Genetics, Berlin, Germany

                29 Department of Computer Science, Royal Holloway, University of London, London, UK

                30 Institute for Microbiology and Genetics, Göttingen, Germany

                31 Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany

                32 Biology Division, California Institute of Technology, Pasadena, California, USA

                33 Department of Applied Mathematics and Statistics, Stony Brook University, New York, USA

                34 Bioinformatics and Computational Biology, Genentech, Inc., San Francisco, California, USA

                Article
                EMS55606
                PMCID: PMC3851240
                PMID: 24185837

                Users may view, print, copy, download, and text- and data-mine the content in such documents for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms

                Funding
                Funded by: Wellcome Trust (Award ID 077198)
                Funded by: Wellcome Trust (Award ID 062023)
                Categories
                Article

                Life sciences
