Assessment of transcript reconstruction methods for RNA-seq

Abstract

RNA sequencing (RNA-seq) is transforming genome biology, enabling comprehensive transcriptome profiling with unprecedented accuracy and detail. Due to technical limitations of current high-throughput sequencing platforms, transcript identity, structure and expression level must be inferred programmatically from partial sequence reads of fragmented gene products. We evaluated 24 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates, but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations in transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data.
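The gap the abstract describes between identifying transcript components and assembling complete isoforms can be illustrated with a toy calculation. The sketch below is not the evaluation procedure used in the study, which this record does not describe. It assumes exact coordinate matching, represents each transcript as a set of exon tuples, and uses made-up coordinates and hypothetical function names (`recall_precision`, `exons_of`, `evaluate`).

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Exon = Tuple[str, int, int]      # (chromosome, start, end); hypothetical representation
Transcript = FrozenSet[Exon]     # a transcript modelled as the set of its exons


def recall_precision(truth: Set, predicted: Set) -> Tuple[float, float]:
    """Return (recall, precision) for exact matches between two sets."""
    hits = len(truth & predicted)
    recall = hits / len(truth) if truth else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    return recall, precision


def exons_of(transcripts: Iterable[Transcript]) -> Set[Exon]:
    """Collect the distinct exons used by any transcript."""
    return set().union(*transcripts)


def evaluate(reference: List[Transcript], predicted: List[Transcript]):
    """Score predictions at exon level and at whole-transcript level."""
    exon_scores = recall_precision(exons_of(reference), exons_of(predicted))
    transcript_scores = recall_precision(set(reference), set(predicted))
    return exon_scores, transcript_scores


# Toy gene with three exons and two annotated isoforms (illustrative data only).
a, b, c = ("chr1", 100, 200), ("chr1", 300, 400), ("chr1", 500, 600)
reference = [frozenset({a, b, c}), frozenset({a, c})]
predicted = [frozenset({a, b, c}), frozenset({b, c})]

exon_scores, transcript_scores = evaluate(reference, predicted)
print("exon recall/precision:", exon_scores)
print("transcript recall/precision:", transcript_scores)
```

In this toy case every exon is recovered, yet only one of the two isoforms is assembled correctly, so transcript-level scores are lower than exon-level scores.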

Most cited references 35

STAR: ultrafast universal RNA-seq aligner.

Alexander Dobin, Carrie A. Davis, Felix Schlesinger … (2013)

Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.

Cited 13213 times

Is Open Access

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

Bo Li, Colin Dewey (2011)

Background: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. Results: We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. Conclusions: RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.

Cited 4517 times

Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data

Manfred Grabherr, Brian Haas, Moran Yassour … (2011)

Massively-parallel cDNA sequencing has opened the way to deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here, we present the Trinity methodology for de novo full-length transcriptome reconstruction, and evaluate it on samples from fission yeast, mouse, and whitefly – an insect whose genome has not yet been sequenced. Trinity fully reconstructs a large fraction of the transcripts present in the data, also reporting alternative splice isoforms and transcripts from recently duplicated genes. In all cases, Trinity performs better than other available de novo transcriptome assembly programs, and its sensitivity is comparable to methods relying on genome alignments. Our approach provides a unified and general solution for transcriptome reconstruction in any sample, especially in the complete absence of a reference genome.

Cited 3262 times

Author and article information

Journal

Journal ID (nlm-journal-id): 101215604

Journal ID (pubmed-jr-id): 32338

Journal ID (nlm-ta): Nat Methods

Journal ID (iso-abbrev): Nat. Methods

Title: Nature methods

ISSN (Print): 1548-7091

ISSN (Electronic): 1548-7105

Publication date Nihms-submitted: 18 November 2013

Publication date (Electronic): 03 November 2013

Publication date (Print): December 2013

Publication date PMC-release: 01 June 2014

Volume: 10

Issue: 12

Electronic Location Identifier: 10.1038/nmeth.2714

Affiliations

[1] European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK

[2] Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain

[3] Wellcome Trust Sanger Institute, Cambridge, UK

[5] Center for Genomic Regulation, Barcelona, Spain

[6] Universitat Pompeu Fabra, Barcelona, Spain

[7] Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany

[8] Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany

[9] Wellcome Trust – Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK

Author notes

[*] Correspondence: bertone@ebi.ac.uk

Author contributions: JH, RG and TJH conceived and organised the study. Consortium members provided transcript models for evaluation. JH and PB coordinated the analysis, which was carried out by TS, JFA, PGE and FK. TS, PB and PGE wrote the manuscript with input from the other authors.

[4] RGASP Consortium: Josep F Abril ², Martin Akerman ¹¹, Tyler Alioto ¹², Giovanna Ambrosini ¹³,¹⁴, Stylianos E Antonarakis ¹⁵, Jonas Behr ¹⁶,¹⁷, Paul Bertone ¹,⁷,⁸,⁹, Regina Bohnert ¹⁷, Philipp Bucher ¹³,¹⁴,¹⁸, Nicole Cloonan ¹⁹, Thomas Derrien ⁵, Sarah Djebali ⁶, Jiang Du ²⁰, Sandrine Dudoit ²¹, Pär G Engström ¹, Mark Gerstein ²⁰,²²,²³, Thomas R Gingeras ¹¹, David Gonzalez ⁵, Sean M Grimmond ¹⁹, Roderic Guigó ⁵,⁶, Lukas Habegger ²³, Jennifer Harrow ³, Tim J Hubbard ³, Christian Iseli ¹⁸,²⁴, Géraldine Jean ¹⁷, André Kahles ¹⁶,¹⁷, Felix Kokocinski ³, Julien Lagarde ⁵, Jing Leng ²³, Gregory Lefebvre ¹³,¹⁸, Suzanna Lewis ²⁵, Ali Mortazavi ²⁶, Peter Niermann ¹⁷, Gunnar Rätsch ¹⁶,¹⁷, Alexandre Reymond ²⁷, Paolo Ribeca ¹², Hugues Richard ²⁸, Jacques Rougemont ¹³,¹⁸, Joel Rozowsky ²², Michael Sammeth ⁵, Andrea Sboner ²², Marcel H Schulz ²⁸, Steven MJ Searle ³, Naryttza Diaz Solorzano ¹⁸,²⁴, Victor Solovyev ²⁹, Mario Stanke ³⁰, Tamara Steijger ¹, Brian Stevenson ¹⁸,²⁴, Heinz Stockinger ¹⁸,²⁴, Armand Valsesia ¹⁸,²⁴, David Weese ³¹, Simon White ³, Barbara J Wold ³², Jie Wu ¹¹,³³, Thomas D Wu ³⁴, Georg Zeller ¹⁷, Daniel Zerbino ¹, Michael Q Zhang ¹¹

¹¹ Cold Spring Harbor Laboratory, New York, USA

¹² Centre Nacional d’Analisi Genomica, Barcelona, Spain

¹³ Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland

¹⁴ Swiss Institute for Experimental Cancer Research, Lausanne, Switzerland

¹⁵ Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland

¹⁶ Computational Biology Center, Sloan-Kettering Institute, New York, USA

¹⁷ Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany

¹⁸ Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland

¹⁹ Queensland Centre for Medical Genomics, The University of Queensland, St Lucia, Australia

²⁰ Department of Computer Science, Yale University, Connecticut, USA

²¹ Division of Biostatistics, School of Public Health, University of California, Berkeley, California, USA

²² Department of Molecular Biophysics and Biochemistry, Yale University, Connecticut, USA

²³ Program in Computational Biology and Bioinformatics, Yale University, Connecticut, USA

²⁴ Ludwig Institute for Cancer Research, Lausanne, Switzerland

²⁵ Genomics Division, Lawrence Berkeley National Laboratory, California, USA

²⁶ Department of Developmental and Cell Biology, University of California Irvine, California, USA

²⁷ Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland

²⁸ Max Planck Institute for Molecular Genetics, Berlin, Germany

²⁹ Department of Computer Science, Royal Holloway, University of London, London, UK

³⁰ Institute for Microbiology and Genetics, Göttingen, Germany

³¹ Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany

³² Biology Division, California Institute of Technology, Pasadena, California, USA

³³ Department of Applied Mathematics and Statistics, Stony Brook University, New York, USA

³⁴ Bioinformatics and Computational Biology, Genentech, Inc., San Francisco, California, USA

Article

Manuscript ID: EMS55606

DOI: 10.1038/nmeth.2714

PMC ID: 3851240

PubMed ID: 24185837

SO-VID: 143f3dcd-7c48-4eaf-8f5e-5dad476dc850

License:

Users may view, print, copy, download and text- and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms

Funding

Funded by: Wellcome Trust

Award ID: 077198

Funded by: Wellcome Trust

Award ID: 062023
