Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data

Abstract

Massively-parallel cDNA sequencing has opened the way to deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here, we present the Trinity methodology for de novo full-length transcriptome reconstruction, and evaluate it on samples from fission yeast, mouse, and whitefly – an insect whose genome has not yet been sequenced. Trinity fully reconstructs a large fraction of the transcripts present in the data, also reporting alternative splice isoforms and transcripts from recently duplicated genes. In all cases, Trinity performs better than other available de novo transcriptome assembly programs, and its sensitivity is comparable to methods relying on genome alignments. Our approach provides a unified and general solution for transcriptome reconstruction in any sample, especially in the complete absence of a reference genome.

Most cited references 35

Transcript assembly and abundance estimation from RNA-Seq reveals thousands of new transcripts and switching among isoforms

Cole Trapnell, Brian A Williams, Geo M Pertea … (2013)

High-throughput mRNA sequencing (RNA-Seq) holds the promise of simultaneous transcript discovery and abundance estimation [1-3]. We introduce an algorithm for transcript assembly coupled with a statistical model for RNA-Seq experiments that produces estimates of abundances. Our algorithms are implemented in an open source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed more than 430 million paired 75bp RNA-Seq reads from a mouse myoblast cell line representing a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Analysis of transcript expression over the time series revealed complete switches in the dominant transcription start site (TSS) or splice-isoform in 330 genes, along with more subtle shifts in a further 1,304 genes. These dynamics suggest substantial regulatory flexibility and complexity in this well-studied model of muscle development.

TopHat: discovering splice junctions with RNA-Seq

Cole Trapnell, Lior Pachter, Steven Salzberg (2011)

Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites. Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20 000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development. Availability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu Contact: cole@cs.umd.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

Daniel Zerbino, Ewan Birney (2008)

We have developed a new set of algorithms, collectively called "Velvet," to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high coverage, very short read (25-50 bp) data sets. Applying Velvet to very short reads and paired-ends information only, one can produce contigs of significant length, up to 50-kb N50 length in simulations of prokaryotic data and 3-kb N50 on simulated mammalian BACs. When applied to real Solexa data sets without read pairs, Velvet generated contigs of approximately 8 kb in a prokaryote and 2 kb in a mammalian BAC, in close agreement with our simulated results without read-pair information. Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies.
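
As a rough illustration of the two ideas this abstract relies on, the sketch below builds a small de Bruijn graph from reads and computes an N50 value from a list of contig lengths. It uses a common textbook formulation in which nodes are (k-1)-mers and each k-mer contributes one edge; it is not Velvet's implementation, and the function names `build_de_bruijn` and `n50` are assumptions made for this example only.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return a dict mapping each (k-1)-mer prefix to the (k-1)-mer
    suffixes that follow it in some read (one entry per k-mer seen)."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of width k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def n50(lengths):
    """Return the largest length L such that contigs of length >= L
    together cover at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    print(build_de_bruijn(["ACGTAC"], 3))
    print(n50([2, 3, 4, 5, 6]))
```

Real assemblers add error correction, graph simplification, and read-pair handling on top of this structure; none of that is modeled here.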

Author and article information

Journal

Journal ID (nlm-journal-id): 9604648

Journal ID (pubmed-jr-id): 20305

Journal ID (nlm-ta): Nat Biotechnol

Journal ID (iso-abbrev): Nat. Biotechnol.

Title: Nature biotechnology

ISSN (Print): 1087-0156

ISSN (Electronic): 1546-1696

Publication date Nihms-submitted: 29 April 2011

Publication date (Electronic): 15 May 2011

Publication date PMC-release: 13 February 2013

Volume: 29

Issue: 7

Pages: 644-652

Affiliations

[1] Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge MA, 02142, USA

[2] School of Computer Science, Hebrew University, Jerusalem, 91904, Israel

[3] Department of Biology, Massachusetts Institute of Technology, Cambridge MA, USA

[4] Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester MA 01605, USA

[5] Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University

[6] Alexander Silberman Institute of Life Sciences, Hebrew University, Jerusalem, 91904, Israel

[7] Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge MA, 02140

Author notes

Correspondence and requests for materials should be addressed to nir@cs.huji.ac.il (NF), aregev@broad.mit.edu (AR)

[*] These authors contributed equally to this work and appear in alphabetical order

[‡] These authors contributed equally to this work

Article

Manuscript ID: NIHMS292662

DOI: 10.1038/nbt.1883

PMC ID: 3571712

PubMed ID: 21572440

SO-VID: e4d5fe74-1984-48dc-b0db-4cffb73e2f0e

License:

Users may view, print, copy, download and text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms

Funding

Funded by: National Human Genome Research Institute (NHGRI)

Award ID: U54 HG003067-06

Funded by: Office of the Director (NIH)

Award ID: DP1 OD003958-03

Funded by: Howard Hughes Medical Institute

Cited by 7,669