
Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data

      Massively-parallel cDNA sequencing has opened the way to deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here, we present the Trinity methodology for de novo full-length transcriptome reconstruction, and evaluate it on samples from fission yeast, mouse, and whitefly – an insect whose genome has not yet been sequenced. Trinity fully reconstructs a large fraction of the transcripts present in the data, also reporting alternative splice isoforms and transcripts from recently duplicated genes. In all cases, Trinity performs better than other available de novo transcriptome assembly programs, and its sensitivity is comparable to methods relying on genome alignments. Our approach provides a unified and general solution for transcriptome reconstruction in any sample, especially in the complete absence of a reference genome.

      Most cited references 35


      TopHat: discovering splice junctions with RNA-Seq

      Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites. Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20 000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development. Availability: TopHat is free, open-source software. Supplementary information: Supplementary data are available at Bioinformatics online.

        Transcript assembly and abundance estimation from RNA-Seq reveals thousands of new transcripts and switching among isoforms

        High-throughput mRNA sequencing (RNA-Seq) holds the promise of simultaneous transcript discovery and abundance estimation [1-3]. We introduce an algorithm for transcript assembly coupled with a statistical model for RNA-Seq experiments that produces estimates of abundances. Our algorithms are implemented in an open source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed more than 430 million paired 75bp RNA-Seq reads from a mouse myoblast cell line representing a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Analysis of transcript expression over the time series revealed complete switches in the dominant transcription start site (TSS) or splice-isoform in 330 genes, along with more subtle shifts in a further 1,304 genes. These dynamics suggest substantial regulatory flexibility and complexity in this well-studied model of muscle development.

          Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

          We have developed a new set of algorithms, collectively called "Velvet," to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high coverage, very short read (25-50 bp) data sets. Applying Velvet to very short reads and paired-ends information only, one can produce contigs of significant length, up to 50-kb N50 length in simulations of prokaryotic data and 3-kb N50 on simulated mammalian BACs. When applied to real Solexa data sets without read pairs, Velvet generated contigs of approximately 8 kb in a prokaryote and 2 kb in a mammalian BAC, in close agreement with our simulated results without read-pair information. Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies.
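The Velvet abstract above relies on two ideas that a small example can make concrete: a de Bruijn graph built from k-mers, and the N50 statistic used to report contig lengths. The following is a minimal Python sketch under stated assumptions, not Velvet's actual algorithms. The function names `build_de_bruijn` and `n50` are hypothetical, the graph ignores reverse complements and sequencing errors, and N50 is taken here as the largest length L such that contigs of length at least L cover half of the total assembled bases.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Illustrative only: no reverse complements, no error handling.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer links its (k-1)-prefix to its (k-1)-suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    graph = build_de_bruijn(["ACGTAC", "GTACGG"], k=4)
    for node in sorted(graph):
        print(node, "->", sorted(graph[node]))
    print("N50:", n50([2, 3, 4, 5, 6]))
```

In this toy input the two reads overlap by the k-mer GTAC, so they share graph nodes, and the node ACG gains two outgoing edges where the reads diverge. Real assemblers resolve such branches using coverage and read-pair information.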

            Author and article information

            [1] Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge MA, 02142, USA
            [2] School of Computer Science, Hebrew University, Jerusalem, 91904, Israel
            [3] Department of Biology, Massachusetts Institute of Technology, Cambridge MA, USA
            [4] Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester MA 01605, USA
            [5] Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University
            [6] Alexander Silberman Institute of Life Sciences, Hebrew University, Jerusalem, 91904, Israel
            [7] Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge MA, 02140
            Author notes
            Correspondence and requests for materials should be addressed to nir@ (NF), aregev@ (AR)

            These authors contributed equally to this work and appear in alphabetical order


            These authors contributed equally to this work

            Nature Biotechnology (Nat. Biotechnol.)
            29 April 2011
            15 May 2011
            13 February 2013
            Volume: 29
            Issue: 7
            Pages: 644-652

            Users may view, print, copy, download, and text- and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use:

            Funded by: National Human Genome Research Institute (NHGRI)
            Award ID: U54 HG003067-06
            Funded by: Office of the Director (NIH)
            Award ID: DP1 OD003958-03
            Funded by: Howard Hughes Medical Institute


