
      StringTie enables improved reconstruction of a transcriptome from RNA-seq reads.


          Abstract

          Methods used to sequence the transcriptome often produce more than 200 million short sequences. We introduce StringTie, a computational method that applies a network flow algorithm originally developed in optimization theory, together with optional de novo assembly, to assemble these complex data sets into transcripts. When used to analyze both simulated and real data sets, StringTie produces more complete and accurate reconstructions of genes and better estimates of expression levels, compared with other leading transcript assembly programs including Cufflinks, IsoLasso, Scripture and Traph. For example, on 90 million reads from human blood, StringTie correctly assembled 10,990 transcripts, whereas the next best assembly was of 7,187 transcripts by Cufflinks, which is a 53% increase in transcripts assembled. On a simulated data set, StringTie correctly assembled 7,559 transcripts, which is 20% more than the 6,310 assembled by Cufflinks. As well as producing a more complete transcriptome assembly, StringTie runs faster on all data sets tested to date compared with other assembly software, including Cufflinks.
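The percentage gains quoted in the abstract follow from the transcript counts it reports. As a quick check, the sketch below recomputes them. The helper name `percent_increase` is hypothetical, introduced here only for illustration; it is not part of StringTie or any other tool named above.

```python
def percent_increase(new: int, baseline: int) -> float:
    """Relative increase of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Counts of correctly assembled transcripts, as quoted in the abstract.
blood = percent_increase(10_990, 7_187)     # StringTie vs. Cufflinks, human blood reads
simulated = percent_increase(7_559, 6_310)  # StringTie vs. Cufflinks, simulated data

print(f"human blood: {blood:.1f}% more transcripts")
print(f"simulated:   {simulated:.1f}% more transcripts")
```

Rounded to whole percentages, the two values agree with the 53% and 20% stated in the abstract.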


          Most cited references (17)


          Ab initio reconstruction of transcriptomes of pluripotent and lineage committed cells reveals gene structures of thousands of lincRNAs

          RNA-Seq provides an unbiased way to study a transcriptome, including both coding and non-coding genes. To date, most RNA-Seq studies have critically depended on existing annotations, and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We apply it to mouse embryonic stem cells, neuronal precursor cells, and lung fibroblasts to accurately reconstruct the full-length gene structures for the vast majority of known expressed genes. We identify substantial variation in protein-coding genes, including thousands of novel 5′-start sites, 3′-ends, and internal coding exons. We then determine the gene structures of over a thousand lincRNA and antisense loci. Our results open the way to direct experimental manipulation of thousands of non-coding RNAs, and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes.

            Streaming fragment assignment for real-time analysis of sequencing experiments

            We present eXpress, a software package for highly efficient probabilistic assignment of ambiguously mapping sequenced fragments. eXpress uses a streaming algorithm with linear run time and constant memory use. It can determine abundances of sequenced molecules in real time, and can be applied to ChIP-seq, metagenomics and other large-scale sequencing data. We demonstrate its use on RNA-seq data, showing greater efficiency than other quantification methods.

              Computational methods for transcriptome annotation and quantification using RNA-seq.

              High-throughput RNA sequencing (RNA-seq) promises a comprehensive picture of the transcriptome, allowing for the complete annotation and quantification of all genes and their isoforms across samples. Realizing this promise requires increasingly complex computational methods. These computational challenges fall into three main categories: (i) read mapping, (ii) transcriptome reconstruction and (iii) expression quantification. Here we explain the major conceptual and practical challenges, and the general classes of solutions for each category. Finally, we highlight the interdependence between these categories and discuss the benefits for different biological applications.

                Author and article information

                Journal
                Nature Biotechnology (Nat Biotechnol)
                Publisher: Springer Science and Business Media LLC
                ISSN: 1546-1696, 1087-0156
                Mar 2015; volume 33, issue 3
                Affiliations
                [1] Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland, USA; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland, USA.
                [2] Department of Molecular Biology, The University of Texas Southwestern Medical Center, Dallas, Texas, USA; Center for Regenerative Science and Medicine, The University of Texas Southwestern Medical Center, Dallas, Texas, USA.
                [3] Department of Molecular Biology, The University of Texas Southwestern Medical Center, Dallas, Texas, USA; Center for Regenerative Science and Medicine, The University of Texas Southwestern Medical Center, Dallas, Texas, USA; Simmons Cancer Center, The University of Texas Southwestern Medical Center, Dallas, Texas, USA.
                [4] Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland, USA; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland, USA; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, USA; Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA.
                Article
                Publisher ID: nbt.3122
                Manuscript ID: NIHMS736717
                DOI: 10.1038/nbt.3122
                PMC ID: 4643835
                PubMed ID: 25690850
                e5d3fa69-c994-43ec-a736-d77bca28d54e