
      Transcriptome assembly from long-read RNA-seq alignments with StringTie2




          RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new methods to handle the high error rate of long reads and offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of short-read assemblies. StringTie2 is more accurate and faster and uses less memory than all comparable short-read and long-read analysis tools.


                Author and article information

                Genome Biol
                Genome Biology
                BioMed Central (London)
                16 December 2019
                Volume: 20
                Article number: 278
                [1] ISNI 0000 0001 2171 9311, GRID grid.21107.35, Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218 USA
                [2] ISNI 0000 0001 2171 9311, GRID grid.21107.35, Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
                [3] ISNI 0000 0001 2171 9311, GRID grid.21107.35, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
                [4] ISNI 0000 0001 2171 9311, GRID grid.21107.35, Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205 USA
                Author information
                © The Author(s). 2019

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                Received: 10 September 2019
                Accepted: 2 December 2019
                Funded by: FundRef http://dx.doi.org/10.13039/100000153, Division of Biological Infrastructure;
                Award ID: 1458178
                Award ID: 1759518
                Funded by: FundRef http://dx.doi.org/10.13039/100000002, National Institutes of Health;
                Award ID: R01-HG006677
                Funded by: FundRef http://dx.doi.org/10.13039/100000057, National Institute of General Medical Sciences;
                Award ID: R35GM13051

                transcriptome assembly, rna-seq, long-read sequencing, gene expression

