      recount3: summaries and queries for large-scale RNA-seq expression and splicing

      research-article

          Abstract

          We present recount3, a resource consisting of over 750,000 publicly available human and mouse RNA sequencing (RNA-seq) samples uniformly processed by our new Monorail analysis pipeline. To facilitate access to the data, we provide the recount3 and snapcount R/Bioconductor packages as well as complementary web resources. Using these tools, data can be downloaded as study-level summaries or queried for specific exon-exon junctions, genes, samples, or other features. Monorail can be used to process local and/or private data, allowing results to be directly compared to any study in recount3. Taken together, our tools help biologists maximize the utility of publicly available RNA-seq data, especially to improve their understanding of newly collected data. recount3 is available from http://rna.recount.bio.

          Supplementary Information

          The online version contains supplementary material available at https://doi.org/10.1186/s13059-021-02533-6.

          Most cited references (61)

          featureCounts: an efficient general purpose program for assigning sequence reads to genomic features.

          Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.
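To make the idea of read summarization concrete, here is a minimal Python sketch that counts aligned reads per genomic feature by interval overlap. It is an illustration only: the names `Feature`, `Read`, and `summarize_reads` are hypothetical, the scan is a naive per-chromosome loop rather than the chromosome hashing and feature blocking that featureCounts implements, and leaving reads that overlap several features uncounted is an assumed policy, not a statement about the tool's defaults.

```python
from collections import Counter
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


class Read(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def summarize_reads(reads, features):
    """Count reads per feature; reads hitting zero or several features are tallied separately."""
    # Group features by chromosome so a read is only compared
    # against features on its own chromosome.
    by_chrom = {}
    for f in features:
        by_chrom.setdefault(f.chrom, []).append(f)

    counts = Counter({f.name: 0 for f in features})
    unassigned = Counter()
    for r in reads:
        # Two closed intervals overlap when each starts before the other ends.
        hits = [f.name for f in by_chrom.get(r.chrom, [])
                if r.start <= f.end and f.start <= r.end]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            unassigned["no_feature"] += 1
        else:
            # Assumed policy: do not count reads that overlap several features.
            unassigned["ambiguous"] += 1
    return dict(counts), dict(unassigned)
```

A production tool would replace the inner list comprehension with an indexed lookup (for example sorted intervals or binning), since comparing every read against every feature on a chromosome does not scale to millions of reads.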

            RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

            Background: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. Results: We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. Conclusions: RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.

              StringTie enables improved reconstruction of a transcriptome from RNA-seq reads.

              Methods used to sequence the transcriptome often produce more than 200 million short sequences. We introduce StringTie, a computational method that applies a network flow algorithm originally developed in optimization theory, together with optional de novo assembly, to assemble these complex data sets into transcripts. When used to analyze both simulated and real data sets, StringTie produces more complete and accurate reconstructions of genes and better estimates of expression levels, compared with other leading transcript assembly programs including Cufflinks, IsoLasso, Scripture and Traph. For example, on 90 million reads from human blood, StringTie correctly assembled 10,990 transcripts, whereas the next best assembly was of 7,187 transcripts by Cufflinks, which is a 53% increase in transcripts assembled. On a simulated data set, StringTie correctly assembled 7,559 transcripts, which is 20% more than the 6,310 assembled by Cufflinks. As well as producing a more complete transcriptome assembly, StringTie runs faster on all data sets tested to date compared with other assembly software, including Cufflinks.

                Author and article information

                Contributors
                khansen@jhsph.edu
                langmea@cs.jhu.edu
                Journal
                Genome Biology (Genome Biol)
                BioMed Central (London)
                ISSN: 1474-7596, 1474-760X
                Published: 29 November 2021
                Volume 22, article 323
                Affiliations
                [1] GRID grid.21107.35, ISNI 0000 0001 2171 9311, Department of Computer Science, Johns Hopkins University, Baltimore, USA
                [2] GRID grid.21107.35, ISNI 0000 0001 2171 9311, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
                [3] GRID grid.21107.35, ISNI 0000 0001 2171 9311, Johns Hopkins University, Baltimore, USA
                [4] GRID grid.35403.31, ISNI 0000 0004 1936 9991, Thomas M. Siebel Center for Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
                [5] GRID grid.21107.35, ISNI 0000 0001 2171 9311, Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, USA
                [6] GRID grid.5386.8, ISNI 000000041936877X, Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
                [7] GRID grid.83440.3b, ISNI 0000000121901201, Institute of Child Health, University College London (UCL), London, UK
                [8] GRID grid.429552.d, Lieber Institute for Brain Development, Baltimore, USA
                [9] GRID grid.21107.35, ISNI 0000 0001 2171 9311, Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, USA
                [10] GRID grid.21107.35, ISNI 0000 0001 2171 9311, Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
                [11] GRID grid.5288.7, ISNI 0000 0000 9758 5690, Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
                [12] GRID grid.5288.7, ISNI 0000 0000 9758 5690, Department of Surgery, Oregon Health & Science University, Portland, OR, USA
                Author information
                http://orcid.org/0000-0003-2437-1976
                Article
                2533
                DOI: 10.1186/s13059-021-02533-6
                PMCID: PMC8628444
                PMID: 34844637
                83693ced-d3d2-4b77-9035-f963b456b0e1
                © The Author(s) 2021

                Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                Received: 12 July 2021
                Accepted: 29 October 2021
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100000057, National Institute of General Medical Sciences;
                Award ID: R01GM121459
                Funded by: FundRef http://dx.doi.org/10.13039/100000057, National Institute of General Medical Sciences;
                Award ID: R01GM118568
                Funded by: FundRef http://dx.doi.org/10.13039/100000057, National Institute of General Medical Sciences;
                Award ID: R35GM139602
                Funded by: FundRef http://dx.doi.org/10.13039/100000105, Office of Advanced Cyberinfrastructure;
                Award ID: ACI-1548562
                Categories
                Database
                Genetics
