      RNA Sequencing Data: Hitchhiker's Guide to Expression Analysis

          Abstract

          Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies, but also agricultural or clinical situations. In the past 10 years or so, much has been learned about the characteristics of the RNA-seq data sets, as well as the performance of the myriad of methods developed. In this review, we give an overview of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on the quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.

          Most cited references (115)

          The transcriptional landscape of the yeast genome defined by RNA sequencing.

          The identification of untranslated regions, introns, and coding regions within an organism remains challenging. We developed a quantitative sequencing-based method called RNA-Seq for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing and mapped to the genome. We applied RNA-Seq to generate a high-resolution transcriptome map of the yeast genome and demonstrated that most (74.5%) of the nonrepetitive sequence of the yeast genome is transcribed. We confirmed many known and predicted introns and demonstrated that others are not actively used. Alternative initiation codons and upstream open reading frames also were identified for many yeast genes. We also found unexpected 3'-end heterogeneity and the presence of many overlapping genes. These results indicate that the yeast transcriptome is more complex than previously appreciated.

            Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences

            Motivation: In RNA-seq differential expression analysis, investigators aim to detect those genes with changes in expression level across conditions, despite technical and biological variability in the observations. A common task is to accurately estimate the effect size, often in terms of a logarithmic fold change (LFC). Results: When the read counts are low or highly variable, the maximum likelihood estimates for the LFCs have high variance, leading to large estimates not representative of true differences, and poor ranking of genes by effect size. One approach is to introduce filtering thresholds and pseudocounts to exclude or moderate estimated LFCs. Filtering may result in a loss of genes from the analysis with true differences in expression, while pseudocounts provide a limited solution that must be adapted per dataset. Here, we propose the use of a heavy-tailed Cauchy prior distribution for effect sizes, which avoids the use of filter thresholds or pseudocounts. The proposed method, Approximate Posterior Estimation for generalized linear model, apeglm, has lower bias than previously proposed shrinkage estimators, while still reducing variance for those genes with little information for statistical inference. Availability and implementation: The apeglm package is available as an R/Bioconductor package at https://bioconductor.org/packages/apeglm, and the methods can be called from within the DESeq2 software. Supplementary information: Supplementary data are available at Bioinformatics online.
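
            The point about low counts and pseudocounts can be illustrated with a small sketch. This is not apeglm and not the DESeq2 API; it only shows the naive ratio-based LFC and the pseudocount moderation that the abstract contrasts against. The function name, the example counts, and the pseudocount of 1 are hypothetical, and counts are assumed to be already normalized for library size.

```python
import math


def log2_fold_change(count_a, count_b, pseudocount=0.0):
    """Naive log2 fold change of condition B over condition A.

    Hypothetical helper for illustration only. With pseudocount=0 and
    count_a == 0 the ratio is undefined and Python raises
    ZeroDivisionError, which is one reason pseudocounts are used.
    """
    return math.log2((count_b + pseudocount) / (count_a + pseudocount))


# Low-count gene: a difference of only 3 reads yields a 4-fold estimate.
low_raw = log2_fold_change(1, 4)             # log2(4 / 1) = 2.0
low_moderated = log2_fold_change(1, 4, 1.0)  # log2(5 / 2), about 1.32

# High-count gene with the same ratio: the pseudocount barely matters.
high_raw = log2_fold_change(1000, 4000)              # 2.0
high_moderated = log2_fold_change(1000, 4000, 1.0)   # just under 2.0

print(low_raw, low_moderated, high_raw, high_moderated)
```

            The pseudocount pulls the poorly supported low-count estimate toward zero while leaving the well-supported one nearly unchanged, but the amount of moderation depends on an arbitrary constant, which is the limitation the abstract attributes to pseudocounts.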

              Improving RNA-Seq expression estimates by correcting for fragment bias

              The biochemistry of RNA-Seq library preparation results in cDNA fragments that are not uniformly distributed within the transcripts they represent. This non-uniformity must be accounted for when estimating expression levels, and we show how to perform the needed corrections using a likelihood based approach. We find improvements in expression estimates as measured by correlation with independently performed qRT-PCR and show that correction of bias leads to improved replicability of results across libraries and sequencing technologies.

                Author and article information

                Journal
                Annual Review of Biomedical Data Science
                Annu. Rev. Biomed. Data Sci.
                Annual Reviews
                ISSN: 2574-3414
                July 20 2019
                Volume 2, Issue 1, pages 139-173
                Affiliations
                [1] Bioinformatics Institute Ghent and Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium
                [2] Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
                [3] Department of Biostatistics and Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27514, USA
                [4] Department of Computer Science, Stony Brook University, Stony Brook, New York 11794, USA
                Article
                DOI: 10.1146/annurev-biodatasci-072018-021255
                © 2019

                Computational chemistry & Modeling, Medicine, Biochemistry, Biomedical engineering, Medical physics
