RNA Sequencing Data: Hitchhiker's Guide to Expression Analysis

Abstract

Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies, but also agricultural or clinical situations. In the past 10 years or so, much has been learned about the characteristics of the RNA-seq data sets, as well as the performance of the myriad of methods developed. In this review, we give an overview of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on the quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.

Most cited references 115

The transcriptional landscape of the yeast genome defined by RNA sequencing.

U. Nagalakshmi, Z. Wang, K. Waern … (2008)

The identification of untranslated regions, introns, and coding regions within an organism remains challenging. We developed a quantitative sequencing-based method called RNA-Seq for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing and mapped to the genome. We applied RNA-Seq to generate a high-resolution transcriptome map of the yeast genome and demonstrated that most (74.5%) of the nonrepetitive sequence of the yeast genome is transcribed. We confirmed many known and predicted introns and demonstrated that others are not actively used. Alternative initiation codons and upstream open reading frames also were identified for many yeast genes. We also found unexpected 3'-end heterogeneity and the presence of many overlapping genes. These results indicate that the yeast transcriptome is more complex than previously appreciated.

Cited 962 times

Is Open Access

Improving RNA-Seq expression estimates by correcting for fragment bias

Adam Roberts, Cole Trapnell, Julie Donaghey … (2011)

The biochemistry of RNA-Seq library preparation results in cDNA fragments that are not uniformly distributed within the transcripts they represent. This non-uniformity must be accounted for when estimating expression levels, and we show how to perform the needed corrections using a likelihood-based approach. We find improvements in expression estimates as measured by correlation with independently performed qRT-PCR and show that correction of bias leads to improved replicability of results across libraries and sequencing technologies.

Cited 616 times

Is Open Access

Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences

Anqi Zhu, Joseph Ibrahim, Michael Love (2018)

Motivation: In RNA-seq differential expression analysis, investigators aim to detect those genes with changes in expression level across conditions, despite technical and biological variability in the observations. A common task is to accurately estimate the effect size, often in terms of a logarithmic fold change (LFC). Results: When the read counts are low or highly variable, the maximum likelihood estimates for the LFCs have high variance, leading to large estimates not representative of true differences, and poor ranking of genes by effect size. One approach is to introduce filtering thresholds and pseudocounts to exclude or moderate estimated LFCs. Filtering may result in a loss of genes from the analysis with true differences in expression, while pseudocounts provide a limited solution that must be adapted per dataset. Here, we propose the use of a heavy-tailed Cauchy prior distribution for effect sizes, which avoids the use of filter thresholds or pseudocounts. The proposed method, Approximate Posterior Estimation for generalized linear model, apeglm, has lower bias than previously proposed shrinkage estimators, while still reducing variance for those genes with little information for statistical inference. Availability and implementation: The apeglm package is available as an R/Bioconductor package at https://bioconductor.org/packages/apeglm, and the methods can be called from within the DESeq2 software. Supplementary information: Supplementary data are available at Bioinformatics online.

Cited 589 times

Author and article information

Journal

Title: Annual Review of Biomedical Data Science

Abbreviated Title: Annu. Rev. Biomed. Data Sci.

Publisher: Annual Reviews

ISSN (Print): 2574-3414

ISSN (Electronic): 2574-3414

Publication date Created: July 20 2019

Publication date (Print): July 20 2019

Volume: 2

Issue: 1

Pages: 139-173

Affiliations

[1] Bioinformatics Institute Ghent and Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium

[2] Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland

[3] Department of Biostatistics and Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27514, USA

[4] Department of Computer Science, Stony Brook University, Stony Brook, New York 11794, USA

Article

DOI: 10.1146/annurev-biodatasci-072018-021255

SO-VID: 482a275c-4465-47c7-a33a-9c2467a1e810

History

ScienceOpen disciplines: Computational chemistry & Modeling, Medicine, Biochemistry, Biomedical engineering, Medical physics

Related collections

Annual Reviews HIV/AIDS: Public Health and Society

Cited by 49
