A scaling normalization method for differential expression analysis of RNA-seq data

Abstract

A novel, empirical method for normalization of RNA-seq data is presented.

The fine detail provided by sequencing-based transcriptome surveys suggests that RNA-seq is likely to become the platform of choice for interrogating steady state RNA. In order to discover biologically important changes in expression, we show that normalization continues to be an essential step in the analysis. We outline a simple and effective method for performing normalization and show dramatically improved results for inferring differential expression in simulated and publicly available data sets.
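As an illustration of what a scaling normalization can look like, the following minimal sketch estimates one scaling factor per library from a trimmed mean of gene-wise log ratios against a reference library. It is a simplified reading of the trimmed-mean idea, not the article's exact procedure: the function name, the 30% default trim, the absence of precision weights and the absence of any trimming on absolute expression level are all assumptions made for brevity.

```python
import math


def trimmed_mean_log_ratio_factor(sample, reference, trim=0.3):
    """Scaling factor for `sample` relative to `reference` (simplified sketch).

    Both arguments are lists of raw counts for the same genes, in the same
    order. The name and the default trim are illustrative assumptions.
    """
    n_sample = sum(sample)
    n_reference = sum(reference)
    # Log2 ratios of library-size-scaled proportions, using only genes
    # with nonzero counts in both libraries.
    log_ratios = sorted(
        math.log2((s / n_sample) / (r / n_reference))
        for s, r in zip(sample, reference)
        if s > 0 and r > 0
    )
    if not log_ratios:
        raise ValueError("no genes with nonzero counts in both libraries")
    # Drop the most extreme log ratios from each tail, then average the rest.
    k = int(len(log_ratios) * trim)
    kept = log_ratios[k:len(log_ratios) - k] or log_ratios
    return 2 ** (sum(kept) / len(kept))


# One highly expressed gene inflates the sample's library size, so every
# other gene looks under-represented by the same ratio.
reference = [100] * 10
sample = [100] * 9 + [1000]
factor = trimmed_mean_log_ratio_factor(sample, reference)
```

In this toy example the nine unchanged genes all share the same log ratio, the single outlier is trimmed away, and the factor reflects only the composition shift. One common convention is to multiply the raw library size by such a factor to obtain an effective library size for downstream testing.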

Most cited references 12

Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation.

Y. H. Yang (2002)

There are many sources of systematic variation in cDNA microarray experiments which affect the measured gene expression levels (e.g. differences in labeling efficiency between the two fluorescent dyes). The term normalization refers to the process of removing such variation. A constant adjustment is often used to force the distribution of the intensity log ratios to have a median of zero for each slide. However, such global normalization approaches are not adequate in situations where dye biases can depend on spot overall intensity and/or spatial location within the array. This article proposes normalization methods that are based on robust local regression and account for intensity and spatial dependence in dye biases for different types of cDNA microarray experiments. The selection of appropriate controls for normalization is discussed and a novel set of controls (microarray sample pool, MSP) is introduced to aid in intensity-dependent normalization. Lastly, to allow for comparisons of expression levels across slides, a robust method based on maximum likelihood estimation is proposed to adjust for scale differences among slides.
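The constant adjustment mentioned above, which forces the log ratios on a slide to have a median of zero, can be sketched in a few lines. This is only the global method that the abstract describes as inadequate when dye bias depends on intensity or spatial location, not the local-regression approach it proposes; the function name and the sample values are hypothetical.

```python
import statistics


def median_center(log_ratios):
    """Subtract the slide-wide median so the centered values have median zero."""
    shift = statistics.median(log_ratios)
    return [m - shift for m in log_ratios]


# Hypothetical log ratios from one slide, with a constant bias.
centered = median_center([0.5, 1.0, 1.5, 3.0])
```

Because the same constant is subtracted from every spot, the differences between log ratios are unchanged; only their location shifts.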

Small-sample estimation of negative binomial dispersion, with applications to SAGE data.

Mark Robinson, Gordon K. Smyth (2008)

We derive a quantile-adjusted conditional maximum likelihood estimator for the dispersion parameter of the negative binomial distribution and compare its performance, in terms of bias, to various other methods. Our estimation scheme outperforms all other methods in very small samples, typical of those from serial analysis of gene expression studies, the motivating data for this study. The impact of dispersion estimation on hypothesis testing is studied. We derive an "exact" test that outperforms the standard approximate asymptotic tests.

Moderated statistical tests for assessing differences in tag abundance.

Mark Robinson, Gordon K. Smyth (2007)

Digital gene expression (DGE) technologies measure gene expression by counting sequence tags. They are sensitive technologies for measuring gene expression on a genomic scale, without the need for prior knowledge of the genome sequence. As the cost of sequencing DNA decreases, the number of DGE datasets is expected to grow dramatically. Various tests of differential expression have been proposed for replicated DGE data using binomial, Poisson, negative binomial or pseudo-likelihood (PL) models for the counts, but none of these are usable when the number of replicates is very small. We develop tests using the negative binomial distribution to model overdispersion relative to the Poisson, and use conditional weighted likelihood to moderate the level of overdispersion across genes. Not only is our strategy applicable even with the smallest number of libraries, but it also proves to be more powerful than previous strategies when more libraries are available. The methodology is equally applicable to other counting technologies, such as proteomic spectral counts. An R package can be accessed from http://bioinf.wehi.edu.au/resources/

Author and article information

Journal

Journal ID (nlm-ta): Genome Biol

Title: Genome Biology

Publisher: BioMed Central

ISSN (Print): 1465-6906

ISSN (Electronic): 1465-6914

Publication date (Print): 2010

Publication date (Electronic): 2 March 2010

Volume: 11

Issue: 3

Page: R25

Affiliations

[1] Bioinformatics Division, Walter and Eliza Hall Institute, 1G Royal Parade, Parkville 3052, Australia

[2] Epigenetics Laboratory, Cancer Program, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, NSW 2010, Australia

Article

Publisher ID: gb-2010-11-3-r25

DOI: 10.1186/gb-2010-11-3-r25

PMC ID: 2864565

PubMed ID: 20196867

SO-VID: 65042c84-27f2-41e6-8d94-efa011d40fb5

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 19 November 2009

Date revision received: 28 January 2010

Date accepted: 2 March 2010

Cited by 3,178