Removing Batch Effects in Analysis of Expression Microarray Data: An Evaluation of Six Batch Adjustment Methods

Abstract

The expression microarray is a frequently used approach to study gene expression on a genome-wide scale. However, the data produced by the thousands of microarray studies published annually are confounded by “batch effects,” the systematic error introduced when samples are processed in multiple batches. Although batch effects can be reduced by careful experimental design, they cannot be eliminated unless the whole study is done in a single batch. A number of programs are now available to adjust microarray data for batch effects prior to analysis. We systematically evaluated six of these programs using multiple measures of precision, accuracy and overall performance. ComBat, an Empirical Bayes method, outperformed the other five programs by most metrics. We also showed that it is essential to standardize expression data at the probe level when testing for correlation of expression profiles, due to a sizeable probe effect in microarray data that can inflate the correlation among replicates and unrelated samples.
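
ComBat's Empirical Bayes procedure is built on a per-gene location/scale model. Within each batch, a gene's values are assumed to be shifted (location) and stretched (scale) relative to the other batches. The sketch below is a minimal, hypothetical illustration of that underlying adjustment only, in plain Python. It omits ComBat's Empirical Bayes shrinkage of the batch parameters across genes and its biological covariates, so it is neither ComBat itself nor any of the six programs evaluated. The function name `location_scale_adjust` and the list-of-rows data layout are assumptions.

```python
from statistics import fmean, pstdev


def location_scale_adjust(expr, batches):
    """Per-gene location/scale batch adjustment (hypothetical helper).

    NOT ComBat: there is no Empirical Bayes shrinkage and no covariates.
    expr: list of genes; each gene is a list of values, one per sample.
    batches: batch label for each sample, in the same order as the columns.
    """
    labels = sorted(set(batches))
    members = {b: [j for j, lab in enumerate(batches) if lab == b] for b in labels}
    adjusted = []
    for row in expr:
        grand_mean = fmean(row)
        # Per-batch location (mean) and scale (population SD) for this gene.
        batch_mean = {b: fmean([row[j] for j in members[b]]) for b in labels}
        batch_sd = {b: pstdev([row[j] for j in members[b]]) for b in labels}
        # Pooled within-batch SD: spread left after removing each batch's mean.
        pooled_sd = pstdev([row[j] - batch_mean[batches[j]] for j in range(len(row))])
        new_row = []
        for j, x in enumerate(row):
            b = batches[j]
            # Remove the batch's location and scale, then map the value back onto
            # the gene's overall mean and pooled within-batch spread.
            z = (x - batch_mean[b]) / batch_sd[b] if batch_sd[b] > 0 else 0.0
            new_row.append(grand_mean + z * pooled_sd)
        adjusted.append(new_row)
    return adjusted
```

For one gene measured in two batches of three samples, `[1, 2, 3]` and `[11, 12, 13]`, the 10-unit batch shift is removed and both batches map onto approximately 6, 7, 8. Because this removes every between-batch difference in mean, a biological group that is confounded with batch would be erased as well, which is why ComBat accepts covariates to protect. The Empirical Bayes step matters most when batches are small: per-batch means and variances estimated from a handful of arrays are noisy, and ComBat stabilizes them by borrowing information across genes.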

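The probe effect mentioned in the abstract can be seen in simulated data. The sketch below is a hypothetical illustration in plain Python; the simulated baselines, noise level, and helper names are assumptions, not the study's data or code. Each probe has its own baseline intensity and each sample adds independent noise, so any two samples are unrelated apart from the shared probe baselines. It reads "standardizing at the probe level" as z-scoring each probe across samples before correlating samples.

```python
import random
from statistics import fmean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def standardize_probes(expr):
    """Z-score each probe (row) across samples, removing its baseline and scale."""
    out = []
    for row in expr:
        m, s = fmean(row), pstdev(row)
        out.append([(x - m) / s if s > 0 else 0.0 for x in row])
    return out


def column(expr, j):
    """Values of sample j across all probes."""
    return [row[j] for row in expr]


# Simulated data (assumption): a per-probe baseline (the "probe effect")
# plus independent Gaussian noise for every sample.
random.seed(0)
n_probes, n_samples = 200, 10
baselines = [random.uniform(4.0, 14.0) for _ in range(n_probes)]
expr = [[b + random.gauss(0.0, 0.5) for _ in range(n_samples)] for b in baselines]

raw_r = pearson(column(expr, 0), column(expr, 1))
z = standardize_probes(expr)
std_r = pearson(column(z, 0), column(z, 1))
print(f"raw r = {raw_r:.2f}, probe-standardized r = {std_r:.2f}")
```

On the raw scale the two unrelated samples correlate strongly, because both follow the probe baselines. After probe-level standardization the correlation falls to around zero. It is slightly negative on average, since centering each probe over only ten samples induces a weak negative dependence between samples.
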
Author and article information

Contributors

Role: Editor

Journal

Journal ID (nlm-ta): PLoS One

Journal ID (publisher-id): plos

Journal ID (pmc): plosone

Title: PLoS ONE

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Electronic): 1932-6203

Publication date (Collection): 2011

Publication date (Electronic): 28 February 2011

Volume: 6

Issue: 2

Electronic Location Identifier: e17238

Affiliations

[1] National Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, People's Republic of China

[2] Department of Psychiatry, University of Chicago, Chicago, Illinois, United States of America

[3] Department of Pathology, Zhejiang University, Hangzhou, People's Republic of China

University of California, Davis, United States of America

Author notes

* E-mail: cliu@yoda.bsd.uchicago.edu

Conceived and designed the experiments: CYL CC DDZ. Performed the experiments: CC. Analyzed the data: CC JB CYL EG. Wrote the paper: CC KG CYL EG LJ.

Article

Publisher ID: PONE-D-10-03425

DOI: 10.1371/journal.pone.0017238

PMC ID: 3046121

PubMed ID: 21386892

SO-VID: 091a29cc-1583-47bb-a278-5df27e651a24

Copyright © Chen et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 26 August 2010

Date accepted: 24 January 2011

Page count

Pages: 10
