Abstract
DNA microarrays have recently been used to monitor the expression levels
of thousands of genes simultaneously and to identify genes that are differentially
expressed. The probability that a false identification (type I error) is committed
can increase sharply when the number of tested genes gets large. Correlation between
the test statistics attributed to gene co-regulation and dependency in the measurement
errors of the gene expression levels further complicates the problem. In this paper
we address this very large multiplicity problem by adopting the false discovery rate
(FDR) controlling approach. In order to address the dependency problem, we present
three resampling-based FDR controlling procedures that account for the test statistics
distribution, and compare their performance to that of the naïve application of the
linear step-up procedure in Benjamini and Hochberg (1995). The procedures are studied
using simulated microarray data, and their performance is examined relative to their
ease of implementation.
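For reference, the linear step-up procedure of Benjamini and Hochberg (1995) computes adjusted p-values that can be compared directly to the desired FDR level. The sketch below is an illustrative Python implementation of that standard procedure, not the authors' R program; the function name `bh_adjust` and the choice of NumPy are ours.

```python
import numpy as np

def bh_adjust(pvals, q=0.05):
    """Benjamini-Hochberg linear step-up procedure.

    Returns FDR-adjusted p-values and a boolean rejection mask
    controlling the FDR at level q (under the procedure's
    independence/positive-dependence assumptions).
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices that sort the p-values
    # (i-th smallest p-value) * m / i, for i = 1..m
    scaled = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity from the largest p-value downward
    adj = np.minimum.accumulate(scaled[::-1])[::-1]
    adj = np.clip(adj, 0.0, 1.0)
    out = np.empty(m)
    out[order] = adj                           # restore the original order
    return out, out <= q

# Example: three small p-values survive at q = 0.05, one does not.
adjusted, rejected = bh_adjust([0.01, 0.02, 0.03, 0.5], q=0.05)
```

This naïve application treats the p-values as if the test statistics were independent; the resampling-based procedures compared in the paper are designed to do better when they are not.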
Comparative simulation analysis shows that all four FDR controlling procedures control
the FDR at the desired level, and retain substantially more power than the family-wise
error rate controlling procedures. In terms of power, using resampling of the marginal
distribution of each test statistic substantially improves the performance over the
naïve one. The highest power is achieved, at the expense of a more sophisticated algorithm,
by the resampling-based procedures that resample the joint distribution of the test
statistics and estimate the level of FDR control.
An R program that adjusts p-values using FDR controlling procedures is freely available
over the Internet at www.math.tau.ac.il/~ybenja.