Abstract
In this paper we report exploratory analyses of high-density oligonucleotide array
data from the Affymetrix GeneChip system with the objective of improving upon currently
used measures of gene expression. Our analyses make use of three data sets: a small
experimental study consisting of five MGU74A mouse GeneChip arrays; part of the data
from an extensive spike-in study conducted by Gene Logic and Wyeth's Genetics Institute
involving 95 HG-U95A human GeneChip arrays; and part of a dilution study conducted
by Gene Logic involving 75 HG-U95A GeneChip arrays. We display some familiar features
of the perfect match and mismatch probe (PM and MM) values of these data, and examine
the variance-mean relationship with probe-level data from probes believed to be defective,
and so delivering noise only. We explain why we need to normalize the arrays to one
another using probe-level intensities. We then examine the behavior of the PM and
MM using spike-in data and assess three commonly used summary measures: Affymetrix's
(i) average difference (AvDiff) and (ii) MAS 5.0 signal, and (iii) the Li and Wong
multiplicative model-based expression index (MBEI). The exploratory data analyses
of the probe-level data motivate a new summary measure that is a robust multi-array
average (RMA) of background-adjusted, normalized, and log-transformed PM values. We
evaluate the four expression summary measures using the dilution study data, assessing
their behavior in terms of bias, variance and (for MBEI and RMA) model fit. Finally,
we evaluate the algorithms in terms of their ability to detect known levels of differential
expression using the spike-in data. We conclude that there is no obvious downside
to using RMA and attaching a standard error (SE) to this quantity using a linear model
that removes probe-specific affinities.
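
To make the RMA summary concrete, the following is a minimal sketch of the normalize, log-transform, and robust-average steps for a matrix of PM intensities. The abstract does not name the specific procedures, so several choices here are assumptions: quantile normalization stands in for the normalization step, Tukey's median polish stands in for the robust fit of an additive model of the form log2(PM_ij) = e_i + a_j + eps_ij (e_i the expression on array i, a_j the affinity of probe j), and background adjustment is omitted entirely. The function names and the `probeset_ids` argument are hypothetical.

```python
import numpy as np


def quantile_normalize(x):
    """Give every column (array) of a probes-by-arrays matrix the same
    empirical distribution: the mean of the sorted columns."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)
    sorted_x = np.take_along_axis(x, order, axis=0)
    mean_quantiles = sorted_x.mean(axis=1)
    out = np.empty_like(x)
    # Write the k-th mean quantile back to wherever each column's k-th
    # smallest value was (ties are broken arbitrarily).
    np.put_along_axis(out, order, mean_quantiles[:, None], axis=0)
    return out


def median_polish(y, n_iter=10, tol=1e-6):
    """Fit y[j, i] ~ overall + probe[j] + array[i] by alternately sweeping
    out row and column medians. Returns (expression per array,
    probe effects, residuals)."""
    resid = np.asarray(y, dtype=float).copy()
    overall = 0.0
    probe = np.zeros(resid.shape[0])
    array = np.zeros(resid.shape[1])
    for _ in range(n_iter):
        row_med = np.median(resid, axis=1)
        resid -= row_med[:, None]
        probe += row_med
        shift = np.median(array)
        array -= shift
        overall += shift

        col_med = np.median(resid, axis=0)
        resid -= col_med[None, :]
        array += col_med
        shift = np.median(probe)
        probe -= shift
        overall += shift

        if np.abs(row_med).max() < tol and np.abs(col_med).max() < tol:
            break
    return overall + array, probe, resid


def rma_sketch(pm, probeset_ids):
    """Assumed pipeline: quantile-normalize all PM probes, take log2, then
    summarize each probe set by median polish. No background adjustment."""
    probeset_ids = np.asarray(probeset_ids)
    log_pm = np.log2(quantile_normalize(pm))
    return {
        ps: median_polish(log_pm[probeset_ids == ps])[0]
        for ps in np.unique(probeset_ids)
    }
```

The residuals returned by `median_polish` are what one would inspect to judge model fit, and they are a natural starting point for attaching a standard error to each expression value; the sketch does not compute one.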