Gene expression profile analyses have been used in numerous studies covering a broad
range of areas in biology. When unreliable measurements are excluded, missing values
are introduced in gene expression profiles. Although existing multivariate analysis
methods have difficulty with the treatment of missing values, this problem has received
little attention. There are many options for dealing with missing values, and they can
lead to drastically different results. Ignoring missing values is the simplest
method and is frequently applied. This approach, however, has its flaws. In this article,
we propose an estimation method for missing values, which is based on Bayesian principal
component analysis (BPCA). Although simultaneously estimating a probabilistic model and
latent variables within the framework of Bayesian inference
is not new in principle, an actual BPCA implementation that makes it possible to estimate
arbitrary missing variables is new in terms of statistical methodology.
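
To make the idea of estimating a model and the missing entries together more concrete, the following is a minimal sketch of a simplified, non-Bayesian stand-in: an ordinary PCA fit alternated with re-estimation of the missing entries. It is not the BPCA method proposed here (it has no priors and no automatic selection of the number of components), and the function name `iterative_pca_impute` and its parameters `n_components`, `n_iter` and `tol` are hypothetical choices made for illustration.

```python
import numpy as np

def iterative_pca_impute(X, n_components=2, n_iter=100, tol=1e-6):
    """Fill NaN entries of X by alternating a rank-k PCA fit and re-estimation.

    Simplified illustration only; this is NOT the BPCA method of the article.
    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: column means computed over observed entries only.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        # Fit PCA (via SVD) to the currently completed matrix.
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        k = n_components
        recon = (U[:, :k] * s[:k]) @ Vt[:k] + mu
        # Overwrite only the missing entries; observed values stay fixed.
        new_filled = np.where(missing, recon, X)
        change = np.max(np.abs(new_filled - filled)) if missing.any() else 0.0
        filled = new_filled
        if change < tol:
            break
    return filled
```

Unlike this sketch, which requires the user to choose `n_components`, the BPCA method is described in this article as free from such parameter tuning.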
When applied to DNA microarray data from various experimental conditions, the BPCA
method exhibited markedly better estimation ability than other recently proposed methods,
such as singular value decomposition and K-nearest neighbors. While the estimation
performance of existing methods depends on model parameters whose determination is
difficult, our BPCA method is free from this difficulty. Accordingly, the BPCA method
provides accurate and convenient estimation for missing values.
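
A comparison of estimation ability of this kind can be set up by hiding known entries, estimating them, and scoring the estimates against the hidden values. The sketch below assumes a normalized root mean squared error as the score and uses column-mean imputation as a trivial baseline. The score definition and the helper names `hide_entries`, `nrmse` and `column_mean_impute` are assumptions made for illustration, not details stated in this text.

```python
import numpy as np

def hide_entries(X, fraction, seed=0):
    """Return a copy of X with a random fraction of entries set to NaN, plus the mask."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < fraction
    X_missing = np.array(X, dtype=float)  # copy, so the input stays intact
    X_missing[mask] = np.nan
    return X_missing, mask

def nrmse(X_true, X_est, mask):
    """Root mean squared error on hidden entries, normalized by their variance."""
    diff = X_est[mask] - X_true[mask]
    return np.sqrt(np.mean(diff ** 2) / np.var(X_true[mask]))

def column_mean_impute(X_missing):
    """Baseline: replace each NaN with the mean of the observed values in its column."""
    col_means = np.nanmean(X_missing, axis=0)
    return np.where(np.isnan(X_missing), col_means, X_missing)

if __name__ == "__main__":
    # Synthetic data only; no real expression measurements are used here.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 8))
    X_missing, mask = hide_entries(X, fraction=0.1)
    print("baseline NRMSE:", nrmse(X, column_mean_impute(X_missing), mask))
```

Any estimation method can be scored in the same way by substituting it for `column_mean_impute`; a lower score indicates estimates closer to the hidden values.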
The software is available at http://hawaii.aist-nara.ac.jp/~shige-o/tools/.