Communication Efficient Distributed Kernel Principal Component Analysis


Abstract

Kernel Principal Component Analysis (KPCA) is a key machine learning algorithm for extracting nonlinear features from data. When a large volume of high-dimensional data is collected in a distributed fashion, it becomes very costly to communicate all of this data to a single data center and then perform kernel PCA. Can we perform kernel PCA on the entire dataset in a distributed and communication-efficient fashion while maintaining strong, provable guarantees on solution quality? In this paper, we give an affirmative answer to the question by developing a communication-efficient algorithm to perform kernel PCA in the distributed setting. The algorithm is a clever combination of subspace embedding and adaptive sampling techniques, and we show that the algorithm can take as input an arbitrary configuration of distributed datasets and compute a set of global kernel principal components with relative error guarantees independent of the dimension of the feature space or the total number of data points. In particular, computing \(k\) principal components with relative error \(\epsilon\) over \(s\) workers has communication cost \(\tilde{O}(s \rho k/\epsilon+s k^2/\epsilon^3)\) words, where \(\rho\) is the average number of nonzero entries in each data point. Furthermore, we evaluated the algorithm on large-scale real-world datasets and showed that it produces a high-quality kernel PCA solution while using significantly less communication than alternative approaches.
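To give a feel for the stated communication bound, the following rough sketch evaluates its two leading terms and compares them with the cost of shipping every data point to one site. It is only an order-of-magnitude illustration: the constants and polylogarithmic factors hidden by the \(\tilde{O}\) notation are dropped, the function names are hypothetical, the workload values are made up rather than taken from the paper, and the centralization cost is approximated as \(n\rho\) words for \(n\) data points.

```python
def distributed_kpca_words(s, rho, k, eps):
    """Leading terms of the stated bound: s*rho*k/eps + s*k^2/eps^3.

    Assumption: constants and the polylog factors hidden by the
    O-tilde are dropped, so this is an order-of-magnitude figure only.
    """
    return s * rho * k / eps + s * k**2 / eps**3


def centralize_words(n, rho):
    """Rough cost of sending all n points (about rho nonzeros each) to one site."""
    return n * rho


# Hypothetical workload, not taken from the paper.
s, rho, k, eps = 10, 100, 10, 0.5   # workers, avg nonzeros, components, error
n = 10_000_000                      # total number of data points

bound = distributed_kpca_words(s, rho, k, eps)
naive = centralize_words(n, rho)
print(f"bound terms: {bound:.3g} words; centralizing: {naive:.3g} words")
```

Note that the bound's leading terms do not involve \(n\) at all, which matches the abstract's claim that the guarantee is independent of the total number of data points; the \(1/\epsilon^3\) term dominates as the error tolerance shrinks.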


Author and article information


Publication date Created: 2015-03-23

Publication date Updated: 2016-02-13

Article

arXiv ID: 1503.06858

SO-VID: b849cbba-289b-46de-bbbc-467fcad5fdba

License:

http://arxiv.org/licenses/nonexclusive-distrib/1.0/


Custom metadata

Categories cs.LG

ScienceOpen disciplines: Artificial intelligence



Related collections

Journal of Information and Communication Technology
