
      Feature selection in MLPs and SVMs based on maximum output information.


          Abstract

          This paper presents feature selection algorithms for multilayer perceptrons (MLPs) and multiclass support vector machines (SVMs), using the mutual information between class labels and classifier outputs as an objective function. This objective function involves inexpensive computation of information measures only on discrete variables, provides immunity to prior class probabilities, and brackets the probability of error of the classifier. The maximum output information (MOI) algorithms employ this function for feature subset selection by greedy elimination and directed search. The output of the MOI algorithms is a feature subset of user-defined size and an associated trained classifier (MLP/SVM). These algorithms compare favorably with a number of other methods in terms of performance on various artificial and real-world data sets.
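          The objective function above is cheap precisely because both class labels and classifier outputs are discrete, so the mutual information reduces to counting co-occurrences. A minimal sketch (not the paper's implementation) of that empirical estimate:

```python
from collections import Counter
from math import log2

def mutual_information(labels, outputs):
    """Empirical I(Y; Y_hat) in bits between two discrete sequences,
    estimated from the joint and marginal co-occurrence counts."""
    n = len(labels)
    p_y = Counter(labels)            # marginal counts of true labels
    p_o = Counter(outputs)           # marginal counts of classifier outputs
    p_joint = Counter(zip(labels, outputs))
    mi = 0.0
    for (y, o), c in p_joint.items():
        # c/n is the joint probability; the ratio inside log2 is
        # p(y,o) / (p(y) * p(o)), expressed with raw counts.
        mi += (c / n) * log2(c * n / (p_y[y] * p_o[o]))
    return mi

# Outputs perfectly aligned with binary labels recover the full
# label entropy of 1 bit.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
```

Because the estimate depends only on the confusion counts, it is insensitive to how the classes are encoded, which is one reason such a criterion can bracket the classifier's error probability.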

          Related collections

          Most cited references


          Using mutual information for selecting features in supervised neural net learning.

          R Battiti (1994)
          This paper investigates the application of the mutual information criterion to evaluate a set of candidate features and to select an informative subset to be used as input data for a neural network classifier. Because the mutual information measures arbitrary dependencies between random variables, it is suitable for assessing the "information content" of features in complex classification tasks, where methods based on linear relations (like the correlation) are prone to mistakes. The fact that the mutual information is independent of the coordinates chosen permits a robust estimation. Nonetheless, the use of the mutual information for tasks characterized by high input dimensionality requires suitable approximations because of the prohibitive demands on computation and samples. An algorithm is proposed that is based on a "greedy" selection of the features and that takes into account both the mutual information with respect to the output class and with respect to the already-selected features. Finally, the results of a series of experiments are discussed.
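            The greedy scheme Battiti describes trades relevance (mutual information with the class) against redundancy (mutual information with already-selected features). A minimal sketch, assuming discrete feature columns; the names `mifs_select` and the `beta` weight are illustrative, not taken from the paper's code:

```python
from collections import Counter
from math import log2

def mi(a, b):
    """Empirical mutual information in bits between two discrete sequences."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * log2(c * n / (ca[x] * cb[y]))
               for (x, y), c in cab.items())

def mifs_select(features, labels, k, beta=0.5):
    """Greedily pick k feature columns: each step maximizes relevance
    I(f; C) minus beta times summed redundancy with selected features."""
    remaining = dict(enumerate(features))
    selected = []
    while remaining and len(selected) < k:
        best = max(remaining,
                   key=lambda i: mi(remaining[i], labels)
                   - beta * sum(mi(remaining[i], features[j]) for j in selected))
        selected.append(best)
        del remaining[best]
    return selected
```

With a redundant copy of an informative feature in the pool, the redundancy penalty is what keeps the duplicate from being ranked ahead of a genuinely new feature.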

            A Probabilistic Theory of Pattern Recognition

            L. Devroye, L. Györfi, G. Lugosi (1996)


              Self-organization in a perceptual network

              R Linsker (1988)

                Author and article information

                Journal
                IEEE Trans Neural Netw
                IEEE Transactions on Neural Networks
                Institute of Electrical and Electronics Engineers (IEEE)
                ISSN: 1045-9227
                Jul 2004
                Volume: 15
                Issue: 4
                Affiliations
                [1] Department of Computer Science, the University of Chicago, IL 60637, USA.
                Article
                DOI: 10.1109/TNN.2004.828772
                PMID: 15461085