
      A redundancy-removing feature selection algorithm for nominal data

      research-article
Li 1, 2, 3, 4 and Gu 2, 3
      PeerJ Computer Science
      PeerJ Inc.
      Nominal data, Feature selection, Redundancy-removing, Mutual information


          Abstract

          Nominal data have no ordering or similarity metric, and nominal datasets typically contain substantial redundancy, which makes an efficient mutual-information-based feature selection method for nominal data relatively difficult to construct. In this paper, a nominal-data feature selection method based on mutual information that requires no data transformation, called the redundancy-removing more-relevance less-redundancy algorithm, is proposed. By forming several new information-related definitions and the corresponding computational methods, the proposed method computes the information-related quantities of nominal data directly. Furthermore, by creating a new evaluation function that considers both relevance and redundancy globally, the new feature selection method can evaluate the importance of each nominal-data feature. Although the presented method takes the commonly used MIFS-like form, it can handle high-dimensional datasets without expensive computations. We perform extensive experimental comparisons of the proposed algorithm and other methods on three benchmark nominal datasets with two different classifiers. The experimental results demonstrate an average advantage of the presented algorithm over the well-known NMIFS algorithm in both feature selection and classification accuracy, indicating that the proposed method performs promisingly.
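The abstract's central computational claim — that information quantities can be computed on nominal values directly, with no numeric encoding or ordering — can be illustrated with an empirical mutual-information estimate over category counts. This is a generic sketch of that idea, not the authors' new definitions or evaluation function:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    # Empirical I(X;Y) in nats, estimated from category co-occurrence
    # counts: no ordering, distance, or numeric encoding of the
    # nominal values is required.
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

# A feature that mirrors the class labels carries maximal information;
# one that varies independently of them carries none.
labels = ["spam", "spam", "ham", "ham"]
print(mutual_information(["a", "a", "b", "b"], labels))  # ln 2 ≈ 0.693
print(mutual_information(["a", "b", "a", "b"], labels))  # 0.0 (independent)
```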

          Most cited references (21)


          Statistical pattern recognition: a review

          A Jain, R Duin, J Mao (2000)


            Using mutual information for selecting features in supervised neural net learning.

            R Battiti (1994)
            This paper investigates the application of the mutual information criterion to evaluate a set of candidate features and to select an informative subset to be used as input data for a neural network classifier. Because the mutual information measures arbitrary dependencies between random variables, it is suitable for assessing the "information content" of features in complex classification tasks, where methods based on linear relations (like the correlation) are prone to mistakes. The fact that the mutual information is independent of the coordinates chosen permits a robust estimation. Nonetheless, the use of the mutual information for tasks characterized by high input dimensionality requires suitable approximations because of the prohibitive demands on computation and samples. An algorithm is proposed that is based on a "greedy" selection of the features and that takes into account both the mutual information with respect to the output class and with respect to the already-selected features. Finally, the results of a series of experiments are discussed.
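The greedy criterion Battiti describes — pick, at each step, the feature that maximizes mutual information with the class minus a penalty β times its mutual information with already-selected features — can be sketched as follows. The function names, toy data, and β value are illustrative, and MI is estimated empirically from counts:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    # Empirical I(X;Y) in nats from co-occurrence counts of nominal values.
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mifs_select(features, labels, k, beta=1.0):
    # Greedy MIFS-style selection: relevance to the class minus beta
    # times the summed MI with the features already selected.
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        best = max(remaining,
                   key=lambda f: mutual_information(features[f], labels)
                   - beta * sum(mutual_information(features[f], features[s])
                                for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# f2 duplicates f1 exactly; with a redundancy penalty, the weaker but
# non-redundant f3 is preferred over the duplicate for the second slot.
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
features = {
    "f1": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "f2": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "f3": ["x", "x", "x", "y", "y", "y", "y", "x"],
}
print(mifs_select(features, labels, k=2, beta=1.5))  # ['f1', 'f3']
```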

              Normalized mutual information feature selection.

              A filter method of feature selection based on mutual information, called normalized mutual information feature selection (NMIFS), is presented. NMIFS is an enhancement over Battiti's MIFS, MIFS-U, and mRMR methods. The average normalized mutual information is proposed as a measure of redundancy among features. NMIFS outperformed MIFS, MIFS-U, and mRMR on several artificial and benchmark data sets without requiring a user-defined parameter. In addition, NMIFS is combined with a genetic algorithm to form a hybrid filter/wrapper method called GAMIFS. This includes an initialization procedure and a mutation operator based on NMIFS to speed up the convergence of the genetic algorithm. GAMIFS overcomes the limitations of incremental search algorithms that are unable to find dependencies between groups of features.
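The normalization NMIFS introduces — dividing mutual information by the smaller of the two entropies so redundancy scores fall in [0, 1] without a user-defined parameter — can be sketched as follows. This is a minimal illustration of the redundancy measure only, not the full NMIFS or GAMIFS procedure:

```python
import math
from collections import Counter

def entropy(xs):
    # H(X) in nats for a sequence of nominal values.
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())

def normalized_mi(xs, ys):
    # NI(X;Y) = I(X;Y) / min(H(X), H(Y)): 0 for independent features,
    # 1 when one feature fully determines the other.
    i_xy = entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))
    h_min = min(entropy(xs), entropy(ys))
    return i_xy / h_min if h_min > 0 else 0.0

print(normalized_mi(["a", "a", "b", "b"], ["a", "a", "b", "b"]))  # -> 1.0
print(normalized_mi(["a", "a", "b", "b"], ["c", "d", "c", "d"]))  # -> 0.0 (up to float rounding)
```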

                Author and article information

                Contributors
                Journal
                peerj-cs
                PeerJ Computer Science
                PeerJ Comput. Sci.
                PeerJ Inc. (San Francisco, USA )
                2376-5992
                14 October 2015
                Volume 1: e24
                Affiliations
                [1] Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangsu, China
                [2] Engineering Research Center of Internet of Things Technology and Application (Ministry of Education), Jiangsu, China
                [3] Department of Computer Science and Engineering, School of Internet of Things Engineering, Jiangnan University, Jiangsu, China
                [4] Department of Computer Science, Georgia State University, Atlanta, GA, United States of America
                Article
                cs-24
                10.7717/peerj-cs.24
                0e8079c8-718b-4c17-9d3e-514e22f2e8d7
                © 2015 Li and Gu

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.


                History
                Received: 4 June 2015
                Accepted: 10 September 2015
                Funding
                Funded by: Science and Technology Department of Jiangsu Province
                Award ID: BY2013015-23
                Funded by: Fundamental Research Funds for the Ministry of Education
                Award ID: JUSRP211A 41
                This work is supported by the Future Research Projects Funds for the Science and Technology Department of Jiangsu Province (Grant No. BY2013015-23) and the Fundamental Research Funds for the Ministry of Education (Grant No. JUSRP211A 41). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Data Mining and Machine Learning
                Data Science

                Computer science
                Nominal data, Feature selection, Redundancy-removing, Mutual information
