      Simple strategies for semi-supervised feature selection

      Research article, Machine Learning, Springer US
      Keywords: semi-supervised, positive unlabelled, feature selection

          Abstract

          What is the simplest thing you can do to solve a problem? In the context of semi-supervised feature selection, we tackle exactly this: how much we can gain from two simple classifier-independent strategies. If we have some binary labelled data and some unlabelled, we could assume the unlabelled data are all positives, or assume them all negatives. These minimalist, seemingly naive, approaches have not previously been studied in depth. However, with theoretical and empirical studies, we show they provide powerful results for feature selection, via hypothesis testing and feature ranking. Combining them with some “soft” prior knowledge of the domain, we derive two novel algorithms (Semi-JMI, Semi-IAMB) that outperform significantly more complex competing methods, showing particularly good performance when the labels are missing-not-at-random. We conclude that simple approaches to this problem can work surprisingly well, and in many situations we can provably recover the exact feature selection dynamics, as if we had labelled the entire dataset.
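
          The two strategies in the abstract can be illustrated with a small sketch. It is not the authors' Semi-JMI or Semi-IAMB; it only shows the basic idea of filling in every missing label with one class and then ranking features by estimated mutual information with that surrogate label. The `-1` encoding for unlabelled examples, the plug-in estimator, the toy data, and all function names are assumptions made for illustration.

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats for two discrete 1-D arrays."""
    x = np.asarray(x)
    y = np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        px = np.mean(x == xv)
        for yv in np.unique(y):
            py = np.mean(y == yv)
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:  # zero-probability cells contribute nothing
                mi += pxy * np.log(pxy / (px * py))
    return mi

def surrogate_labels(y, unlabelled_as):
    """Replace missing labels (encoded here as -1, an assumption) with 0 or 1."""
    y = np.asarray(y).copy()
    y[y == -1] = unlabelled_as
    return y

def rank_features(X, y, unlabelled_as=0):
    """Rank columns of X by estimated mutual information with the surrogate label."""
    y_s = surrogate_labels(y, unlabelled_as)
    scores = np.array([mutual_information(X[:, j], y_s) for j in range(X.shape[1])])
    return np.argsort(-scores), scores

# Toy data: feature 0 copies the true class, feature 1 is unrelated to it.
X = np.array([[1, 0], [1, 1], [1, 0], [1, 1],
              [0, 0], [0, 1], [0, 0], [0, 1]])
# Observed labels: 1 = positive, 0 = negative, -1 = unlabelled.
y_obs = np.array([1, 1, -1, -1, 0, 0, -1, -1])

order_neg, scores_neg = rank_features(X, y_obs, unlabelled_as=0)  # unlabelled as negatives
order_pos, scores_pos = rank_features(X, y_obs, unlabelled_as=1)  # unlabelled as positives
print(order_neg, scores_neg)
print(order_pos, scores_pos)
```

          On this toy data both surrogates place the informative feature first, although the scores differ from those that the fully labelled data would give. The abstract's claim concerns when such rankings and tests match the fully labelled case; this sketch does not establish that.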

                Author and article information

                Contributors
                konstantinos.sechidis@manchester.ac.uk
                gavin.brown@manchester.ac.uk
                Journal
                Machine Learning (Mach Learn)
                Springer US (New York)
                ISSN: 0885-6125, 1573-0565
                17 July 2017
                2018, volume 107, issue 2, pages 357-395
                Affiliations
                School of Computer Science, University of Manchester, Manchester, M13 9PL, UK (ISNI 0000000121662407, GRID grid.5379.8)
                Author information
                http://orcid.org/0000-0001-6582-7453
                Article
                5648
                DOI: 10.1007/s10994-017-5648-2
                6954040
                31983804
                a6611bd0-b937-4e80-be25-335e508815a2
                © The Author(s) 2017

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

                History
                18 April 2016
                8 June 2017
                Funding
                Funded by: Engineering and Physical Sciences Research Council (EPSRC)
                Award ID: EP/L000725/1
                © The Author(s) 2018
