2
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Progressive Boosting for Class Imbalance

      Preprint
      , ,

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Pattern recognition applications often suffer from skewed data distributions between classes, which may vary during operations w.r.t. the design data. Two-class classification systems designed using skewed data tend to recognize the majority class better than the minority class of interest. Several data-level techniques have been proposed to alleviate this issue by up-sampling minority samples or under-sampling majority samples. However, some informative samples may be neglected by random under-sampling and adding synthetic positive samples through up-sampling adds to training complexity. In this paper, a new ensemble learning algorithm called Progressive Boosting (PBoost) is proposed that progressively inserts uncorrelated groups of samples into a Boosting procedure to avoid loss of information while generating a diverse pool of classifiers. Base classifiers in this ensemble are generated from one iteration to the next, using subsets from a validation set that grows gradually in size and imbalance. Consequently, PBoost is more robust to unknown and variable levels of skew in operational data, and has lower computation complexity than Boosting ensembles in literature. In PBoost, a new loss factor is proposed to avoid bias of performance towards the negative class. Using this loss factor, the weight update of samples and classifier contribution in final predictions are set based on the ability to recognize both classes. Using the proposed loss factor instead of standard accuracy can avoid biasing performance in any Boosting ensemble. The proposed approach was validated and compared using synthetic data, videos from the FIA dataset that emulates face re-identification applications, and KEEL collection of datasets. Results show that PBoost can outperform state of the art techniques in terms of both accuracy and complexity over different levels of imbalance and overlap between classes.

          Related collections

          Most cited references19

          • Record: found
          • Abstract: found
          • Article: not found

          Survey of clustering algorithms.

          Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measure, and cluster validation, are also discussed.
            Bookmark
            • Record: found
            • Abstract: not found
            • Article: not found

            A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches

              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              Cost-sensitive boosting for classification of imbalanced data

                Bookmark

                Author and article information

                Journal
                2017-06-05
                Article
                1706.01531
                f4e8d10c-7b1a-4a6f-9dcb-c35e61e94d9e

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                cs.LG cs.CV

                Artificial intelligence
                Artificial intelligence

                Comments

                Comment on this article