
      Bootstrapping and Multiple Imputation Ensemble Approaches for Missing Data

      Preprint

          Abstract

          The presence of missing values in a dataset can adversely affect the performance of a classifier, which deteriorates rapidly as missingness increases. Single imputation and Multiple Imputation (MI) are normally performed to fill in the missing values. In this paper, we present several variants of combining MI and bootstrapping to create ensembles that can model uncertainty and diversity in the data and that are robust to high missingness. We present three ensemble strategies: bootstrapping on incomplete data followed by single imputation, bootstrapping on incomplete data followed by MI, and an MI ensemble without bootstrapping. We use mean imputation, Gaussian random imputation and expectation maximization as the base imputation methods in these ensemble strategies. We perform an extensive evaluation of the proposed ensemble strategies on 8 datasets by varying the missingness ratio. Our results show that bootstrapping followed by the average of MIs using expectation maximization is the most robust method, preventing the classifier's performance from degrading even at a high missingness ratio (30%). For small missingness ratios (up to 10%), most of the ensemble methods perform equivalently to one another but better than their single imputation counterparts. Kappa-error plots suggest that accurate classifiers with reasonable diversity are the reason for this behaviour. A consistent observation across all the datasets is that for small missingness (up to 10%), bootstrapping on incomplete data without any imputation produces results equivalent to those of the other ensemble methods with imputation.
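
          To make the "bootstrapping followed by average of MIs" strategy concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It uses Gaussian random imputation (one of the three base methods named in the abstract) rather than expectation maximization. The nearest-centroid base classifier, the toy data, the ensemble sizes and all function names are hypothetical choices made only for illustration. Missing values are represented as None.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def gaussian_impute(rows, rng):
    """Fill None entries with draws from N(mean, std) of the column's observed values."""
    filled = [list(r) for r in rows]
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        mu = mean(observed) if observed else 0.0  # assumption: fall back to 0.0
        sigma = pstdev(observed) if len(observed) > 1 else 0.0
        for r in filled:
            if r[j] is None:
                r[j] = rng.gauss(mu, sigma)
    return filled


def fit_centroids(rows, labels):
    """Hypothetical base classifier: one mean vector per class."""
    centroids = {}
    for label in set(labels):
        members = [r for r, y in zip(rows, labels) if y == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids


def predict(centroids, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def bootstrap_mi_ensemble(rows, labels, n_bootstraps=10, n_imputations=5, seed=0):
    """Bootstrap the incomplete data, impute each sample several times,
    average the imputed copies, and train one classifier per bootstrap."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_bootstraps):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        sample = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        imputations = [gaussian_impute(sample, rng) for _ in range(n_imputations)]
        # Cell-wise average over the imputed copies of this bootstrap sample.
        averaged = [
            [mean(values) for values in zip(*versions)]
            for versions in zip(*imputations)
        ]
        models.append(fit_centroids(averaged, sample_labels))
    return models


def vote(models, x):
    """Majority vote over the ensemble members."""
    return Counter(predict(m, x) for m in models).most_common(1)[0][0]


# Toy data (made up): two well-separated classes with a few missing cells.
rows = [
    [0.1, 0.3], [0.2, None], [0.0, 0.1], [0.4, 0.2], [None, 0.0], [0.3, 0.5],
    [10.1, 9.8], [9.7, 10.2], [None, 10.0], [10.3, 9.9], [9.9, None], [10.0, 10.4],
]
labels = ["a"] * 6 + ["b"] * 6
models = bootstrap_mi_ensemble(rows, labels)
prediction_low = vote(models, [0.2, 0.2])
prediction_high = vote(models, [9.8, 10.1])
```

          The other two strategies in the abstract follow from the same pieces: setting n_imputations=1 gives bootstrapping followed by single imputation, and skipping the resampling step while training one classifier per imputed copy gives an MI ensemble without bootstrapping.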


                Author and article information

                Date: 01 February 2018
                arXiv: 1802.00154
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                Length: 16 Pages, 11 Tables, 6 Figures
                Subject: cs.LG
