Does imputation matter? Benchmark for predictive models

Abstract

Incomplete data are common in practical applications. Most predictive machine learning models do not handle missing values, so the data must be preprocessed first. Although many algorithms are used for data imputation, we do not understand the impact of the different methods on the performance of predictive models. This paper is the first to systematically evaluate the empirical effectiveness of data imputation algorithms for predictive models. The main contributions are (1) a recommendation of a general method for empirical benchmarking based on real-life classification tasks and (2) a comparative analysis of different imputation methods across a collection of data sets and a collection of ML algorithms.

Author and article information

Publication date (created): 06 July 2020

Article

arXiv ID: 2007.02837

SO-VID: ce6f046c-7040-46b1-9e3a-ad19d45b25af

License:

http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Categories: stat.ML, cs.LG

ScienceOpen disciplines: Machine learning, Artificial intelligence
