
      PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison

      Preprint


          Abstract

          The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. This work is an important first step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.
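
          As a concrete illustration of the evaluation loop described in the abstract, the sketch below pulls a single benchmark dataset through the pmlb Python package and cross-validates an off-the-shelf classifier on it. This is a minimal sketch, not the paper's exact protocol: the dataset name 'mushroom' and the random forest baseline are illustrative choices, and it assumes pmlb and scikit-learn are installed.

          # Minimal sketch: fetch one PMLB dataset and score a baseline classifier.
          # Assumes the pmlb and scikit-learn packages; 'mushroom' is an illustrative
          # dataset name, not one singled out by the paper.
          from pmlb import fetch_data
          from sklearn.ensemble import RandomForestClassifier
          from sklearn.model_selection import cross_val_score

          # Download (and locally cache) the feature matrix and target vector.
          X, y = fetch_data('mushroom', return_X_y=True)

          # Score a standard off-the-shelf learner with 5-fold cross-validation,
          # mirroring the per-dataset comparisons described in the abstract.
          clf = RandomForestClassifier(n_estimators=100, random_state=0)
          scores = cross_val_score(clf, X, y, cv=5)
          print(f"mean CV accuracy: {scores.mean():.3f}")

          Repeating this loop over every dataset in the suite, for each algorithm of interest, yields the performance matrix from which the clustering analyses mentioned in the abstract can be built.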

          Related collections

          Most cited references (13)


          An empirical comparison of supervised learning algorithms


            Man vs. computer: benchmarking machine learning algorithms for traffic sign recognition.

            Traffic signs are characterized by a wide variability in their visual appearance in real-world environments. For example, changes of illumination, varying weather conditions and partial occlusions impact the perception of road signs. In practice, a large number of different sign classes need to be recognized with very high accuracy. Traffic signs have been designed to be easily readable for humans, who perform very well at this task. For computer systems, however, classifying traffic signs still seems to pose a challenging pattern recognition problem. Both image processing and machine learning algorithms are continuously refined to improve on this task. But little systematic comparison of such systems exists. What is the status quo? Do today's algorithms reach human performance? For assessing the performance of state-of-the-art machine learning algorithms, we present a publicly available traffic sign dataset with more than 50,000 images of German road signs in 43 classes. The data was considered in the second stage of the German Traffic Sign Recognition Benchmark held at IJCNN 2011. The results of this competition are reported and the best-performing algorithms are briefly described. Convolutional neural networks (CNNs) showed particularly high classification accuracies in the competition. We measured the performance of human subjects on the same data, and the CNNs outperformed the human participants.
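
            To make the task concrete, the sketch below defines a small convolutional classifier for a 43-class image problem of this kind. It is only an illustrative architecture assuming 32x32 RGB inputs and TensorFlow/Keras; it is not the design of any competition entry, and the layer sizes are arbitrary.

            # Minimal sketch: a small CNN for a 43-class sign-classification task.
            # Assumes TensorFlow/Keras and 32x32 RGB inputs; layer sizes are
            # illustrative, not taken from the benchmark's winning entries.
            from tensorflow.keras import layers, models

            num_classes = 43  # number of sign classes in the benchmark

            model = models.Sequential([
                layers.Input(shape=(32, 32, 3)),
                layers.Conv2D(32, 3, activation='relu'),
                layers.MaxPooling2D(),
                layers.Conv2D(64, 3, activation='relu'),
                layers.MaxPooling2D(),
                layers.Flatten(),
                layers.Dense(128, activation='relu'),
                layers.Dense(num_classes, activation='softmax'),
            ])

            # Sparse categorical cross-entropy expects integer class labels (0..42).
            model.compile(optimizer='adam',
                          loss='sparse_categorical_crossentropy',
                          metrics=['accuracy'])
            model.summary()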

              A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction.

              Multifactor dimensionality reduction (MDR) was developed as a method for detecting statistical patterns of epistasis. The overall goal of MDR is to change the representation space of the data to make interactions easier to detect. It is well known that machine learning methods may not provide robust models when the class variable (e.g. case-control status) is imbalanced and accuracy is used as the fitness measure. This is because most methods learn patterns that are relevant for the larger of the two classes. The goal of this study was to evaluate three different strategies for improving the power of MDR to detect epistasis in imbalanced datasets. The methods evaluated were: (1) over-sampling that resamples with replacement the smaller class until the data are balanced, (2) under-sampling that randomly removes subjects from the larger class until the data are balanced, and (3) balanced accuracy [(sensitivity+specificity)/2] as the fitness function with and without an adjusted threshold. These three methods were compared using simulated data with two-locus epistatic interactions of varying heritability (0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4) and minor allele frequency (0.2, 0.4) that were embedded in 100 replicate datasets of varying sample sizes (400, 800, 1600). Each dataset was generated with different ratios of cases to controls (1 : 1, 1 : 2, 1 : 4). We found that the balanced accuracy function with an adjusted threshold significantly outperformed both over-sampling and under-sampling and fully recovered the power. These results suggest that balanced accuracy should be used instead of accuracy for the MDR analysis of epistasis in imbalanced datasets. (c) 2007 Wiley-Liss, Inc.
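
              A small worked example of the balanced-accuracy measure referenced above, (sensitivity + specificity) / 2, is given below. The toy label vectors are invented for illustration, and the scikit-learn call at the end is only a convenience cross-check, not part of the MDR method itself.

              # Minimal sketch: balanced accuracy = (sensitivity + specificity) / 2,
              # computed from a binary confusion matrix. The toy labels are illustrative.
              import numpy as np
              from sklearn.metrics import confusion_matrix, balanced_accuracy_score

              y_true = np.array([1, 1, 1, 1, 1, 1, 0, 0])  # imbalanced: 6 cases vs 2 controls
              y_pred = np.array([1, 1, 1, 1, 1, 0, 0, 1])

              tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
              sensitivity = tp / (tp + fn)  # recall on the larger (case) class
              specificity = tn / (tn + fp)  # recall on the smaller (control) class
              print("balanced accuracy (manual): ", (sensitivity + specificity) / 2)

              # scikit-learn provides the same measure directly.
              print("balanced accuracy (sklearn):", balanced_accuracy_score(y_true, y_pred))

              Unlike plain accuracy, this measure cannot be inflated by always predicting the majority class, which is why the study above recommends it as the fitness function for imbalanced data.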

                Author and article information

                Journal: arXiv preprint
                Published: 2017-03-01
                Article: arXiv:1703.00512
                Record ID: 0b1bf21f-0059-4292-9224-6d8f95303180
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                Custom metadata: 14 pages, 5 figures, submitted for review to JMLR
                Subject classes: cs.LG, cs.AI

                Artificial intelligence
