The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets


Abstract

Binary classifiers are routinely evaluated with performance measures such as sensitivity and specificity, and performance is frequently illustrated with Receiver Operating Characteristics (ROC) plots. Alternative measures such as positive predictive value (PPV) and the associated Precision/Recall (PRC) plots are used less frequently. Many bioinformatics studies develop and evaluate classifiers that are to be applied to strongly imbalanced datasets, in which negatives significantly outnumber positives. While ROC plots are visually appealing and provide an overview of a classifier's performance across a wide range of specificities, one can ask whether ROC plots could be misleading when applied in imbalanced classification scenarios. We show here that the visual interpretability of ROC plots in the context of imbalanced datasets can be deceptive with respect to conclusions about the reliability of classification performance, owing to an intuitive but wrong interpretation of specificity. PRC plots, on the other hand, can provide the viewer with an accurate prediction of future classification performance because they evaluate the fraction of true positives among positive predictions. Our findings have potential implications for the interpretation of a large number of studies that use ROC plots on imbalanced datasets.
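The abstract's point about specificity can be illustrated with a small calculation. The sketch below is a minimal illustration, not the authors' method: it assumes a classifier fixed at a single hypothetical ROC operating point (TPR = 0.8, FPR = 0.1) and applies it to made-up datasets with 1,000 positives and a growing number of negatives. The names `Confusion` and `confusion_at_operating_point` are hypothetical. Sensitivity and specificity, and therefore the point's position on an ROC plot, stay the same in every case, while precision (PPV) falls as negatives come to dominate.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for one decision threshold."""
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def sensitivity(self) -> float:
        # Recall / true positive rate: TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # True negative rate: TN / (TN + FP); the ROC x-axis is 1 - specificity
        return self.tn / (self.tn + self.fp)

    @property
    def precision(self) -> float:
        # Positive predictive value: TP / (TP + FP)
        return self.tp / (self.tp + self.fp)


def confusion_at_operating_point(n_pos: int, n_neg: int, tpr: float, fpr: float) -> Confusion:
    """Hypothetical helper: expected counts for a classifier fixed at one ROC point."""
    tp = round(tpr * n_pos)
    fp = round(fpr * n_neg)
    return Confusion(tp=tp, fp=fp, tn=n_neg - fp, fn=n_pos - tp)


if __name__ == "__main__":
    # Same (assumed) classifier applied to 1,000 positives and an
    # increasing number of negatives: 1:1, 1:10 and 1:100 imbalance.
    for n_neg in (1_000, 10_000, 100_000):
        c = confusion_at_operating_point(1_000, n_neg, tpr=0.8, fpr=0.1)
        print(f"pos:neg = 1:{n_neg // 1_000:<4} "
              f"sensitivity={c.sensitivity:.2f} "
              f"specificity={c.specificity:.2f} "
              f"precision={c.precision:.3f}")
```

With these illustrative numbers, sensitivity stays at 0.80 and specificity at 0.90 in all three cases, while precision drops from about 0.89 (800 of 900 positive predictions correct) to about 0.44 and then to about 0.07. A high specificity therefore does not by itself imply that most positive predictions are correct when negatives vastly outnumber positives.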


Author and article information

Contributors

Guy Brock (Academic Editor)

Journal

Journal ID (nlm-ta): PLoS One

Journal ID (iso-abbrev): PLoS ONE

Journal ID (publisher-id): plos

Journal ID (pmc): plosone

Title: PLoS ONE

Publisher: Public Library of Science (San Francisco, CA, USA)

ISSN (Electronic): 1932-6203

Publication date (Electronic): 4 March 2015

Publication date Collection: 2015

Volume: 10

Issue: 3

Electronic Location Identifier: e0118432

Affiliations

Computational Biology Unit, Department of Informatics, University of Bergen, P. O. Box 7803, N-5020, Bergen, Norway

University of Louisville, United States

Author notes

Competing Interests: The authors have declared that no competing interests exist.

Conceived and designed the experiments: TS MR. Performed the experiments: TS. Analyzed the data: TS. Wrote the paper: TS MR.

* E-mail: takaya.saito@ii.uib.no (TS); marc.rehmsmeier@ii.uib.no (MR)

Article

Publisher ID: PONE-D-14-26790

DOI: 10.1371/journal.pone.0118432

PMC ID: 4349800

PubMed ID: 25738806

SO-VID: d1b36d2b-c603-497d-9e15-2367ba0ffae8

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received : 23 June 2014

Date accepted : 16 January 2015

Page count

Figures: 7, Tables: 5, Pages: 21

Funding

The authors have no funding or support to report.

Custom metadata

Data availability: Data are available from http://dx.doi.org/10.6084/m9.figshare.1245061.

ScienceOpen disciplines: Uncategorized


Cited by 863

Most referenced authors 778
