pROC: an open-source package for R and S+ to analyze and compare ROC curves

Abstract

Background

Receiver operating characteristic (ROC) curves are useful tools to evaluate classifiers in biomedical and bioinformatics applications. However, conclusions are often reached through inconsistent use or insufficient statistical analysis. To support researchers in their ROC curve analysis, we developed pROC, a package for R and S+ that provides tools for displaying, analyzing, smoothing and comparing ROC curves through a user-friendly, object-oriented and flexible interface.

Results

With data previously imported into the R or S+ environment, the pROC package builds ROC curves and includes functions for computing confidence intervals, statistical tests for comparing total or partial area under the curve or the operating points of different classifiers, and methods for smoothing ROC curves. Intermediary and final results are visualised in user-friendly interfaces. A case study based on published clinical and biomarker data shows how to perform a typical ROC analysis with pROC.
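The quantities named above can be illustrated with a minimal sketch. This is plain Python rather than pROC code, and it is not the package's implementation: the function names `roc_points` and `auc_trapezoid` and the toy data are hypothetical, and the sketch assumes that higher scores indicate the positive class. It builds an empirical ROC curve by sweeping every observed score as a threshold, then integrates it with the trapezoidal rule to obtain the total area under the curve.

```python
def roc_points(labels, scores):
    """Empirical ROC curve as parallel lists (fpr, tpr).

    Assumes labels are 0/1 and that higher scores indicate positives.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    fpr, tpr = [0.0], [0.0]  # the curve starts at the origin
    # Sweep thresholds from the highest observed score to the lowest.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        tpr.append(tp / n_pos)  # sensitivity at this threshold
        fpr.append(fp / n_neg)  # 1 - specificity at this threshold
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Total area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


# Toy data (hypothetical): three controls (0) and three cases (1).
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.65, 0.90]

fpr, tpr = roc_points(labels, scores)
auc = auc_trapezoid(fpr, tpr)
print(f"AUC = {auc:.3f}")
```

On this toy data, six of the nine case-control pairs are ranked correctly, which matches the area the trapezoidal rule returns. Confidence intervals, partial areas and tests comparing two curves build on the same curve points, typically through resampling or analytic variance estimates.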

Conclusions

pROC is a package for R and S+ specifically dedicated to ROC analysis. It offers multiple statistical tests to compare ROC curves, and in particular partial areas under the curve, allowing proper ROC interpretation. pROC is available in two versions: in the R programming language, or with a graphical user interface in the S+ statistical software. It is accessible at http://expasy.org/tools/pROC/ under the GNU General Public License. It is also distributed through the CRAN and CSAN public repositories, which facilitates its installation.

Most cited references 19

What's under the ROC? An introduction to receiver operating characteristics curves.

John Cairney, David L Streiner (2007)

It is often necessary to dichotomize a continuous scale to separate respondents into normal and abnormal groups. However, because the distributions of the scores in these 2 groups most often overlap, any cut point that is chosen will result in 2 types of errors: false negatives (that is, abnormal cases judged to be normal) and false positives (that is, normal cases placed in the abnormal group). Changing the cut point will alter the numbers of erroneous judgments but will not eliminate the problem. A technique called receiver operating characteristic (ROC) curves allows us to determine the ability of a test to discriminate between groups, to choose the optimal cut point, and to compare the performance of 2 or more tests. We discuss how to calculate and compare ROC curves and the factors that must be considered in choosing an optimal cut point.

The Relative Operating Characteristic in Psychology: A technique for isolating effects of response bias finds wide use in the study of perception and cognition.

J Swets (1973)

The clinician looking, listening, or feeling for signs of a disease may far prefer a false alarm to a miss, particularly if the disease is serious and contagious. On the other hand, he may believe that the available therapy is marginally effective, expensive, and debilitating. The pilot seeing the landing lights only when they are a few yards away may decide that his plane is adequately aligned with the runway if he is alone and familiar with that plight. He may be more inclined to circle the field before another try at landing if he has many passengers and recent memory of another plane crashing under those circumstances. The Food and Drug administrator suspecting botulism in a canned food may not want to accept even a remote threat to the public health. But he may be less clearly biased if a recent false alarm has cost a canning company millions of dollars and left some damaged reputations. The making of almost any fine discrimination is beset with such considerations of probability and utility, which are extraneous and potentially confounding when one is attempting to measure the acuity of discrimination per se. The ROC is an analytical technique, with origins in statistical decision theory and electronic detection theory, that quite effectively isolates the effects of the observer's response bias, or decision criterion, in the study of discrimination behavior. This capability, pursued through a century of psychological testing, provides a relatively pure measure of the discriminability of different stimuli and of the capacity of organisms to discriminate. The ROC also treats quantitatively the response, or decision, aspects of choice behavior. The decision parameter can then be functionally related to the probabilities of the stimulus alternatives and to the utilities of the various stimulus-response pairs, or to the observer's expectations and motivations. 
In separating and quantifying discrimination and decision processes, the ROC promises a more reliable and valid solution to some practical problems and enhances our understanding of the perceptual and cognitive phenomena that depend directly on these fundamental processes. In several problem areas in psychology, effects that were supposed to reflect properties of the discrimination process have been shown by the ROC analysis to reflect instead properties of the decision process.

Small-sample precision of ROC-related estimates.

Chao Sima, Paul E Dougherty, Michael Bittner … (2010)

The receiver operator characteristic (ROC) curves are commonly used in biomedical applications to judge the performance of a discriminant across varying decision thresholds. The estimated ROC curve depends on the true positive rate (TPR) and false positive rate (FPR), with the key metric being the area under the curve (AUC). With small samples these rates need to be estimated from the training data, so a natural question arises: How well do the estimates of the AUC, TPR and FPR compare with the true metrics? Through a simulation study using data models and analysis of real microarray data, we show that (i) for small samples the root mean square differences of the estimated and true metrics are considerable; (ii) even for large samples, there is only weak correlation between the true and estimated metrics; and (iii) generally, there is weak regression of the true metric on the estimated metric. For classification rules, we consider linear discriminant analysis, linear support vector machine (SVM) and radial basis function SVM. For error estimation, we consider resubstitution, three kinds of cross-validation and bootstrap. Using resampling, we show the unreliability of some published ROC results. Companion web site: http://compbio.tgen.org/paper_supp/ROC/roc.html. Contact: edward@mail.ece.tamu.edu.

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central

ISSN (Electronic): 1471-2105

Publication date Collection: 2011

Publication date (Electronic): 17 March 2011

Volume: 12

Page: 77

Affiliations

[1] Biomedical Proteomics Research Group, Department of Structural Biology and Bioinformatics, Medical University Centre, Geneva, Switzerland

[2] Swiss Institute of Bioinformatics, Medical University Centre, Geneva, Switzerland

Article

Publisher ID: 1471-2105-12-77

DOI: 10.1186/1471-2105-12-77

PMC ID: 3068975

PubMed ID: 21414208

SO-VID: 423af74d-6888-4b9b-8cae-e57c91c35df8

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received : 10 September 2010

Date accepted : 17 March 2011
