
      Assessing the Performance of Prediction Models: A Framework for Traditional and Novel Measures


          Abstract

          The performance of prediction models can be assessed using a variety of methods and metrics. Traditional measures for binary and survival outcomes include the Brier score to indicate overall model performance, the concordance (or c) statistic for discriminative ability (or area under the receiver operating characteristic [ROC] curve), and goodness-of-fit statistics for calibration.

          Several new measures have recently been proposed that can be seen as refinements of discrimination measures, including variants of the c statistic for survival, reclassification tables, net reclassification improvement (NRI), and integrated discrimination improvement (IDI). Moreover, decision-analytic measures have been proposed, including decision curves to plot the net benefit achieved by making decisions based on model predictions.

          We aimed to define the role of these relatively novel approaches in the evaluation of the performance of prediction models. For illustration, we present a case study of predicting the presence of residual tumor versus benign tissue in patients with testicular cancer (n = 544 for model development, n = 273 for external validation).

          We suggest that reporting discrimination and calibration will always be important for a prediction model. Decision-analytic measures should be reported if the predictive model is to be used for clinical decisions. Other measures of performance may be warranted in specific applications, such as reclassification metrics to gain insight into the value of adding a novel predictor to an established model.
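The two traditional measures named first in the abstract can be computed in a few lines. The sketch below uses invented outcomes and predicted risks, not data from the paper's case study; the function names are hypothetical.

```python
# Two traditional performance measures from the abstract: the Brier
# score (overall performance) and the concordance (c) statistic
# (discrimination), which equals the area under the ROC curve.

def brier_score(y, p):
    """Mean squared difference between the 0/1 outcome and the
    predicted risk; lower is better."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)

def c_statistic(y, p):
    """Probability that a randomly chosen case gets a higher predicted
    risk than a randomly chosen non-case (ties count one half)."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = sum((ci > cj) + 0.5 * (ci == cj)
                     for ci in cases for cj in controls)
    return concordant / (len(cases) * len(controls))

# Hypothetical outcomes and model predictions for illustration only:
y = [1, 0, 1, 1, 0, 0, 1, 0]
p = [0.9, 0.2, 0.7, 0.3, 0.4, 0.1, 0.8, 0.35]
print(round(brier_score(y, p), 4))  # -> 0.1203
print(c_statistic(y, p))            # -> 0.875
```

A perfectly useless model that always predicts the event rate would have a c statistic of 0.5; a Brier score of 0 indicates perfect predictions.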


          Most cited references (39)


          Survival model predictive accuracy and ROC curves.

          The predictive accuracy of a survival model can be summarized using extensions of the proportion of variation explained by the model, or R2, commonly used for continuous response models, or using extensions of sensitivity and specificity, which are commonly used for binary response models. In this article we propose new time-dependent accuracy summaries based on time-specific versions of sensitivity and specificity calculated over risk sets. We connect the accuracy summaries to a previously proposed global concordance measure, which is a variant of Kendall's tau. In addition, we show how standard Cox regression output can be used to obtain estimates of time-dependent sensitivity and specificity, and time-dependent receiver operating characteristic (ROC) curves. Semiparametric estimation methods appropriate for both proportional and nonproportional hazards data are introduced, evaluated in simulations, and illustrated using two familiar survival data sets.
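The cumulative/dynamic idea behind these time-specific definitions can be sketched on toy, uncensored data: at a time t, subjects with an event by t act as cases and event-free subjects act as controls. The function name and data below are hypothetical, and real applications need the censoring-aware, risk-set-based estimators the paper develops.

```python
# Minimal sketch of time-dependent sensitivity and specificity: at time
# t, sensitivity is the fraction of subjects with an event by t whose
# marker exceeds the cutoff; specificity is the fraction of still
# event-free subjects whose marker does not. Toy, uncensored data only.

def time_dependent_sens_spec(times, marker, t, cutoff):
    cases = [m for T, m in zip(times, marker) if T <= t]
    controls = [m for T, m in zip(times, marker) if T > t]
    sens = sum(m > cutoff for m in cases) / len(cases)
    spec = sum(m <= cutoff for m in controls) / len(controls)
    return sens, spec

# Hypothetical survival times and risk-marker values:
times = [2, 5, 1, 8, 3, 9, 4, 7]
marker = [0.9, 0.6, 0.8, 0.2, 0.3, 0.1, 0.7, 0.4]
print(time_dependent_sens_spec(times, marker, t=4, cutoff=0.5))
```

Sweeping the cutoff at a fixed t traces out the time-dependent ROC curve the abstract refers to.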

            Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker.

            M. S. Pepe (2004)
            A marker strongly associated with outcome (or disease) is often assumed to be effective for classifying persons according to their current or future outcome. However, for this assumption to be true, the associated odds ratio must be of a magnitude rarely seen in epidemiologic studies. In this paper, an illustration of the relation between odds ratios and receiver operating characteristic curves shows, for example, that a marker with an odds ratio of as high as 3 is in fact a very poor classification tool. If a marker identifies 10% of controls as positive (false positives) and has an odds ratio of 3, then it will correctly identify only 25% of cases as positive (true positives). The authors illustrate that a single measure of association such as an odds ratio does not meaningfully describe a marker's ability to classify subjects. Appropriate statistical methods for assessing and reporting the classification power of a marker are described. In addition, the serious pitfalls of using more traditional methods based on parameters in logistic regression models are illustrated.
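The numerical example in this abstract follows directly from the identity OR = odds(TPR) / odds(FPR). A few lines of Python (the function name is ours, not Pepe's) confirm the arithmetic.

```python
# Check the abstract's example: a marker with an odds ratio of 3 that
# flags 10% of controls (false-positive rate) flags only 25% of cases
# (true-positive rate).

def tpr_from_or(odds_ratio, fpr):
    """True-positive rate implied by an odds ratio and a false-positive
    rate, solving OR = odds(TPR) / odds(FPR) for TPR."""
    odds_tpr = odds_ratio * fpr / (1 - fpr)
    return odds_tpr / (1 + odds_tpr)

print(round(tpr_from_or(3, 0.10), 4))  # -> 0.25
```

Even an odds ratio of 16 with a 10% false-positive rate yields a true-positive rate of only about 64%, which is the abstract's broader point: association alone does not imply good classification.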

              Assessing the generalizability of prognostic information.

              Physicians are often asked to make prognostic assessments but often worry that their assessments will prove inaccurate. Prognostic systems were developed to enhance the accuracy of such assessments. This paper describes an approach for evaluating prognostic systems based on the accuracy (calibration and discrimination) and generalizability (reproducibility and transportability) of the system's predictions. Reproducibility is the ability to produce accurate predictions among patients not included in the development of the system but from the same population. Transportability is the ability to produce accurate predictions among patients drawn from a different but plausibly related population. On the basis of the observation that the generalizability of a prognostic system is commonly limited to a single historical period, geographic location, methodologic approach, disease spectrum, or follow-up interval, we describe a working hierarchy of the cumulative generalizability of prognostic systems. This approach is illustrated in a structured review of the Dukes and Jass staging systems for colon and rectal cancer and applied to a young man with colon cancer. Because it treats the development of the system as a "black box" and evaluates only the performance of the predictions, the approach can be applied to any system that generates predicted probabilities. Although the Dukes and Jass staging systems are discrete, the approach can also be applied to systems that generate continuous predictions and, with some modification, to systems that predict over multiple time periods. Like any scientific hypothesis, the generalizability of a prognostic system is established by being tested and being found accurate across increasingly diverse settings. The more numerous and diverse the settings in which the system is tested and found accurate, the more likely it will generalize to an untested setting.

                Author and article information

                Journal: Epidemiology
                Publisher: Ovid Technologies (Wolters Kluwer Health)
                ISSN: 1044-3983
                Publication date: January 2010
                Volume: 21
                Issue: 1
                Pages: 128-138
                DOI: 10.1097/EDE.0b013e3181c30fb2
                PMC: 3575184
                PMID: 20010215
                Copyright © 2010
