      Is Open Access

      Events per variable (EPV) and the relative performance of different strategies for estimating the out-of-sample validity of logistic regression models

      research-article


          Abstract

          We conducted an extensive set of empirical analyses to examine the effect of the number of events per variable (EPV) on the relative performance of three different methods for assessing the predictive accuracy of a logistic regression model: apparent performance in the analysis sample, split-sample validation, and optimism correction using bootstrap methods. Using a single dataset of patients hospitalized with heart failure, we compared the estimates of discriminatory performance from these methods to those for a very large independent validation sample arising from the same population. As anticipated, the apparent performance was optimistically biased, with the degree of optimism diminishing as the number of events per variable increased. Differences between the bootstrap-corrected approach and the use of an independent validation sample were minimal once the number of events per variable was at least 20. Split-sample assessment produced overly pessimistic and highly uncertain estimates of model performance. Apparent performance estimates had lower mean squared error compared to split-sample estimates, but the lowest mean squared error was obtained by bootstrap-corrected optimism estimates. For bias, variance, and mean squared error of the performance estimates, the penalty incurred by using split-sample validation was equivalent to reducing the sample size by the proportion of the sample withheld for model validation. In conclusion, split-sample validation is inefficient and apparent performance is too optimistic for internal validation of regression-based prediction models. Modern validation methods, such as bootstrap-based optimism correction, are preferable. While these findings may be unsurprising to many statisticians, the results of the current study reinforce what should be considered good statistical practice in the development and validation of clinical prediction models.
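The bootstrap optimism correction the abstract favors can be sketched in a few lines of numpy. This is an illustrative, Harrell-style implementation under our own assumptions (gradient-descent logistic fit, c-statistic as the performance measure, function names ours), not the authors' code:

```python
import numpy as np

def c_statistic(y, p):
    """Concordance (c-statistic): probability that a randomly chosen event
    has a higher predicted risk than a randomly chosen non-event."""
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def fit_logistic(X, y, iters=500, lr=0.1):
    """Plain gradient-ascent logistic regression (illustrative only)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-Xb @ w))
        w += lr * Xb.T @ (y - p) / len(y)
    return w

def predict(w, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1 / (1 + np.exp(-Xb @ w))

def bootstrap_corrected_c(X, y, n_boot=200, rng=None):
    """Optimism correction: refit the model on each bootstrap resample,
    measure the drop in c when that model is applied to the original
    sample, and subtract the average drop from the apparent c."""
    rng = np.random.default_rng(rng)
    apparent = c_statistic(y, predict(fit_logistic(X, y), X))
    n, optimism = len(y), []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)          # resample with replacement
        wb = fit_logistic(X[idx], y[idx])
        boot_app = c_statistic(y[idx], predict(wb, X[idx]))
        orig_perf = c_statistic(y, predict(wb, X))
        optimism.append(boot_app - orig_perf)
    return apparent - np.mean(optimism)
```

On simulated data the corrected estimate falls below the apparent one, reproducing the optimism the abstract describes.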

          Related collections

          Most cited references: 21


          An Introduction to the Bootstrap

          Statistics is a subject of many uses and surprisingly few effective practitioners. The traditional road to statistical knowledge is blocked, for most, by a formidable wall of mathematics. The approach in An Introduction to the Bootstrap avoids that wall. It arms scientists and engineers, as well as statisticians, with the computational techniques they need to analyze and understand complicated data sets.

            Internal validation of predictive models: efficiency of some procedures for logistic regression analysis.

            The performance of a predictive model is overestimated when simply determined on the sample of subjects that was used to construct the model. Several internal validation methods are available that aim to provide a more accurate estimate of model performance in new subjects. We evaluated several variants of split-sample, cross-validation and bootstrapping methods with a logistic regression model that included eight predictors for 30-day mortality after an acute myocardial infarction. Random samples with a size between n = 572 and n = 9165 were drawn from a large data set (GUSTO-I; n = 40,830; 2851 deaths) to reflect modeling in data sets with between 5 and 80 events per variable. Independent performance was determined on the remaining subjects. Performance measures included discriminative ability, calibration and overall accuracy. We found that split-sample analyses gave overly pessimistic estimates of performance, with large variability. Cross-validation on 10% of the sample had low bias and low variability, but was not suitable for all performance measures. Internal validity could best be estimated with bootstrapping, which provided stable estimates with low bias. We conclude that split-sample validation is inefficient, and recommend bootstrapping for estimation of internal validity of a predictive logistic regression model.
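The split-sample strategy this reference finds pessimistic and variable can be sketched as follows; a self-contained, illustrative numpy version (not the paper's implementation, names ours):

```python
import numpy as np

def fit_logistic(X, y, iters=500, lr=0.1):
    """Gradient-ascent logistic regression (illustrative, unregularized)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-Xb @ w))
        w += lr * Xb.T @ (y - p) / len(y)
    return w

def c_statistic(y, p):
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def split_sample_c(X, y, frac_test=0.5, rng=None):
    """Fit on a random development portion, report the c-statistic on the
    held-out portion. Each random split yields a different estimate, which
    is the variability (and effective sample-size loss) at issue."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(y))
    n_test = int(frac_test * len(y))
    test, dev = idx[:n_test], idx[n_test:]
    w = fit_logistic(X[dev], y[dev])
    Xb = np.column_stack([np.ones(n_test), X[test]])
    return c_statistic(y[test], 1 / (1 + np.exp(-Xb @ w)))
```

Repeating the split with different seeds shows the split-to-split variability directly.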

              Assessing the generalizability of prognostic information.

              Physicians are often asked to make prognostic assessments but often worry that their assessments will prove inaccurate. Prognostic systems were developed to enhance the accuracy of such assessments. This paper describes an approach for evaluating prognostic systems based on the accuracy (calibration and discrimination) and generalizability (reproducibility and transportability) of the system's predictions. Reproducibility is the ability to produce accurate predictions among patients not included in the development of the system but from the same population. Transportability is the ability to produce accurate predictions among patients drawn from a different but plausibly related population. On the basis of the observation that the generalizability of a prognostic system is commonly limited to a single historical period, geographic location, methodologic approach, disease spectrum, or follow-up interval, we describe a working hierarchy of the cumulative generalizability of prognostic systems. This approach is illustrated in a structured review of the Dukes and Jass staging systems for colon and rectal cancer and applied to a young man with colon cancer. Because it treats the development of the system as a "black box" and evaluates only the performance of the predictions, the approach can be applied to any system that generates predicted probabilities. Although the Dukes and Jass staging systems are discrete, the approach can also be applied to systems that generate continuous predictions and, with some modification, to systems that predict over multiple time periods. Like any scientific hypothesis, the generalizability of a prognostic system is established by being tested and being found accurate across increasingly diverse settings. The more numerous and diverse the settings in which the system is tested and found accurate, the more likely it will generalize to an untested setting.
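The discrimination component described above applies equally to discrete systems such as Dukes-style staging, with tied stages counted as half-concordant. A small illustrative sketch with hypothetical data (not from the paper):

```python
import numpy as np

def c_statistic(y, score):
    """Concordance: probability that a randomly chosen event outranks a
    randomly chosen non-event, counting ties as 1/2. Works for discrete
    scores (e.g. stage A < B < C) as well as predicted probabilities."""
    pos, neg = score[y == 1], score[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Hypothetical staging data: stage coded 1 < 2 < 3, y = 1 if the event occurred.
stage = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 3])
y_ex  = np.array([0, 0, 0, 0, 1, 0, 1, 0, 1, 1])
c = c_statistic(y_ex, stage)  # ties between stages contribute 1/2 each
```

Because only the ranking of predictions enters the calculation, the same function evaluates any "black box" that outputs ordered risk classes or probabilities.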

                Author and article information

                Journal
                Statistical Methods in Medical Research (Stat Methods Med Res)
                SAGE Publications (Sage UK: London, England)
                ISSN: 0962-2802 (print); 1477-0334 (electronic)
                Published online: 19 November 2014; in issue: April 2017
                Volume 26, Issue 2, pp. 796-808
                Affiliations
                [1 ]Institute for Clinical Evaluative Sciences, Toronto, Canada
                [2 ]Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada
                [3 ]Schulich Heart Research Program, Sunnybrook Research Institute, Toronto, Canada
                [4 ]Department of Public Health, Erasmus MC – University Medical Center Rotterdam, Rotterdam, The Netherlands
                Author notes
                [*]Peter C Austin, Institute for Clinical Evaluative Sciences, G1 06, 2075 Bayview Avenue, Toronto, Ontario M4N 3M5, Canada. Email: peter.austin@ices.on.ca
                Article
                DOI: 10.1177/0962280214558972
                PMCID: 5394463
                PMID: 25411322
                Record: fc3cbadc-cfc3-4c9e-9e4e-43d99cdbdb1f
                © The Author(s) 2014

                This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 3.0 License (http://www.creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial use, reproduction and distribution of the work without further permission, provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).

                Categories
                Articles

                logistic regression, model validation, bootstrap, discrimination, c-statistic, clinical prediction models, data splitting, receiver operating characteristic curve
