      Sample size calculation to externally validate scoring systems based on logistic regression models

      research-article


          Abstract

          Background

          A sample size of at least 100 events and 100 non-events has been suggested for validating a predictive model. This recommendation is made regardless of the model being validated, even though certain factors (discrimination, parameterization and incidence) can influence the calibration of the predictive model. Scoring systems based on binary logistic regression models are a specific type of predictive model.

          Objective

          The aim of this study was to develop an algorithm to determine the sample size for validating a scoring system based on a binary logistic regression model and to apply it to a case study.

          Methods

          The algorithm was based on bootstrap samples in which the area under the ROC curve, the observed event probabilities through smooth curves, and a measure to determine the lack of calibration (estimated calibration index) were calculated. To illustrate its use for interested researchers, the algorithm was applied to a scoring system, based on a binary logistic regression model, to determine mortality in intensive care units.
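The steps named in the Methods can be sketched in code. The following is a minimal illustration, not the authors' implementation. Everything specific in it is an assumption made for the example: the scoring system (`predicted_risk`, with an arbitrary intercept and slope), the simulated pilot data (`make_pilot`), the Gaussian kernel smoother standing in for the smooth curves, the estimated calibration index taken as 100 times the mean squared gap between predicted and smoothed observed probabilities, and the stopping rule based on a `tolerance` for the 95th percentile of that index.

```python
import math
import random


def predicted_risk(score, intercept=-3.0, slope=0.5):
    """Hypothetical scoring system: risk implied by a logistic model for an integer score."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def auc(scores, outcomes):
    """Area under the ROC curve: share of event/non-event pairs ranked correctly (ties count 1/2)."""
    pos, neg = {}, {}
    for s, y in zip(scores, outcomes):
        counts = pos if y == 1 else neg
        counts[s] = counts.get(s, 0) + 1
    wins = sum(cp * cn * ((sp > sn) + 0.5 * (sp == sn))
               for sp, cp in pos.items() for sn, cn in neg.items())
    return wins / (sum(pos.values()) * sum(neg.values()))


def smoothed_observed(scores, outcomes, bandwidth=1.0):
    """Observed event probability per score, smoothed with a Gaussian kernel (assumed smoother)."""
    totals, events = {}, {}
    for s, y in zip(scores, outcomes):
        totals[s] = totals.get(s, 0) + 1
        events[s] = events.get(s, 0) + y
    smooth = {}
    for x in totals:
        num = den = 0.0
        for s, count in totals.items():
            w = math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
            num += w * events[s]
            den += w * count
        smooth[x] = num / den
    return smooth


def eci(scores, outcomes):
    """Assumed ECI: 100 x mean squared gap between predicted and smoothed observed risk."""
    observed = smoothed_observed(scores, outcomes)
    gaps = [(predicted_risk(s) - observed[s]) ** 2 for s in scores]
    return 100.0 * sum(gaps) / len(gaps)


def bootstrap_summary(pilot, n, n_boot, rng):
    """Resample n subjects with replacement n_boot times; return mean AUC and 95th percentile ECI."""
    aucs, ecis = [], []
    for _ in range(n_boot):
        sample = [rng.choice(pilot) for _ in range(n)]
        scores = [s for s, _ in sample]
        outcomes = [y for _, y in sample]
        if 0 < sum(outcomes) < n:  # both classes are needed for the AUC
            aucs.append(auc(scores, outcomes))
            ecis.append(eci(scores, outcomes))
    ecis.sort()
    return sum(aucs) / len(aucs), ecis[int(0.95 * (len(ecis) - 1))]


def smallest_adequate_n(pilot, candidates, tolerance, n_boot=200, seed=1):
    """Return the first candidate size whose upper ECI bound is within the tolerance, else None."""
    rng = random.Random(seed)
    for n in candidates:
        mean_auc, eci_upper = bootstrap_summary(pilot, n, n_boot, rng)
        if eci_upper <= tolerance:
            return n, mean_auc, eci_upper
    return None


def make_pilot(size=2000, seed=2017):
    """Simulated pilot data: (score, outcome) pairs drawn from the hypothetical model."""
    rng = random.Random(seed)
    scores = [rng.randint(0, 10) for _ in range(size)]
    return [(s, int(rng.random() < predicted_risk(s))) for s in scores]


pilot = make_pilot()
print(smallest_adequate_n(pilot, candidates=[100, 200, 400], tolerance=1.0))
```

The call prints either a tuple (sample size, mean AUC, upper ECI bound) or `None` if no candidate size meets the tolerance. The number of events implied by a chosen size follows from the event rate in the pilot data, which is how a result such as a required number of events could be read off in practice.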

          Results

          In the case study provided, the algorithm obtained a sample size with 69 events, which is lower than the value suggested in the literature.

          Conclusion

          An algorithm is provided for finding the appropriate sample size to validate scoring systems based on binary logistic regression models. This could be applied to determine the sample size in other similar cases.

          Most cited references (7)

          The meaning and use of the area under a receiver operating characteristic (ROC) curve.

          A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented. It is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject. Moreover, this probability of a correct ranking is the same quantity that is estimated by the already well-studied nonparametric Wilcoxon statistic. These two relationships are exploited to (a) provide rapid closed-form expressions for the approximate magnitude of the sampling variability, i.e., standard error that one uses to accompany the area under a smoothed ROC curve, (b) guide in determining the size of the sample required to provide a sufficiently reliable estimate of this area, and (c) determine how large sample sizes should be to ensure that one can statistically detect differences in the accuracy of diagnostic techniques.
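The equivalence described in this abstract can be checked numerically. The sketch below uses made-up ratings (`diseased` and `healthy` are hypothetical data) and computes the area twice: once as the share of correctly ranked pairs, and once from the Wilcoxon rank sum. The function `hanley_mcneil_se` uses the closed-form standard error approximation commonly attributed to this reference; treat it as an illustration rather than a substitute for the original derivation.

```python
import math


def auc_by_pairs(diseased, healthy):
    """Share of (diseased, healthy) pairs where the diseased subject is rated higher; ties count 1/2."""
    wins = sum((d > h) + 0.5 * (d == h) for d in diseased for h in healthy)
    return wins / (len(diseased) * len(healthy))


def midranks(values):
    """1-based ranks of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def auc_by_ranks(diseased, healthy):
    """Same area obtained from the Wilcoxon rank sum of the diseased group."""
    n1, n0 = len(diseased), len(healthy)
    ranks = midranks(list(diseased) + list(healthy))
    rank_sum = sum(ranks[:n1])
    u = rank_sum - n1 * (n1 + 1) / 2  # Mann-Whitney U statistic
    return u / (n1 * n0)


def hanley_mcneil_se(area, n1, n0):
    """Approximate standard error of the area for n1 diseased and n0 non-diseased subjects."""
    q1 = area / (2 - area)
    q2 = 2 * area ** 2 / (1 + area)
    variance = (area * (1 - area)
                + (n1 - 1) * (q1 - area ** 2)
                + (n0 - 1) * (q2 - area ** 2)) / (n1 * n0)
    return math.sqrt(variance)


diseased = [3, 4, 5, 5]  # hypothetical ratings for diseased subjects
healthy = [1, 2, 3, 5]   # hypothetical ratings for non-diseased subjects
area = auc_by_pairs(diseased, healthy)
print(area, auc_by_ranks(diseased, healthy))
print(hanley_mcneil_se(area, len(diseased), len(healthy)))
```

Both routes give the same area on these ratings, and the standard error shrinks as either group grows, which is what makes the expression usable for sample size planning.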
            Applied Logistic Regression

              A calibration hierarchy for risk models was defined: from utopia to empirical data.

              Calibrated risk models are vital for valid decision support. We define four levels of calibration and describe implications for model development and external validation of predictions.

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                Published: 1 May 2017
                Volume 12, Issue 5, e0176726
                Affiliations
                [1] Department of Clinical Medicine, Miguel Hernández University, San Juan de Alicante, Alicante, Spain
                [2] Department of Pharmacology, Pediatrics and Organic Chemistry, Miguel Hernández University, San Juan de Alicante, Alicante, Spain
                [3] Department of Molecular Neurobiology, Neurosciences Institute (Miguel Hernández University and Consejo Superior de Investigaciones Científicas), San Juan de Alicante, Alicante, Spain
                Iranian Institute for Health Sciences Research, ISLAMIC REPUBLIC OF IRAN
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                • Conceptualization: AP DMF EC MTL VFG.

                • Data curation: AP MTL.

                • Formal analysis: AP.

                • Investigation: AP DMF VFG.

                • Methodology: AP DMF VFG.

                • Project administration: AP.

                • Resources: VFG.

                • Software: AP DMF.

                • Supervision: AP.

                • Validation: AP DMF.

                • Visualization: AP DMF EC MTL VFG.

                • Writing – original draft: AP.

                • Writing – review & editing: AP DMF EC MTL VFG.

                Author information
                http://orcid.org/0000-0002-5959-9631
                Article
                Manuscript: PONE-D-16-43577
                DOI: 10.1371/journal.pone.0176726
                PMCID: PMC5411086
                PMID: 28459847
                765f357b-83a3-44ae-8adc-aa0e85d6ee75
                © 2017 Palazón-Bru et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 2 November 2016
                Accepted: 14 April 2017
                Page count
                Figures: 3, Tables: 0, Pages: 11
                Funding
                The authors received no specific funding for this work.
                Categories
                Research Article
                • Research and Analysis Methods > Mathematical and Statistical Techniques > Statistical Methods > Forecasting
                • Physical Sciences > Mathematics > Statistics (Mathematics) > Statistical Methods > Forecasting
                • Physical Sciences > Mathematics > Applied Mathematics > Algorithms
                • Research and Analysis Methods > Simulation and Modeling > Algorithms
                • Physical Sciences > Mathematics > Statistics (Mathematics) > Confidence Intervals
                • Medicine and Health Sciences > Health Care > Health Care Facilities > Hospitals > Intensive Care Units
                • Physical Sciences > Mathematics > Probability Theory > Random Variables
                • Biology and Life Sciences > Anatomy > Cardiovascular Anatomy > Heart
                • Medicine and Health Sciences > Anatomy > Cardiovascular Anatomy > Heart
                • Research and Analysis Methods > Mathematical and Statistical Techniques > Mathematical Models
                • Biology and Life Sciences > Biotechnology > Medical Devices and Equipment
                • Medicine and Health Sciences > Medical Devices and Equipment
                Custom metadata
                All relevant data are within the paper (simulation algorithm).
