      Is Open Access

      Developing a Model to Predict Hospital Encounters for Asthma in Asthmatic Patients: Secondary Analysis

      Gang Luo, DPhil 1; Shan He, DPhil 2; Bryan L Stone, MSc, MD 3; Flory L Nkoy, MPH, MSc, MD 3; Michael D Johnson, MD 3


      JMIR Medical Informatics

      JMIR Publications




          As a major chronic disease, asthma causes many emergency department (ED) visits and hospitalizations each year. Predictive modeling is a key technology to prospectively identify high-risk asthmatic patients and enroll them in care management for preventive care to reduce future hospital encounters, including inpatient stays and ED visits. However, existing models for predicting hospital encounters in asthmatic patients are inaccurate. Usually, they miss over half of the patients who will incur future hospital encounters and incorrectly classify many others who will not. This makes it difficult to match the limited resources of care management to the patients who will incur future hospital encounters, increasing health care costs and degrading patient outcomes.


          The goal of this study was to develop a more accurate model for predicting hospital encounters in asthmatic patients.


          We conducted a secondary analysis of 334,564 data instances from Intermountain Healthcare, covering 2005 to 2018, to build a machine learning classification model that predicts hospital encounters for asthma in the following year among asthmatic patients. The patient cohort included all asthmatic patients who resided in Utah or Idaho and visited Intermountain Healthcare facilities during 2005 to 2018. A total of 235 candidate features were considered for model building.


          The model achieved an area under the receiver operating characteristic curve of 0.859 (95% CI 0.846-0.871). When the cutoff threshold for conducting binary classification was set at the top 10.00% (1926/19,256) of asthmatic patients with the highest predicted risk, the model reached an accuracy of 90.31% (17,391/19,256; 95% CI 89.86-90.70), a sensitivity of 53.7% (436/812; 95% CI 50.12-57.18), and a specificity of 91.93% (16,955/18,444; 95% CI 91.54-92.31). To steer future research on this topic, we pinpointed several potential improvements to our model.
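The three percentages above follow directly from the counts given in parentheses. As a minimal sketch (the `ConfusionCounts` class and `summarize` function are illustrative names, not part of the study's code), the reported accuracy, sensitivity, and specificity can be recomputed from the four confusion-matrix cells implied by those counts:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 confusion matrix (hypothetical helper, not from the paper)."""
    true_pos: int
    false_neg: int
    true_neg: int
    false_pos: int


def summarize(c: ConfusionCounts) -> dict:
    """Return accuracy, sensitivity, and specificity as fractions."""
    total = c.true_pos + c.false_neg + c.true_neg + c.false_pos
    return {
        "accuracy": (c.true_pos + c.true_neg) / total,
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
    }


# Counts derived from the fractions quoted in the abstract:
# sensitivity 436/812 and specificity 16,955/18,444.
counts = ConfusionCounts(
    true_pos=436,
    false_neg=812 - 436,
    true_neg=16_955,
    false_pos=18_444 - 16_955,
)
result = summarize(counts)
for name, value in result.items():
    print(f"{name}: {value:.2%}")
```

Only 812 of the 19,256 patients had a hospital encounter, so accuracy is dominated by the true negatives; this is why sensitivity and specificity are reported alongside it.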


          Our model improves the state of the art for predicting hospital encounters for asthma in asthmatic patients. After further refinement, the model could be integrated into a decision support tool to guide asthma care management allocation.

          International Registered Report Identifier (IRRID)



          Most cited references 44

          Is Open Access

          Risk Prediction Models to Predict Emergency Hospital Admission in Community-dwelling Adults

          In the United States, rehospitalizations alone are estimated to cost €12 billion each year.1 Emergency or unplanned admissions account for approximately 35% of all hospitalizations in the United Kingdom (UK) costing an average of £11 billion annually.2 As a result of this escalating expenditure, reducing emergency admissions is a priority for health care policy-makers.3 For patients, unplanned hospitalizations may be distressing, and older people in particular are at risk of related adverse events such as hospital-acquired infections, loss of functional independence, and falls.4 One way of reducing emergency admissions is to identify people at higher risk who can then be prioritized for an intervention, such as case management.5 Risk prediction models developed for this purpose and not contingent on recent hospitalization have the advantage of broader applicability and can include a wider range of predictor variables. It has also been argued that focusing on specific high-risk groups, such as those discharged from a hospital, may not be the best approach to take in targeting emergency admissions. This is due, in part, to the concept of “regression to the mean,” which means that patients with a history of multiple admissions will on average have fewer admissions in the future than they had in the past.6,7 Three main types of data sources are utilized to derive risk models for predicting emergency admission.3 The first is self-report data collected through patient questionnaire or interview with the advantage of being able to include nonmedical variables such as functional status and social supports. The second is routine data collected for the purposes of administrative databases or population registries. The third incorporates data collated from the clinical record or other primary data sources with the advantage of being able to test larger number of variables and without the response biases associated with self-report. 
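The "regression to the mean" effect mentioned above can be illustrated with a small simulation. This is only a sketch on synthetic data: the gamma-distributed admission rates, the cutoff of 3 admissions, and the function names are all assumptions chosen for illustration, not values from the review. Each synthetic patient keeps the same underlying rate in both years, yet those selected for many admissions in year 1 average fewer in year 2.

```python
import math
import random


def draw_poisson(rng: random.Random, rate: float) -> int:
    """Sample a Poisson count with Knuth's multiplication method."""
    limit = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate(n_patients: int = 20_000, seed: int = 0):
    """Return mean admissions in year 1 and year 2 for the year-1 high users."""
    rng = random.Random(seed)
    year1_high, year2_high = [], []
    for _ in range(n_patients):
        # Stable underlying admission rate per patient
        # (gamma-distributed, mean 0.5 per year; an arbitrary choice).
        rate = rng.gammavariate(0.5, 1.0)
        y1 = draw_poisson(rng, rate)
        y2 = draw_poisson(rng, rate)
        if y1 >= 3:  # "multiple admissions" cutoff, also arbitrary
            year1_high.append(y1)
            year2_high.append(y2)
    return sum(year1_high) / len(year1_high), sum(year2_high) / len(year2_high)


mean_year1, mean_year2 = simulate()
print(f"high users, year 1: {mean_year1:.2f} admissions on average")
print(f"same patients, year 2: {mean_year2:.2f} admissions on average")
```

The drop occurs without any intervention, which is the reason given above for caution when targeting only patients with a history of multiple admissions.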
The aim of this study was to perform a systematic review of validated risk prediction models for predicting emergency hospital admission in community-dwelling adults. Specific objectives were: (1) to examine the variables included in risk prediction models; (2) to summarize the performance of risk prediction models in derivation and validation cohorts; and (3) to compare the predictive accuracy of risk models externally validated in the same setting.

METHODS

The protocol for this systematic review has been published on PROSPERO (PROSPERO2013:CRD42013004390) and is available at http://www.crd.york.ac.uk/PROSPERO/display_record.asp?ID=CRD42013004390. The PRISMA guidelines for the conduct and reporting of systematic reviews were utilized in undertaking this systematic review.8

Search Strategy

A systematic literature search of the following search engines was carried out in September 2013 and updated in February 2014: PubMed, EMBASE, CINAHL, the Cochrane Library, and Google Scholar. Additional databases were also searched: the US Agency for Healthcare Research and Quality (AHRQ), the Johns Hopkins Adjusted Clinical Groups (ACG) publications, the UK Nuffield Trust, and the King’s Fund. The search was supplemented by hand searching references of relevant articles and contacting study authors when necessary. No restrictions were placed on language or year of publication. A combination of MeSH terms and keywords was used to capture studies of interest (Appendix 1, Supplemental Digital Content 1, http://links.lww.com/MLR/A747).

Study Selection

Studies were included if they met the following criteria:
Population: Community-dwelling adults (aged ≥18 y).
Risk: Risk prediction models, which were not contingent on an index hospital admission, with a derivation and at least 1 validation (either internal or external) cohort. Models were subdivided according to the data used to develop the model as follows: (i) Self-report; (ii) Administrative or clinical record data.
Outcome: Primary outcome of emergency hospital admission (defined as unplanned overnight stay in hospital). Studies that had emergency admission as part of their outcome of interest (e.g., combined endpoints) were also included.
Study design: Retrospective or prospective cohort studies.

The following studies were excluded: studies whose primary population of interest focused on pediatrics, obstetrics, surgery, mental illness, or patients enrolled in managed care programs; readmission risk prediction models (models contingent on an index hospital admission); models in which the primary outcome of interest was elective hospital admission; models developed for use in emergency rooms (ERs), for specific diagnoses (for example, congestive heart failure), or for a different primary outcome (e.g., mortality); and risk adjustment models (models that compare provider performance to inform pay and health care financing). Studies that reported risk factors only and did not develop a model were also excluded.

Data Extraction

Two reviewers (E.W., E.S.) read the titles and/or abstracts of the identified records in duplicate and eliminated irrelevant studies. Studies that were considered eligible for inclusion were read fully in duplicate and their suitability for inclusion determined. Disagreements were managed by consensus and if consensus could not be reached then by third review (S.M.S.). Additional data were sought from authors when necessary. Data were extracted using a standardized data extraction form.

Statistical Analysis

Meta-analysis was not possible because of risk prediction model heterogeneity, so we narratively summarized each unique risk prediction model under the following headings:
The model’s derivation cohort study setting, participants and population studied.
Type of validation cohort, that is, internal or external.
Type of data used to derive the model.
Model discrimination was assessed using the c statistic with 95% confidence intervals when available.
A c statistic of 0.5 indicates that the model performs no better than chance, a score of 0.7–0.8 indicates acceptable discrimination, whereas a score of >0.8 indicates good discrimination.9 In cases in which the c statistic was not presented, we present positive predictive values, sensitivity, and specificity.
Variables evaluated and considered for inclusion.
Variables included in the final model.

Methodological Quality Assessment

Methodological quality assessment of included studies was independently performed in duplicate (E.S., N.V.) using the McGinn checklist for the methodological assessment of clinical prediction rule studies10 (Appendix 2, Supplemental Digital Content 2, http://links.lww.com/MLR/A748). The McGinn criteria include a total of 8 criteria to assess the internal and external validity of derivation articles. For validation studies, a total of 5 criteria were used. Detailed guidance notes were also developed in-house to accompany the derivation and validation methodological criteria. Disagreements were solved by consensus or by adjudicating third review (E.W.).

RESULTS

Study Identification

A flow diagram of the search strategy is presented in Figure 1. The electronic databases search strategy yielded 20,666 papers. A further 20 articles were retrieved from searches of other resources. After removal of duplicates, a total of 18,983 articles were screened by title and abstract, of which 163 studies were reviewed in full text; 27 unique risk prediction models met all inclusion criteria.

FIGURE 1 PRISMA flow diagram of included risk prediction models.

Description of Included Risk Prediction Models

Of the 27 unique models included, 11 were developed in the UK, 11 in the US, 3 in Italy, 1 in Spain, and 1 in Canada. Nine models were developed using self-report data or a combination of self-report and administrative or routine data (Table 1) and the remainder (n=18) utilized routine or primary data alone (Table 2).
A total of 13 models were developed specifically for use in older people (60 y and above). Total sample sizes ranged from 96 to 4.7 million participants. The majority of models (18 of 27) were developed to predict emergency hospital admission at 12-month follow-up (range, 90 d–4 y). Of these, 3 models focused on emergency admissions for chronic disease or conditions amenable to primary care management as a primary outcome measure.27,31,38 Two models predicted any hospitalization and 2 predicted occupied bed days over specific time periods.17,26,32,38 A further 3 models used the endpoint of emergency admission or ER visit and 2 used combined hospitalization/death.11,19,21,35,37

TABLE 1 Risk Prediction Models Developed Using Self-report Data Primarily (n=9)

TABLE 2 Risk Prediction Models Developed Using Administrative or Clinical Record Data (n=18)

Data Sources Used to Develop Risk Prediction Models

The 9 models developed with self-report data drew on literature reviews, medical record review, and questionnaire piloting in their development (Table 1). Of the 18 models developed using routine or clinical record data, 10 were developed using a combination of administrative and clinical record data.22–24,26–30,35,39 A further 8 were developed using administrative data alone.17,25,32–34,36–38 Eleven models included general practice (GP)/family practice clinical record data in their final model.22–24,26–30,35,39

Risk Prediction Model Variables

The variables considered and included in each of the 27 models are presented in Table 3.
Seven studies presented their final risk model only and not all variables considered for inclusion, and 1 study used locally available data to create a risk prediction model specifically for a named population so variables considered for inclusion vary.23–25,27,28,31,34,37 The most frequently included predictor variables in final risk models were: (1) named medical diagnoses (23 models); (2) age (23 models); (3) prior emergency admission (22 models); and (4) sex (18 models). Other health care utilization variables commonly included were prior ER and outpatient department (OPD) visits (14 and 13 models, respectively). Twelve models included measures of multimorbidity (the presence of 2 or more chronic medical conditions in an individual), most commonly the Charlson index and simple disease counts.19,23,24,29–33,36–39 One model considered multimorbidity for inclusion and then excluded it after evaluation.17 Polypharmacy was considered as a predictor variable in 14 models and included in 11 final models.11,18,19,21,23,24,28–30,37,39 Five models included a specific measure of socioeconomic group (SEG) and a further 3 used either employment history or income as proxy measures for SEG.17,21–23,25,28,29,31

TABLE 3 Predictor Variables in Risk Prediction Models (n=26*) for Predicting Emergency Hospital Admissions

Overall, a smaller number of models (n=11) included nonmedical factors.11,13–15,17,20–22,24,31,37 These variables were largely included in self-report data models (Table 1).
Of those that included functional status as a predictor variable, most considered either activities of daily living, mobility, and/or a history of falls.11,13,17,20–22,24,31 Four questionnaires included measures of self-rated health and 1 included health-related quality of life.13–15,17,18 Two questionnaires included the social support measure of caregiver availability.15,21 Three models developed using administrative or clinical record data included nonmedical variables; these included a history of falls as a predictor variable, social supports and living arrangements, and a disability rating variable respectively.22,31,37

Predictive Accuracy of Risk Prediction Models

Eighteen models presented c statistics for the outcome of emergency admission ranging from 0.61 to 0.83. Six models reported c statistics of >0.8, indicating good model discrimination.27,28,31–33,38 Some similarities were noted among these models; all included prior health care utilization variables, multimorbidity or polypharmacy measures, and named medical diagnoses or named prescribed medications variables. Three of these 6 models utilized emergency admissions for chronic disease or conditions amenable to primary care management as a primary outcome measure.27,31,38 A further 7 risk prediction models reported c statistics of between 0.7 and 0.8 representing acceptable model performance.18,22–24,35–37,39 Of 9 models developed using self-report data primarily, 8 were designed for use in older people. In contrast, only 5 of the 18 models developed using administrative or clinical record data were derived specifically for use in older people. The remainder were developed for use in general populations aged over 18 years. Overall, models developed primarily using administrative or clinical record data performed better than those developed using self-report data, with reported c statistics ranging from 0.68 to 0.83 versus 0.61 to 0.74, respectively.
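To make the c statistic values above concrete, the following sketch computes a c statistic as the share of (admitted, not admitted) patient pairs in which the admitted patient received the higher predicted risk, with ties counted as half, and then applies the interpretation bands quoted in the statistical analysis (0.7–0.8 acceptable, >0.8 good). The function names and the toy scores are hypothetical, and the label used for values below 0.7 is an assumption, since the review does not name that range.

```python
def c_statistic(scores, outcomes):
    """Pairwise concordance between predicted risk and observed 0/1 outcome."""
    admitted = [s for s, y in zip(scores, outcomes) if y == 1]
    not_admitted = [s for s, y in zip(scores, outcomes) if y == 0]
    if not admitted or not not_admitted:
        raise ValueError("need at least one patient in each outcome group")
    concordant = 0.0
    for a in admitted:
        for n in not_admitted:
            if a > n:
                concordant += 1.0   # admitted patient ranked higher
            elif a == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(admitted) * len(not_admitted))


def describe_discrimination(c):
    """Map a c statistic to the bands quoted in the review."""
    if c > 0.8:
        return "good"
    if c >= 0.7:
        return "acceptable"
    return "below acceptable range"  # label assumed; not named in the review


# Toy data (hypothetical): predicted risks and whether admission occurred.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_outcomes = [1, 1, 0, 1, 0, 0]
c_value = c_statistic(toy_scores, toy_outcomes)
print(f"c statistic = {c_value:.3f} ({describe_discrimination(c_value)})")
```

The pairwise loop is quadratic in the number of patients; it is written this way for readability, and rank-based formulas give the same value for large cohorts.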
Comparison of Performance of Risk Prediction Models Within and Across Populations

Three studies developed several prediction models in 1 population, using different datasets and then compared their performance. Billings et al23 developed 4 models in the United Kingdom using: (1) inpatient data alone; (2) combined inpatient and ER data; (3) combined inpatient, ER, and OPD data; and (4) combined inpatient/ER/OPD/GP/family practice data. This was undertaken to determine whether the addition of GP/family practice data improved overall model performance. In the test sample of >1.8 million people, the OPD/ER/GP/inpatient model performed best (c statistic 0.78 vs. 0.73 for inpatient model).23 Similarly, Lemke and colleagues in the United States examined various models using the ACG classification and compared these with models using prior hospitalization only using a data source of 4.7 million medical insurance claims. The model using ACG groupings plus prior health care utilization performed best overall (c statistic 0.8 vs. 0.75).33 Reuben and colleagues compared models developed using prior admission only, self-report data only, and a model using a combination of self-report variables and laboratory values. The model with greatest predictive accuracy used a combination of self-report and laboratory variables (c statistic 0.69).17 Two studies directly compared different validated models in the same population. The UK Combined Predictive Model (CPM) was developed to be nationally representative.30 It was compared with 2 other UK risk models, the Wales predictive model and the Devon predictive model.24,29 In primary care the Wales model was found to have superior predictive ability when compared with the CPM in correctly identifying those who were subsequently admitted. The Devon predictive model included many of the same variables as the CPM but also local data variables and was found to have greater predictive accuracy when compared with the CPM.
The authors argued that the addition of local factors, for example, the participant’s duration of family practitioner registration as a proxy for continuity of care, was integral to improved performance.

Methodological Quality Assessment of Included Studies

Overall, the methodological quality of included studies was good. For derivation, the majority of studies reported all checklist items with the exception of items pertaining to blinding of outcome assessors, blinding of those assessing the presence of predictors, and reporting of the proportion of the population with important predictors. For validation the majority of studies reported all checklist items with underreporting of blinding of those assessing the outcome event (Figs. 2A, B).

FIGURE 2 Methodological quality assessment of included risk prediction models (n=26, n=1, model customized depending on the population it is intended for). A, Derivation studies. B, Validation studies. Colour code: Blue: item done and reported; Red: item not done and reported; Green: item unreported.

DISCUSSION

Summary of Findings

This systematic review identified 27 unique risk models for predicting hospital admission. Less than half were developed specifically for older people, with the rest designed for use in an adult population. Overall, models developed using administrative or clinical record data and developed on large datasets tended to have greater predictive ability than self-report questionnaires. Risk prediction models that examined the added benefit of GP/family practice clinical record data in increasing predictive accuracy reported improved performance when this data source was included.

Variables Included in Risk Prediction Models

Overall, almost all risk models in this review included age, prior hospitalization, and specified medical diagnoses, and the majority included sex.
However, less than half considered a specific measurement of multimorbidity, which is surprising considering the impact the presence of multiple conditions has been shown to have on health care utilization.40,41 Similarly, less than half of models considered polypharmacy and only 8 included a measure for SEG in their development. In this review the 6 risk prediction models that demonstrated greatest predictive accuracy (based on reported c statistics) included similar variables, namely, prior health care utilization, multimorbidity or polypharmacy measures, and named medical diagnoses or named prescribed medications predictor variables. Three of the 6 focused on ambulatory care sensitive conditions (ACSCs) admissions. Overall, nonmedical factors such as functional status, social supports, and self-rated health were included in approximately one third of risk models. These factors have been highlighted as potentially contributing to emergency hospitalization. One US study of qualitative interviews with patients identified by a risk prediction model as high risk found that the majority had poor self-rated health, precarious housing status, lived alone, and reported high levels of social isolation.42

Performance of Risk Prediction Models in New Settings

In 2 studies a nationally developed risk prediction model was applied to new populations in the same country and its performance compared with adapted models, which included local factors.24,29 In both studies the locally adapted models performed better at predicting future emergency hospitalization. One UK risk score developer designs customized risk models for a specified population using locally available data to ensure that the model created is fit for purpose.27 This approach seems sensible as local factors may well differ within countries and differences in population demographics may mean that a risk model should be applied differently.
Comparison With Previous Research

To our knowledge this is the first systematic review of risk prediction models for emergency admission in community-dwelling adults. Previous systematic reviews have focused on readmission risk models and risk factors for emergency admission. Kansagara et al43 found that of 26 retrieved readmission risk models only 6 reported a c statistic >0.7. They concluded that most readmission models perform poorly and suggested that the additional variables available through the medical record or patient self-report may improve performance. Our review supports this suggestion with models developed using clinical record data demonstrating improved predictive accuracy overall. García-Pérez et al44 reported that the risk factors of chronic disease status and functional disability were the most important predictors, followed by prior health care utilization. Whereas medical diagnoses and prior health care utilization were included in almost all risk prediction models in this review, far fewer included functional status. This may be related to the type of data available in the development phase, especially those that utilize administrative or clinical record data only. Functional status variables have tended to be included in self-report questionnaires, which may be more prone to response bias for the reporting of other important predictors such as medical diagnoses and previous health care utilization. Future research needs to consider how best to capture nonmedical factors to determine whether their inclusion into predictive models improves performance.
Clinical and Research Implications

In 2011, a US-based heritage provider group offered a $3 million prize to any group that could develop a risk prediction model to identify people at higher risk for admission so that resources could be directed at reducing their risk.45 However, to date, the evidence for case management for higher-risk community-dwelling people is mixed and has not reduced emergency admissions.46 For instance, the Guided Care model aims to provide primary care that includes comprehensive geriatric assessment, case management, self-management support, and caregiver support provided by a team that includes a specially trained nurse who acts as care coordinator. Patients were targeted using age and multimorbidity as risk stratification criteria. In a 32-center randomized control trial, this intervention was found to improve participants’ chronic care and reduce caregiver strain and resulted in high levels of health care professional satisfaction.47 However, apart from 1 subgroup, compared with usual care, participants utilized similar levels of health care at 20-month follow-up, with the exception of home health care, which was significantly reduced.48 Overall, it is difficult to know whether case management has not achieved anticipated reductions in emergency admissions because of the intervention used or the case finding mechanism utilized. Studies to date have chosen relatively blunt measures of risk stratification to target patients for their respective interventions.48,49 Perhaps intensifying efforts in the choice of model for risk stratification may provide dividends for future studies. Further, focusing case management on interventions that prioritize components relating to multimorbidity and polypharmacy may have a role to play.50 Another consideration relates to the choice of outcome measure. Most risk models in this review used emergency admission for any cause as their primary outcome.
Only 3 chose emergency admissions due to ACSCs as an endpoint. A further 3 models considered ACSCs in their development process. This is interesting as a proportion of all emergency admissions will not be preventable even with intensified care.51 ACSCs are chronic conditions for which it is possible to prevent acute exacerbations, therefore reducing the need for hospital admission through management in primary care.52,53 In the United Kingdom, it is estimated that approximately 16% of all emergency admissions for all age groups occur as a result of these conditions and up to 30% of admissions for those aged over 75 years.52 Community-based interventions should target conditions for which upscaling primary care management can really impact on preventing subsequent admissions. In the United States, risk prediction model developers are testing models that aim to focus resources not necessarily on patients at highest risk for emergency admission, but those with conditions or characteristics (such as prior treatment adherence) most likely to benefit from increased preventative care.54 In this way resources can be focused where impact is more likely to be realized.

Strengths and Limitations

This review is timely considering the increased interest in risk stratification to identify community-dwelling people at higher risk for future admission. However, there are some limitations. Risk prediction models developed in one population or health care setting may not be transferable to another and care must be taken in comparing models. Further, risk prediction models need frequent updating to remain relevant, and some of the older models described in this review are now obsolete. Seven of the included models presented their final risk model only and not all variables considered for inclusion, so the data presented in Table 3 is limited by this.
CONCLUSIONS

Choosing a robust method of risk stratification is an essential first step in attempting to reduce emergency hospital admissions. This review identified 27 validated risk prediction models developed for use in the community. Local factors and choice of outcome are important considerations in choosing a model. Capturing nonmedical factors may have a role in improving predictive accuracy.

Supplementary Material

              Population-Level Prediction of Type 2 Diabetes From Claims Data and Analysis of Risk Factors

              We present a new approach to population health, in which data-driven predictive models are learned for outcomes such as type 2 diabetes. Our approach enables risk assessment from readily available electronic claims data on large populations, without additional screening cost. Proposed model uncovers early and late-stage risk factors. Using administrative claims, pharmacy records, healthcare utilization, and laboratory results of 4.1 million individuals between 2005 and 2009, an initial set of 42,000 variables were derived that together describe the full health status and history of every individual. Machine learning was then used to methodically enhance predictive variable set and fit models predicting onset of type 2 diabetes in 2009-2011, 2010-2012, and 2011-2013. We compared the enhanced model with a parsimonious model consisting of known diabetes risk factors in a real-world environment, where missing values are common and prevalent. Furthermore, we analyzed novel and known risk factors emerging from the model at different age groups at different stages before the onset. Parsimonious model using 21 classic diabetes risk factors resulted in area under ROC curve (AUC) of 0.75 for diabetes prediction within a 2-year window following the baseline. The enhanced model increased the AUC to 0.80, with about 900 variables selected as predictive (p < 0.0001 for differences between AUCs). Similar improvements were observed for models predicting diabetes onset 1-3 years and 2-4 years after baseline. The enhanced model improved positive predictive value by at least 50% and identified novel surrogate risk factors for type 2 diabetes, such as chronic liver disease (odds ratio [OR] 3.71), high alanine aminotransferase (OR 2.26), esophageal reflux (OR 1.85), and history of acute bronchitis (OR 1.45). Liver risk factors emerge later in the process of diabetes development compared with obesity-related factors such as hypertension and high hemoglobin A1c. 
In conclusion, population-level risk prediction for type 2 diabetes using readily available administrative data is feasible and has better prediction performance than classical diabetes risk prediction algorithms on very large populations with missing data. The new model enables intervention allocation at national scale quickly and accurately and recovers potentially novel risk factors at different stages before the disease onset.

                Author and article information

                JMIR Med Inform
                JMIR Medical Informatics
                JMIR Publications (Toronto, Canada)
                January 2020
                21 January 2020
                Volume: 8
                Issue: 1
                [1] Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
                [2] Care Transformation, Intermountain Healthcare, Salt Lake City, UT, United States
                [3] Department of Pediatrics, University of Utah, Salt Lake City, UT, United States
                Author notes
                Corresponding Author: Gang Luo, gangluo@cs.wisc.edu
                ©Gang Luo, Shan He, Bryan L Stone, Flory L Nkoy, Michael D Johnson. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 21.01.2020.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.

                Original Paper