      Is Open Access

      Diabetes Mellitus Forecasting Using Population Health Data in Ontario, Canada

      Preprint


          Abstract

          Leveraging health administrative data (HAD) to predict the risk of chronic diseases, including diabetes, has recently gained considerable attention in the machine learning community. In this paper, we use the largest health records datasets of patients in Ontario, Canada. Provided by the Institute for Clinical Evaluative Sciences (ICES), this database is diverse in age, gender and ethnicity. The datasets include demographics, lab measurements, drug benefits, healthcare system interactions, and ambulatory and hospitalization records. We perform one of the first large-scale machine learning studies with this data, studying the task of predicting diabetes 1-10 years ahead, which requires no additional screening of individuals. In the best setup, we reach a test AUC of 80.3 with a single model trained on an observation window of 5 years with a one-year buffer using all datasets. A subset of the top 15 features alone (out of a total of 963) provides a test AUC of 79.1. We provide extensive analysis of machine learning model performance and feature contributions, which enables us to narrow down to the features most useful for diabetes forecasting. Examples include chronic conditions such as asthma and hypertension, lab results, diagnostic codes in insurance claims, age and geographical information.
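          The observation window, buffer and test AUC mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the names `Windows`, `make_windows`, `label` and `auc`, the one-year target window, the example years, and the rule that excludes individuals whose onset precedes the target window are all assumptions made for illustration. The AUC is computed on a 0-1 scale, so the abstract's 80.3 would correspond to 0.803.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Windows:
    """Year boundaries (half-open intervals) for one forecasting setup."""
    obs_start: int      # first year of the observation window
    obs_end: int        # end of observation window, start of the buffer
    target_start: int   # end of the buffer, start of the target window
    target_end: int     # end of the target window


def make_windows(obs_start: int, obs_years: int = 5,
                 buffer_years: int = 1, target_years: int = 1) -> Windows:
    """Lay out observation, buffer and target windows back to back.

    The 5-year observation window and 1-year buffer follow the abstract;
    the 1-year target window is an assumption of this sketch.
    """
    obs_end = obs_start + obs_years
    target_start = obs_end + buffer_years
    return Windows(obs_start, obs_end, target_start,
                   target_start + target_years)


def label(onset_year: Optional[int], w: Windows) -> Optional[int]:
    """Return 1 if onset falls in the target window, 0 if there is no
    onset before the window ends, and None (exclude the individual) if
    onset happened earlier. The exclusion rule is an assumption."""
    if onset_year is None or onset_year >= w.target_end:
        return 0
    if onset_year < w.target_start:
        return None
    return 1


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison: the fraction of
    (positive, negative) pairs ranked correctly, ties counting one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: observation starts in 2008.
w = make_windows(2008)
example_labels = [label(y, w) for y in (2014, None, 2012, 2016)]
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

          The pairwise AUC is quadratic in the number of individuals, which keeps the sketch short; a rank-based formulation would be the usual choice at population scale.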

          Most cited references (14)

          Predictors of progression from impaired glucose tolerance to NIDDM: an analysis of six prospective studies.

          Risk factors associated with the progression from impaired glucose tolerance (IGT) to NIDDM were examined in data from six prospective studies. IGT and NIDDM were defined in all studies by World Health Organization (WHO) criteria, and baseline risk factors were measured at the time of first recognition of IGT. The studies varied in size from 177 to 693 participants with IGT, and included men and women followed from 2 to 27 years after the recognition of IGT. Across the six studies, the incidence rate of NIDDM was 57.2/1,000 person-years and ranged from 35.8/1,000 to 87.3/1,000 person-years. Although baseline measures of fasting and 2-h postchallenge glucose levels were both positively associated with NIDDM incidence, incidence rates were sharply higher for those in the top quartile of fasting plasma glucose levels, but increased linearly with increasing 2-h postchallenge glucose quartiles. Incidence rates were higher among the Hispanic, Mexican-American, Pima, and Nauruan populations than among Caucasians. The effect of baseline age on NIDDM incidence rates differed among the studies; the rates did not increase or rose only slightly with increasing baseline age in three of the studies and formed an inverted U in three studies. In all studies, estimates of obesity (including BMI, waist-to-hip ratio, and waist circumference) were positively associated with NIDDM incidence. BMI was associated with NIDDM incidence independently of fasting and 2-h post challenge glucose levels in the combined analysis of all six studies and in three cohorts separately, but not in the three studies with the highest NIDDM incidence rates. Sex and family history of diabetes were generally not related to NIDDM progression. This analysis indicates that persons with IGT are at high risk and that further refinement of risk can be made by other simple measurements. The ability to identify persons at high risk of NIDDM should facilitate clinical trials in diabetes prevention.

            Inhaled corticosteroids and the risks of diabetes onset and progression.

            Systemic corticosteroids are known to increase diabetes risk, but the effects of high-dose inhaled corticosteroids are unknown. We assessed whether the use and dose of inhaled corticosteroids increase the risk of diabetes onset and progression. We formed a new-user cohort of patients treated for respiratory disease during 1990-2005, identified using the Quebec health insurance databases and followed through 2007 or until diabetes onset. The subcohort treated with oral hypoglycemics was followed until diabetes progression. A nested case-control analysis was used to estimate the rate ratios of diabetes onset and progression associated with current inhaled corticosteroid use, adjusted for age, sex, respiratory disease severity, and co-morbidity. The cohort included 388,584 patients, of whom 30,167 had diabetes onset during 5.5 years of follow-up (incidence rate 14.2/1000/year), and 2099 subsequently progressed from oral hypoglycemic treatment to insulin (incidence rate 19.8/1000/year). Current use of inhaled corticosteroids was associated with a 34% increase in the rate of diabetes (rate ratio [RR] 1.34; 95% confidence interval [CI], 1.29-1.39) and in the rate of diabetes progression (RR 1.34; 95% CI, 1.17-1.53). The risk increases were greatest with the highest inhaled corticosteroid doses, equivalent to fluticasone 1000 μg per day or more (RR 1.64; 95% CI, 1.52-1.76 and RR 1.54; 95% CI, 1.18-2.02; respectively). In patients with respiratory disease, inhaled corticosteroid use is associated with modest increases in the risks of diabetes onset and diabetes progression. The risks are more pronounced at the higher doses currently prescribed in the treatment of chronic obstructive pulmonary disease. Copyright © 2010 Elsevier Inc. All rights reserved.

              Population-Level Prediction of Type 2 Diabetes From Claims Data and Analysis of Risk Factors

              We present a new approach to population health, in which data-driven predictive models are learned for outcomes such as type 2 diabetes. Our approach enables risk assessment from readily available electronic claims data on large populations, without additional screening cost. Proposed model uncovers early and late-stage risk factors. Using administrative claims, pharmacy records, healthcare utilization, and laboratory results of 4.1 million individuals between 2005 and 2009, an initial set of 42,000 variables were derived that together describe the full health status and history of every individual. Machine learning was then used to methodically enhance predictive variable set and fit models predicting onset of type 2 diabetes in 2009-2011, 2010-2012, and 2011-2013. We compared the enhanced model with a parsimonious model consisting of known diabetes risk factors in a real-world environment, where missing values are common and prevalent. Furthermore, we analyzed novel and known risk factors emerging from the model at different age groups at different stages before the onset. Parsimonious model using 21 classic diabetes risk factors resulted in area under ROC curve (AUC) of 0.75 for diabetes prediction within a 2-year window following the baseline. The enhanced model increased the AUC to 0.80, with about 900 variables selected as predictive (p < 0.0001 for differences between AUCs). Similar improvements were observed for models predicting diabetes onset 1-3 years and 2-4 years after baseline. The enhanced model improved positive predictive value by at least 50% and identified novel surrogate risk factors for type 2 diabetes, such as chronic liver disease (odds ratio [OR] 3.71), high alanine aminotransferase (OR 2.26), esophageal reflux (OR 1.85), and history of acute bronchitis (OR 1.45). Liver risk factors emerge later in the process of diabetes development compared with obesity-related factors such as hypertension and high hemoglobin A1c. 
              In conclusion, population-level risk prediction for type 2 diabetes using readily available administrative data is feasible and has better prediction performance than classical diabetes risk prediction algorithms on very large populations with missing data. The new model enables intervention allocation at national scale quickly and accurately and recovers potentially novel risk factors at different stages before the disease onset.

                Author and article information

                Journal
                08 April 2019
                Article
                1904.04137
                95e4343c-2a2e-4de9-8ea4-bbafd3f198b3

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                18 pages, 3 figures, 8 Tables, Submitted to 2019 ML for Healthcare conference
                stat.AP cs.LG

                Applications, Artificial intelligence
