Comparison of Machine Learning Methods With Traditional Models for Use of Administrative Claims With Electronic Medical Records to Predict Heart Failure Outcomes

Key Points

Question

Can prediction of patient outcomes in heart failure based on routinely collected claims data be improved with machine learning methods and incorporating linked electronic medical records?

Findings

In this prognostic study including records on 9502 patients, machine learning methods offered only limited improvement over logistic regression in predicting key outcomes in heart failure based on administrative claims. Inclusion of additional predictors from electronic medical records improved prediction for mortality, heart failure hospitalization, and loss in home days but not for high cost.

Meaning

Models based on claims-only predictors may achieve modest discrimination and accuracy in prediction of key patient outcomes in heart failure, and machine learning approaches and incorporation of additional predictors from electronic medical records may offer some improvement in risk prediction of select outcomes.

Abstract

Importance

Accurate risk stratification of patients with heart failure (HF) is critical to deploy targeted interventions aimed at improving patients’ quality of life and outcomes.

Objectives

To compare machine learning approaches with traditional logistic regression in predicting key outcomes in patients with HF and evaluate the added value of augmenting claims-based predictive models with electronic medical record (EMR)–derived information.

Design, Setting, and Participants

A prognostic study with a 1-year follow-up period was conducted including 9502 Medicare-enrolled patients with HF from 2 health care provider networks in Boston, Massachusetts (“providers” includes physicians, clinicians, other health care professionals, and their institutions that comprise the networks). The study was performed from January 1, 2007, to December 31, 2014; data were analyzed from January 1 to December 31, 2018.

Main Outcomes and Measures

All-cause mortality, HF hospitalization, top cost decile, and home days loss greater than 25% were modeled using logistic regression, least absolute shrinkage and selection operator (LASSO) regression, classification and regression trees, random forests, and gradient-boosted modeling (GBM). All models were trained using data from network 1 and tested in network 2. After selecting the most efficient modeling approach based on discrimination, Brier score, and calibration, area under precision-recall curves (AUPRCs) and net benefit estimates from decision curves were calculated to focus on the differences when using claims-only vs claims + EMR predictors.
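For readers unfamiliar with the evaluation measures named above, the following is a minimal sketch of how the C statistic, the Brier score, and the net benefit at a single threshold probability can be computed from observed outcomes and predicted risks. It uses generic textbook definitions in plain Python; the function names and the toy data are illustrative assumptions and do not represent the authors' code or software.

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Share of (event, non-event) pairs in which the event received the
    higher predicted risk; ties count as one half. Equals the ROC area."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0
            elif p_event == p_non:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Decision-curve net benefit at one threshold probability:
    TP/n - (FP/n) * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical toy data (not study data): 1 = outcome occurred within 1 year.
outcomes = [1, 0, 1, 0, 0, 1, 0, 0]
risks = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]

print(c_statistic(outcomes, risks))
print(brier_score(outcomes, risks))
print(net_benefit(outcomes, risks, threshold=0.5))
```

Higher C statistics and net benefit indicate better performance, whereas lower Brier scores are better. A decision curve is obtained by evaluating the net benefit over a range of threshold probabilities rather than at a single value.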

Results

A total of 9502 patients with HF with a mean (SD) age of 78 (8) years were included: 6113 from network 1 (training set) and 3389 from network 2 (testing set). Gradient-boosted modeling consistently provided the highest discrimination, lowest Brier scores, and good calibration across all 4 outcomes; however, logistic regression had generally similar performance (C statistics for logistic regression based on claims-only predictors: mortality, 0.724; 95% CI, 0.705-0.744; HF hospitalization, 0.707; 95% CI, 0.676-0.737; high cost, 0.734; 95% CI, 0.703-0.764; and home days loss, 0.781; 95% CI, 0.764-0.798; C statistics for GBM: mortality, 0.727; 95% CI, 0.708-0.747; HF hospitalization, 0.745; 95% CI, 0.718-0.772; high cost, 0.733; 95% CI, 0.703-0.763; and home days loss, 0.790; 95% CI, 0.773-0.807). Higher AUPRCs were obtained for claims + EMR vs claims-only GBMs predicting mortality (0.484 vs 0.423), HF hospitalization (0.413 vs 0.403), and home time loss (0.575 vs 0.521) but not cost (0.249 vs 0.252). The net benefit for claims + EMR vs claims-only GBMs was higher at various threshold probabilities for mortality and home time loss outcomes but similar for the other 2 outcomes.
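The AUPRC comparison reported above can be illustrated with a short sketch. The abstract does not state which estimator of the precision-recall area was used, so the code below assumes average precision, one common step-wise estimator. The function name and both sets of predicted risks are hypothetical and are not study data.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve:
    the mean of the precision observed at the rank of each event, with
    patients ranked by descending predicted risk. Tied risks are ordered
    arbitrarily here, so the sketch assumes distinct predicted risks."""
    n_events = sum(y_true)
    if n_events == 0:
        raise ValueError("need at least one event")
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, outcome) in enumerate(ranked, start=1):
        if outcome == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / n_events


# Hypothetical toy data: the same patients scored by two models.
outcomes = [1, 0, 1, 0, 0, 1, 0, 0]
risks_model_a = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]
risks_model_b = [0.9, 0.2, 0.8, 0.4, 0.1, 0.5, 0.3, 0.05]

prevalence = sum(outcomes) / len(outcomes)
print("baseline (prevalence):", prevalence)
print("model A:", average_precision(outcomes, risks_model_a))
print("model B:", average_precision(outcomes, risks_model_b))
```

Unlike the C statistic, the AUPRC of a non-informative model is approximately the outcome prevalence rather than 0.5, so AUPRC values are best compared between models for the same outcome and not across outcomes with different event rates.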

Conclusions and Relevance

Machine learning methods offered only limited improvement over traditional logistic regression in predicting key HF outcomes. Inclusion of additional predictors from EMRs to claims-based models appeared to improve prediction for some, but not all, outcomes.

Abstract

This prognostic study compares several machine learning approaches with traditional logistic regression for development of predictive models for all-cause mortality, heart failure hospitalization, high cost, and loss in home time, among patients with heart failure.

Author and article information

Journal

Journal ID (nlm-ta): JAMA Netw Open

Journal ID (iso-abbrev): JAMA Netw Open

Title: JAMA Network Open

Publisher: American Medical Association

ISSN (Electronic): 2574-3805

Publication date (Electronic): 10 January 2020

Publication date Collection: January 2020

Publication date PMC-release: 10 January 2020

Volume: 3

Issue: 1

Electronic Location Identifier: e1918962

Affiliations

[1 ]Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts

[2 ]Heart and Vascular Center, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts

[3 ]Market Access, Bayer AG, Wuppertal, Germany

Author notes

Article Information

Accepted for Publication: November 14, 2019.

Published: January 10, 2020. doi:10.1001/jamanetworkopen.2019.18962

Corresponding Author: Rishi J. Desai, MS, PhD, Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, 1620 Tremont St, Ste 3030-R, Boston, MA 02120 (rdesai@bwh.harvard.edu).

Author Contributions: Dr Desai had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design: Desai, Evers, Schneeweiss.

Acquisition, analysis, or interpretation of data: All authors.

Drafting of the manuscript: Desai.

Critical revision of the manuscript for important intellectual content: All authors.

Statistical analysis: Desai.

Obtained funding: Desai, Evers, Schneeweiss.

Supervision: Wang, Evers, Schneeweiss.

Conflict of Interest Disclosures: Dr Wang reported receiving grants from Bayer during the conduct of the study, and receiving grants from Novartis, Johnson & Johnson, and Boehringer Ingelheim outside the submitted work. Dr Vaduganathan reported receiving grants from the KL2/Catalyst Medical Research Investigator Training award from Harvard Catalyst and serving on paid advisory boards for Amgen, AstraZeneca, Baxter Healthcare, Bayer AG, Boehringer Ingelheim, and Relypsa outside the submitted work. Dr Evers reported receiving personal fees from Bayer AG during the conduct of the study. Dr Schneeweiss reported receiving grants from Bayer, Boehringer Ingelheim, and Genentech during the conduct of the study; and receiving personal fees from WHISCON LLC and Aetion Co outside the submitted work. No other disclosures were reported.

Funding/Support: This study was supported by an investigator-initiated research grant from Bayer AG.

Role of the Funder/Sponsor: The study was conducted by the authors independent of the sponsor. Dr Evers, who is employed by Bayer, participated in preparation and review of the manuscript but had no role in the decision to submit the manuscript for publication. Funders had no role in design and conduct of the study or in collection, management, analysis, and interpretation of the data.

Article

Publisher ID: zoi190713

DOI: 10.1001/jamanetworkopen.2019.18962

PMC ID: 6991258

PubMed ID: 31922560

SO-VID: bebd2507-d3e7-4420-a318-9cd7eda66bf8

License:

This is an open access article distributed under the terms of the CC-BY-NC-ND License.

History

Date received : 20 August 2019

Date accepted : 14 November 2019
