
      Deep learning predicts hip fracture using confounding patient and healthcare variables

          Abstract

          Hip fractures are a leading cause of death and disability among older adults. Hip fractures are also the most commonly missed diagnosis on pelvic radiographs, and delayed diagnosis leads to higher cost and worse outcomes. Computer-aided diagnosis (CAD) algorithms have shown promise for helping radiologists detect fractures, but the image features underpinning their predictions are notoriously difficult to understand. In this study, we trained deep-learning models on 17,587 radiographs to classify fracture, 5 patient traits, and 14 hospital process variables. All 20 variables could be individually predicted from a radiograph, with the best performances on scanner model (AUC = 1.00), scanner brand (AUC = 0.98), and whether the order was marked “priority” (AUC = 0.79). Fracture was predicted moderately well from the image (AUC = 0.78) and better when combining image features with patient data (AUC = 0.86, DeLong paired AUC comparison, p = 2e-9) or patient data plus hospital process features (AUC = 0.91, p = 1e-21). Fracture prediction on a test set that balanced fracture risk across patient variables was significantly lower than a random test set (AUC = 0.67, DeLong unpaired AUC comparison, p = 0.003); and on a test set with fracture risk balanced across patient and hospital process variables, the model performed randomly (AUC = 0.52, 95% CI 0.46–0.58), indicating that these variables were the main source of the model’s fracture predictions. A single model that directly combines image features, patient, and hospital process data outperforms a Naive Bayes ensemble of an image-only model prediction, patient, and hospital process data. If CAD algorithms are inexplicably leveraging patient and process variables in their predictions, it is unclear how radiologists should interpret their predictions in the context of other known patient data. Further research is needed to illuminate deep-learning decision processes so that computers and clinicians can effectively cooperate.
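The abstract's key comparison, a single model that sees image features and tabular variables jointly versus a Naive Bayes ensemble of an image-only prediction with patient and hospital process data, can be illustrated with a small sketch. The snippet below is illustrative only and uses synthetic data: `img_score` stands in for an image-only model's fracture probability and `covariates` for the patient and process variables; the paper's actual models, features, and DeLong AUC tests are not reproduced here.

```python
# Hedged sketch: single combined model vs. naive Bayes ensemble of an
# image-only score with tabular covariates. All data here are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
y = rng.binomial(1, 0.25, size=n)                         # fracture label (synthetic)
covariates = rng.normal(size=(n, 5)) + y[:, None] * 0.6   # patient/process stand-ins
img_score = 0.4 * y + rng.normal(scale=0.5, size=n)       # stand-in image-model output

X = np.column_stack([img_score, covariates])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# (a) Single model that combines the image score and covariates directly.
combined = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc_combined = roc_auc_score(y_te, combined.predict_proba(X_te)[:, 1])

# (b) Naive Bayes combination, treating the image score and each covariate
#     as conditionally independent given the fracture label.
nb = GaussianNB().fit(X_tr, y_tr)
auc_nb = roc_auc_score(y_te, nb.predict_proba(X_te)[:, 1])

print(f"combined model AUC:       {auc_combined:.3f}")
print(f"naive Bayes ensemble AUC: {auc_nb:.3f}")
# The study compared such AUCs with DeLong's test; that statistic is not computed here.
```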


Most cited references (21)


          Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices

Artificial intelligence (AI) has long promised to increase healthcare affordability, quality, and accessibility, but until recently the FDA had never authorized an autonomous AI diagnostic system. This pivotal trial of an AI system to detect diabetic retinopathy (DR) enrolled 900 subjects with diabetes and no history of DR at primary care clinics, comparing the AI system against Wisconsin Fundus Photograph Reading Center (FPRC) widefield stereoscopic photography and macular optical coherence tomography (OCT) obtained by FPRC-certified photographers, with FPRC grading on the Early Treatment Diabetic Retinopathy Study (ETDRS) severity scale and for diabetic macular edema (DME). More than mild DR (mtmDR) was defined as ETDRS level 35 or higher, and/or DME, in at least one eye. AI system operators underwent a standardized training protocol before the study started. Median age was 59 years (range, 22–84 years); 47.5% of participants were male; 16.1% were Hispanic and 83.3% were not; 28.6% were African American and 63.4% were not; 198 (23.8%) had mtmDR. The AI system exceeded all prespecified superiority endpoints, with sensitivity of 87.2% (95% CI, 81.8–91.2%; endpoint >85%), specificity of 90.7% (95% CI, 88.3–92.7%; endpoint >82.5%), and an imageability rate of 96.1% (95% CI, 94.6–97.3%), demonstrating the ability of AI to bring specialty-level diagnostics to primary care settings. Based on these results, the FDA authorized the system for use by healthcare providers to detect more than mild DR and diabetic macular edema, making it the first FDA-authorized autonomous AI diagnostic system in any field of medicine, with the potential to help prevent vision loss in thousands of people with diabetes annually. ClinicalTrials.gov NCT02963441
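As a rough illustration of how endpoints like these are checked, the sketch below computes sensitivity, specificity, and 95% confidence intervals from a confusion matrix and compares them with the prespecified thresholds. The counts are invented toy values, and the Wilson interval is an assumption for illustration; the trial's own statistical analysis plan is not reproduced here.

```python
# Hedged sketch: sensitivity/specificity with Wilson 95% CIs against
# prespecified superiority endpoints. Counts below are invented toy values.
import math

def wilson_ci(successes: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half

# Hypothetical confusion-matrix counts (not the trial's data).
tp, fn = 87, 13     # disease-positive participants
tn, fp = 363, 37    # disease-negative participants

sens, sens_ci = tp / (tp + fn), wilson_ci(tp, tp + fn)
spec, spec_ci = tn / (tn + fp), wilson_ci(tn, tn + fp)

print(f"sensitivity {sens:.3f} (95% CI {sens_ci[0]:.3f}-{sens_ci[1]:.3f}), endpoint > 0.850")
print(f"specificity {spec:.3f} (95% CI {spec_ci[0]:.3f}-{spec_ci[1]:.3f}), endpoint > 0.825")
```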

            Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study

Background: There is interest in using convolutional neural networks (CNNs) to analyze medical imaging to provide computer-aided diagnosis (CAD). Recent work has suggested that image-classification CNNs may not generalize to new data as well as previously believed. We assessed how well CNNs generalized across three hospital systems for a simulated pneumonia screening task.

Methods and findings: A cross-sectional design with multiple model-training cohorts was used to evaluate model generalizability to external sites using split-sample validation. A total of 158,323 chest radiographs were drawn from three institutions: National Institutes of Health Clinical Center (NIH; 112,120 from 30,805 patients), Mount Sinai Hospital (MSH; 42,396 from 12,904 patients), and Indiana University Network for Patient Care (IU; 3,807 from 3,683 patients). These patient populations had a mean (SD) age of 46.9 (16.6), 63.2 (16.5), and 49.6 (17) years, with a female percentage of 43.5%, 44.8%, and 57.3%, respectively. We assessed individual models using the area under the receiver operating characteristic curve (AUC) for radiographic findings consistent with pneumonia and compared performance on different test sets with DeLong's test. The prevalence of pneumonia was high enough at MSH (34.2%) relative to NIH and IU (1.2% and 1.0%) that merely sorting by hospital system achieved an AUC of 0.861 (95% CI 0.855–0.866) on the joint MSH–NIH dataset. Models trained on data from either NIH or MSH had equivalent performance on IU (P values 0.580 and 0.273, respectively) and inferior performance on data from each other relative to an internal test set (i.e., new data from within the hospital system used for training; P values both <0.001). The highest internal performance was achieved by combining training and test data from MSH and NIH (AUC 0.931, 95% CI 0.927–0.936), but this model demonstrated significantly lower external performance at IU (AUC 0.815, 95% CI 0.745–0.885, P = 0.001). To test the effect of pooling data from sites with disparate pneumonia prevalence, we used stratified subsampling to generate MSH–NIH cohorts that differed only in disease prevalence between training data sites. When both training data sites had the same pneumonia prevalence, the model performed consistently on external IU data (P = 0.88). When a 10-fold difference in pneumonia rate was introduced between sites, internal test performance improved compared with the balanced model (10× MSH risk P < 0.001; 10× NIH P = 0.002), but this outperformance failed to generalize to IU (MSH 10× P < 0.001; NIH 10× P = 0.027). CNNs were able to directly identify the hospital system of origin for 99.95% of NIH (22,050/22,062) and 99.98% of MSH (8,386/8,388) radiographs. The primary limitation of our approach and the available public data is that we cannot fully assess what other factors might be contributing to hospital system–specific biases.

Conclusion: Pneumonia-screening CNNs achieved better internal than external performance in 3 out of 5 natural comparisons. When models were trained on pooled data from sites with different pneumonia prevalence, they performed better on new pooled data from these sites but not on external data. CNNs robustly identified hospital system and department within a hospital, which can have large differences in disease burden and may confound predictions.
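The "merely sorting by hospital system" observation can be reproduced in spirit with a short simulation: when pneumonia prevalence differs sharply between sites, the site indicator alone acts as a usable prediction score. The cohort sizes and prevalences below are taken from the abstract, but the labels are simulated, so the result only approximates the reported figure.

```python
# Hedged sketch: how disparate disease prevalence turns the hospital-system
# indicator into a predictive "score". Labels are simulated; only the cohort
# sizes and prevalences (34.2% vs. 1.2%) come from the abstract.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_nih, n_msh = 112_120, 42_396

site = np.concatenate([np.zeros(n_nih), np.ones(n_msh)])    # 0 = NIH, 1 = MSH
pneumonia = rng.binomial(1, np.where(site == 1, 0.342, 0.012))

# Using the site indicator itself as the prediction score yields an AUC
# close to the reported 0.861, with no image information at all.
print(f"AUC of site indicator alone: {roc_auc_score(pneumonia, site):.3f}")
```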

              Automated detection and classification of the proximal humerus fracture by using deep learning algorithm

Background and purpose — We aimed to evaluate the ability of artificial intelligence (a deep learning algorithm) to detect and classify proximal humerus fractures using plain anteroposterior shoulder radiographs. Patients and methods — 1,891 images (1 image per person) of normal shoulders (n = 515) and 4 proximal humerus fracture types (greater tuberosity, 346; surgical neck, 514; 3-part, 269; 4-part, 247), classified by 3 specialists, were evaluated. We trained a deep convolutional neural network (CNN) after augmentation of the training dataset. The ability of the CNN to detect and classify proximal humerus fractures, measured by top-1 accuracy, area under the receiver operating characteristic curve (AUC), sensitivity/specificity, and Youden index, was evaluated and compared with that of humans (28 general physicians, 11 general orthopedists, and 19 orthopedists specialized in the shoulder). Results — The CNN showed high performance, with 96% top-1 accuracy, 1.00 AUC, 0.99/0.97 sensitivity/specificity, and 0.97 Youden index for distinguishing normal shoulders from proximal humerus fractures. In addition, the CNN showed promising results, with 65–86% top-1 accuracy, 0.90–0.98 AUC, 0.88/0.83–0.97/0.94 sensitivity/specificity, and 0.71–0.90 Youden index for classifying fracture type. Compared with the human groups, the CNN performed better than general physicians and general orthopedists and similarly to orthopedists specialized in the shoulder, and its advantage was most marked in complex 3- and 4-part fractures. Interpretation — Artificial intelligence can accurately detect and classify proximal humerus fractures on plain shoulder AP radiographs. Further studies are necessary to determine the feasibility of applying artificial intelligence in the clinic and whether its use could improve care and outcomes compared with current orthopedic assessments.
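For reference, the Youden index quoted above is simply sensitivity + specificity − 1. The snippet below computes it from the reported normal-vs-fracture sensitivity and specificity; the small discrepancy from the reported 0.97 presumably reflects rounding of the underlying values.

```python
# Youden index J = sensitivity + specificity - 1, using the figures
# reported for normal-vs-fracture detection (0.99 and 0.97).
sensitivity, specificity = 0.99, 0.97
youden_j = sensitivity + specificity - 1
print(f"Youden index: {youden_j:.2f}")  # -> 0.96; reported 0.97 likely reflects rounding
```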

                Author and article information

                Contributors
                joel.dudley@mssm.edu
Journal
npj Digital Medicine (NPJ Digit Med)
Nature Publishing Group UK (London)
ISSN (electronic): 2398-6352
Published: 30 April 2019
Volume 2, Article number 31
Affiliations
[1] Verily Life Sciences LLC, South San Francisco, CA, USA
[2] Institute for Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, New York, NY, USA
[3] Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
[4] Department of Medicine, California Pacific Medical Center, San Francisco, CA, USA
[5] School of Public Health, The University of Adelaide, Adelaide, South Australia, Australia
[6] Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA
[7] School of Computer Sciences, The University of Adelaide, Adelaide, South Australia, Australia
[8] Division of Cardiovascular Medicine, Stanford School of Medicine, Stanford, CA, USA
Author information
ORCID: http://orcid.org/0000-0001-8743-0932
Article
DOI: 10.1038/s41746-019-0105-1
PMCID: PMC6550136
PMID: 31304378
                © The Author(s) 2019

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

History
Received: 24 October 2018
Accepted: 5 March 2019
Funding
Funded by: U.S. Department of Health & Human Services | National Institutes of Health (NIH) (FundRef https://doi.org/10.13039/100000002)
Award ID: UL1TR001433-01
                Categories
                Article

Keywords: radiography, computer science, statistics
