
      Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study


          Abstract

          Background

          There is interest in using convolutional neural networks (CNNs) to analyze medical imaging to provide computer-aided diagnosis (CAD). Recent work has suggested that image classification CNNs may not generalize to new data as well as previously believed. We assessed how well CNNs generalized across three hospital systems for a simulated pneumonia screening task.

          Methods and findings

          A cross-sectional design with multiple model training cohorts was used to evaluate model generalizability to external sites using split-sample validation. A total of 158,323 chest radiographs were drawn from three institutions: National Institutes of Health Clinical Center (NIH; 112,120 from 30,805 patients), Mount Sinai Hospital (MSH; 42,396 from 12,904 patients), and Indiana University Network for Patient Care (IU; 3,807 from 3,683 patients). These patient populations had a mean (SD) age of 46.9 (16.6), 63.2 (16.5), and 49.6 (17) years, with 43.5%, 44.8%, and 57.3% female patients, respectively. We assessed individual models using the area under the receiver operating characteristic curve (AUC) for radiographic findings consistent with pneumonia and compared performance on different test sets with DeLong's test.

          The prevalence of pneumonia was high enough at MSH (34.2%) relative to NIH and IU (1.2% and 1.0%) that merely sorting by hospital system achieved an AUC of 0.861 (95% CI 0.855–0.866) on the joint MSH–NIH dataset. Models trained on data from either NIH or MSH had equivalent performance on IU (P values 0.580 and 0.273, respectively) and inferior performance on data from each other relative to an internal test set (i.e., new data from within the hospital system used for training; both P values < 0.001). The highest internal performance was achieved by combining training and test data from MSH and NIH (AUC 0.931, 95% CI 0.927–0.936), but this model demonstrated significantly lower external performance at IU (AUC 0.815, 95% CI 0.745–0.885, P = 0.001).

          To test the effect of pooling data from sites with disparate pneumonia prevalence, we used stratified subsampling to generate MSH–NIH cohorts that differed only in disease prevalence between training data sites. When both training data sites had the same pneumonia prevalence, the model performed consistently on external IU data (P = 0.88). When a 10-fold difference in pneumonia rate was introduced between sites, internal test performance improved compared to the balanced model (10× MSH risk P < 0.001; 10× NIH P = 0.002), but this outperformance failed to generalize to IU (MSH 10× P < 0.001; NIH 10× P = 0.027). CNNs were able to detect the hospital system of origin for 99.95% of NIH (22,050/22,062) and 99.98% of MSH (8,386/8,388) radiographs. The primary limitation of our approach and of the available public data is that we cannot fully assess what other factors might be contributing to hospital system–specific biases.
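The claim that merely sorting by hospital system achieves an AUC near 0.86 follows directly from the prevalence gap between sites. A minimal sketch of that arithmetic, using only the approximate radiograph counts and prevalences quoted in the abstract (not the authors' actual test splits, so the result only approximates the reported 0.861):

```python
# Why a "classifier" that only knows the hospital system scores a high AUC
# when sites with very different pneumonia prevalence are pooled.
# Counts reconstructed from the abstract: MSH has 42,396 radiographs at
# 34.2% prevalence; NIH has 112,120 radiographs at 1.2% prevalence.

def site_indicator_auc(n_a, prev_a, n_b, prev_b):
    """AUC of a score that is 1 for every radiograph from site A (the
    high-prevalence site) and 0 for every radiograph from site B."""
    pos_a, pos_b = n_a * prev_a, n_b * prev_b
    neg_a, neg_b = n_a - pos_a, n_b - pos_b
    pos, neg = pos_a + pos_b, neg_a + neg_b
    # AUC = P(positive ranked above negative) + 0.5 * P(tie):
    # only (positive from A, negative from B) pairs are ranked correctly;
    # same-site pairs are ties under this constant-per-site score.
    concordant = pos_a * neg_b
    ties = pos_a * neg_a + pos_b * neg_b
    return (concordant + 0.5 * ties) / (pos * neg)

auc = site_indicator_auc(42_396, 0.342, 112_120, 0.012)
print(f"{auc:.3f}")  # ~0.86, close to the 0.861 reported on the joint set
```

With equal prevalence at both sites the same formula gives exactly 0.5, which is why the stratified-subsampling experiment equalizing prevalence removes this shortcut.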

          Conclusion

          Pneumonia-screening CNNs achieved better internal than external performance in 3 out of 5 natural comparisons. When models were trained on pooled data from sites with different pneumonia prevalence, they performed better on new pooled data from these sites but not on external data. CNNs robustly identified hospital system and department within a hospital, which can have large differences in disease burden and may confound predictions.


          Eric Oermann and colleagues ask whether a DL-based model for pneumonia detection performs well in external validation and consider the effects of hospital system–specific biases.

          Author summary

          Why was this study done?
          • Early results in using convolutional neural networks (CNNs) on X-rays to diagnose disease have been promising, but it has not yet been shown that models trained on X-rays from one hospital or one group of hospitals will work equally well at different hospitals.

          • Before these tools are used for computer-aided diagnosis in real-world clinical settings, we must verify their ability to generalize across a variety of hospital systems.

          What did the researchers do and find?
          • A cross-sectional design was used to train and evaluate pneumonia screening CNNs on 158,323 chest X-rays from the National Institutes of Health Clinical Center (NIH; n = 112,120 from 30,805 patients), Mount Sinai Hospital (n = 42,396 from 12,904 patients), and Indiana University Network for Patient Care (n = 3,807 from 3,683 patients).

          • In 3 out of 5 natural comparisons, performance on chest X-rays from outside hospitals was significantly lower than on held-out X-rays from the original hospital system.

          • CNNs were able to detect where a radiograph was acquired (hospital system, hospital department) with extremely high accuracy and calibrate predictions accordingly.

          What do these findings mean?
          • The performance of CNNs in diagnosing diseases on X-rays may reflect not only their ability to identify disease-specific imaging findings on X-rays but also their ability to exploit confounding information.

          • Estimates of CNN performance based on test data from hospital systems used for model training may overstate their likely real-world performance.


                Author and article information

                Journal
                PLoS Medicine, Public Library of Science (San Francisco, CA, USA)
                ISSN: 1549-1277 (print); 1549-1676 (electronic)
                Published 6 November 2018; Volume 15, Issue 11
                Affiliations
                [1] Department of Medicine, California Pacific Medical Center, San Francisco, California, United States of America
                [2] Verily Life Sciences, South San Francisco, California, United States of America
                [3] Department of Neurological Surgery, Icahn School of Medicine, New York, New York, United States of America
                [4] Department of Radiology, Icahn School of Medicine, New York, New York, United States of America
                Edinburgh University, United Kingdom
                Author notes

                I have read the journal's policy and the authors of this manuscript have the following competing interests: MAB and ML are currently employees at Verily Life Sciences, which played no role in the research and has no commercial interest in it. EKO and ABC receive funding from Intel for unrelated work.

                Article
                Manuscript ID: PMEDICINE-D-18-01277
                DOI: 10.1371/journal.pmed.1002683
                PMCID: PMC6219764
                PMID: 30399157
                © 2018 Zech et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                Page count
                Figures: 3, Tables: 2, Pages: 17
                Funding
                Funded by: Icahn School of Medicine at Mount Sinai (funder ID: http://dx.doi.org/10.13039/100007277)
                The Department of Radiology at the Icahn School of Medicine at Mount Sinai (http://icahn.mssm.edu/about/departments/radiology) supported this project financially via internal department funding (author JJT). No other authors received specific funding for this work. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Medicine and Health Sciences
                Pulmonology
                Pneumonia
                Medicine and Health Sciences
                Radiology and Imaging
                Computer and Information Sciences
                Artificial Intelligence
                Machine Learning
                Deep Learning
                People and places
                Geographical locations
                North America
                United States
                Indiana
                Computer and Information Sciences
                Information Technology
                Natural Language Processing
                Medicine and Health Sciences
                Health Care
                Patients
                Inpatients
                Medicine and Health Sciences
                Critical Care and Emergency Medicine
                Medicine and Health Sciences
                Infectious Diseases
                Nosocomial Infections
                Custom metadata
                The code pipeline used to train and test the model across multiple institutions is available at https://github.com/jrzech/cxr-generalize. The NIH ChestX-ray14 dataset was curated and made publicly available by the National Institutes of Health (NIH) Clinical Center (https://nihcc.app.box.com/v/ChestXray-NIHCC). The Open-I dataset of chest radiographs from the Indiana University Hospital network was curated and made publicly available by the National Library of Medicine, NIH (https://openi.nlm.nih.gov/faq.php). Retrospective data used in this study from the Mount Sinai Health System cannot be released under the terms of our Institutional Review Board approval to protect patient confidentiality. Researchers interested in accessing Mount Sinai data through the Imaging Research Warehouse Initiative may contact Zahi Fayad, PhD at zahi.fayad@mssm.edu.

