Automated detection of moderate and large pneumothorax on frontal chest X-rays using deep convolutional neural networks: A retrospective study

Abstract

Background

Pneumothorax can precipitate a life-threatening emergency due to lung collapse and respiratory or circulatory distress. Pneumothorax is typically detected on chest X-ray; however, treatment is reliant on timely review of radiographs. Since current imaging volumes may result in long worklists of radiographs awaiting review, an automated method of prioritizing X-rays with pneumothorax may reduce time to treatment. Our objective was to create a large human-annotated dataset of chest X-rays containing pneumothorax and to train deep convolutional networks to screen for potentially emergent moderate or large pneumothorax at the time of image acquisition.

Methods and findings

In all, 13,292 frontal chest X-rays (3,107 with pneumothorax) were visually annotated by radiologists. This dataset was used to train and evaluate multiple network architectures. Images showing large- or moderate-sized pneumothorax were considered positive, and those with trace or no pneumothorax were considered negative. Images showing small pneumothorax were excluded from training. Using an internal validation set (n = 1,993), we selected the 2 top-performing models; these models were then evaluated on a held-out internal test set based on area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and positive predictive value (PPV). The final internal test was performed initially on a subset with small pneumothorax excluded (as in training; n = 1,701), then on the full test set (n = 1,990), with small pneumothorax included as positive. External evaluation was performed using the National Institutes of Health (NIH) ChestX-ray14 set, a public dataset labeled for chest pathology based on text reports. All images labeled with pneumothorax were considered positive, because the NIH set does not classify pneumothorax by size. In internal testing, our “high sensitivity model” produced a sensitivity of 0.84 (95% CI 0.78–0.90), specificity of 0.90 (95% CI 0.89–0.92), and AUC of 0.94 for the test subset with small pneumothorax excluded. Our “high specificity model” showed sensitivity of 0.80 (95% CI 0.72–0.86), specificity of 0.97 (95% CI 0.96–0.98), and AUC of 0.96 for this set. PPVs were 0.45 (95% CI 0.39–0.51) and 0.71 (95% CI 0.63–0.77), respectively. Internal testing on the full set showed expected decreased performance (sensitivity 0.55, specificity 0.90, and AUC 0.82 for high sensitivity model and sensitivity 0.45, specificity 0.97, and AUC 0.86 for high specificity model). External testing using the NIH dataset showed some further performance decline (sensitivity 0.28–0.49, specificity 0.85–0.97, and AUC 0.75 for both). Due to labeling differences between internal and external datasets, these findings represent a preliminary step towards external validation.
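
Sensitivity, specificity, and PPV, as reported above, all derive from the four cells of a confusion matrix. The Python sketch below is illustrative only. The counts are hypothetical and are not the study's data. The Wilson score interval is an assumption, since the abstract does not state which interval method the authors used. The function names are invented for this example.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 gives an approximate 95% interval. This method is an
    assumption for illustration; the source does not name its CI method.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def screening_metrics(tp, fp, tn, fn):
    """Return {metric: (point_estimate, (ci_low, ci_high))} from raw counts."""
    return {
        # Fraction of truly positive images that the model flags.
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        # Fraction of truly negative images that the model leaves unflagged.
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
        # Fraction of flagged images that are truly positive.
        "ppv": (tp / (tp + fp), wilson_interval(tp, tp + fp)),
    }


# Hypothetical counts chosen for readability, NOT taken from the study.
metrics = screening_metrics(tp=84, fp=90, tn=810, fn=16)
for name, (estimate, (low, high)) in metrics.items():
    print(f"{name}: {estimate:.2f} (95% CI {low:.2f}-{high:.2f})")
```

Unlike sensitivity and specificity, PPV depends on how common positive images are in the evaluated set, so a model with high specificity can still have a modest PPV when positives are rare.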

Conclusions

We trained automated classifiers to detect moderate and large pneumothorax in frontal chest X-rays at high levels of performance on held-out test data. These models may provide a high specificity screening solution to detect moderate or large pneumothorax on images collected when human review might be delayed, such as overnight. They are not intended for unsupervised diagnosis of all pneumothoraces, as many small pneumothoraces (and some larger ones) are not detected by the algorithm. Implementation studies are warranted to develop appropriate, effective clinician alerts for the potentially critical finding of pneumothorax, and to assess their impact on reducing time to treatment.
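
One way to picture the screening use case is a reading worklist in which studies flagged by a classifier move ahead of unflagged ones, while arrival order is preserved within each group. The Python sketch below is a hypothetical illustration, not the authors' implementation. The `Study` fields, the `prioritize` function, the scores, and the 0.5 threshold are all assumptions made for this example.

```python
from typing import List, NamedTuple


class Study(NamedTuple):
    study_id: str   # hypothetical identifier
    arrival: int    # order in which the image was acquired
    score: float    # model output in [0, 1]; higher means more suspicious


def prioritize(worklist: List[Study], threshold: float) -> List[Study]:
    """Move flagged studies to the front, keeping arrival order otherwise.

    The sort key is (not_flagged, arrival): False sorts before True, so
    studies with score >= threshold come first.
    """
    return sorted(worklist, key=lambda s: (s.score < threshold, s.arrival))


# Hypothetical overnight worklist.
worklist = [
    Study("A", 1, 0.05),
    Study("B", 2, 0.91),
    Study("C", 3, 0.10),
    Study("D", 4, 0.77),
]
ordered_ids = [s.study_id for s in prioritize(worklist, threshold=0.5)]
print(ordered_ids)  # ['B', 'D', 'A', 'C']
```

Every study stays on the worklist in this sketch. Flagging only changes reading order, which is consistent with the statement that the models are not intended for unsupervised diagnosis.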

Author summary

Why was this study done?

Pneumothorax (collapse of the lung due to air in the chest) can be a life-threatening emergency.
Delays in identifying and treating serious pneumothorax can result in severe harm to patients, including death.
Pneumothorax is often detected by chest X-ray, but delays in review of these images (particularly at hours of lower staffing, such as overnight) can lead to delay in diagnosis and treatment.
Prioritization of images that are suspected to show a pneumothorax for rapid review may result in earlier treatment of pneumothorax.

What did the researchers do and find?

We developed computer algorithms that scan chest X-rays and flag images that are suspicious for containing a moderate or large pneumothorax.
These algorithms “learned” to identify moderate- and large-sized pneumothorax by training on a large set of both positive and negative chest X-rays.
We created the training set of images by asking board-certified radiologists to label each image for the presence or absence of pneumothorax, as well as their estimate of pneumothorax size.
After training, we tested the performance of the algorithms on a similar collection of labeled X-rays that had never been seen by the algorithms and analyzed their success at detecting images showing pneumothorax, without any human guidance.
We found that our algorithms were able to detect the majority (80%–84%) of images showing a moderate or large pneumothorax, while correctly categorizing 90% or more of images without pneumothorax as “negative.” When we included small pneumothoraces in our test set, performance declined, as expected because the algorithms had not been trained on images with small pneumothoraces.
When testing our algorithms using images acquired outside our hospital, performance declined compared with our internal testing. However, the tests of the external dataset were not exactly comparable to our internal tests: small pneumothoraces could not be excluded from the evaluation because labels in the external dataset did not include size, and labels were assigned by computer interpretation of clinical reports rather than radiologists reevaluating the images, limiting the accuracy of the labels.
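
The effect of including small pneumothoraces can be made concrete with the labeling scheme described in the abstract: moderate and large count as positive, trace and none as negative, and small is excluded in the training-style subset but counted as positive in the full test set. The Python sketch below uses a hypothetical toy set of cases, not study data, and the function names are invented for illustration.

```python
POSITIVE_SIZES = {"moderate", "large"}
NEGATIVE_SIZES = {"none", "trace"}


def binary_label(size, include_small=False):
    """Map a radiologist size category to 1, 0, or None (excluded)."""
    if size in POSITIVE_SIZES:
        return 1
    if size in NEGATIVE_SIZES:
        return 0
    if size == "small":
        # Excluded in the training-style subset; positive in the full set.
        return 1 if include_small else None
    raise ValueError(f"unknown size category: {size!r}")


def sensitivity(cases, include_small):
    """cases: iterable of (size, flagged_by_model) pairs."""
    tp = fn = 0
    for size, flagged in cases:
        if binary_label(size, include_small) == 1:
            if flagged:
                tp += 1
            else:
                fn += 1
    return tp / (tp + fn)


# Hypothetical toy cases: the model catches most larger pneumothoraces
# but misses most small ones.
toy_cases = [
    ("large", True), ("large", True), ("moderate", True), ("moderate", False),
    ("small", False), ("small", False), ("small", True), ("small", False),
    ("trace", False), ("none", False), ("none", True),
]

subset_sens = sensitivity(toy_cases, include_small=False)  # 3 of 4 positives
full_sens = sensitivity(toy_cases, include_small=True)     # 4 of 8 positives
print(subset_sens, full_sens)  # 0.75 0.5
```

In this toy set the model's outputs do not change at all between the two evaluations. Sensitivity falls only because the definition of a positive case widens to include cases the model was never trained to detect.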

What do these findings mean?

Computer algorithms, given enough high-quality training data, are capable of detecting pneumothorax on a chest X-ray with sufficient accuracy to help prioritize images for rapid review by physicians.
Algorithms like these could potentially be used by radiologists as a tool to increase the speed with which a serious pneumothorax is detected, even at times of lower staffing, when turnaround times are typically longer.
Rapid detection and communication with treating physicians may result in faster treatment of pneumothorax, potentially reducing the harm of a serious medical problem.
The transferability of our models to clinical settings outside the institution where the training images were acquired needs further validation. Although we evaluated the models against an external dataset, differences in the composition, curation, and labeling between the external data and our own make it difficult to interpret these external dataset results.

Author and article information

Contributors

Andrew G. Taylor:

ORCID: http://orcid.org/0000-0002-4198-4232

Roles: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Writing – original draft, Writing – review & editing

Clinton Mielke:

Roles: Investigation, Methodology, Software, Writing – original draft

John Mongan:

ORCID: http://orcid.org/0000-0003-2765-7451

Suchi Saria: Role: Academic Editor

Journal

Journal ID (nlm-ta): PLoS Med

Journal ID (iso-abbrev): PLoS Med

Journal ID (publisher-id): plos

Journal ID (pmc): plosmed

Title: PLoS Medicine

Publisher: Public Library of Science (San Francisco, CA, USA)

ISSN (Print): 1549-1277

ISSN (Electronic): 1549-1676

Publication date (Electronic): 20 November 2018

Publication date Collection: November 2018

Volume: 15

Issue: 11

Electronic Location Identifier: e1002697

Affiliations

[1] Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, California, United States of America

[2] Center for Digital Health Innovation, University of California, San Francisco, San Francisco, California, United States of America

Johns Hopkins University, UNITED STATES

Author notes

I have read the journal's policy and the authors of this manuscript have the following competing interests: AT and JM report research support from GE Healthcare for this project. GE Healthcare has the option to license technologies created from this work for incorporation into commercial products. JM also reports research support (unrelated to this project) from Enlitic, Inc. He is also a participant in the Powerscribe Innovators Program with Nuance, Inc. AT and JM have a potential royalty interest in commercial products derived from this work. The authors do not hold stock in GE (other than in the form of broad-based mutual funds) and do not have any other direct or competing interest in the company.

* E-mail: andrew.taylor@ucsf.edu

Author information

Andrew G. Taylor http://orcid.org/0000-0002-4198-4232

John Mongan http://orcid.org/0000-0003-2765-7451

Article

Publisher ID: PMEDICINE-D-18-01819

DOI: 10.1371/journal.pmed.1002697

PMC ID: 6245672

PubMed ID: 30457991

SO-VID: 9dc66821-74cd-4e40-a560-096d0af70764

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 24 May 2018

Date accepted: 19 October 2018

Page count

Figures: 2, Tables: 5, Pages: 15

Funding

Funded by: General Electric (funder ID: http://dx.doi.org/10.13039/100004313)

Award ID: A128218

Research Support provided by a grant from GE Healthcare (A128218). The funding agency had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Custom metadata

Data Availability: Details regarding model selection and hyperparameter optimization are contained within the text of the manuscript. Source code for model construction as well as model training are included with the manuscript as Python code within two appendices. The original source X-rays are not currently publicly available due to patient privacy/HIPAA concerns as well as the size of the data repository. However, the Center for Digital Health Innovation at UCSF will serve as a point of contact for interested researchers to discuss a mechanism for data access and review. They can be contacted at: cdhi@ucsf.edu.
