Identifying Lung Cancer Risk Factors in the Elderly Using Deep Neural Networks: Quantitative Analysis of Web-Based Survey Data


Abstract

Background

Lung cancer is one of the most dangerous malignant tumors, with the fastest-growing morbidity and mortality, especially in the elderly. With the rapid growth of the elderly population in recent years, lung cancer prevention and control have become increasingly important, but they are complicated by the fact that the pathogenesis of lung cancer is a complex process involving a variety of risk factors.

Objective

This study aimed to identify key risk factors for lung cancer incidence in the elderly and to quantify each risk factor’s degree of influence using a deep learning method.

Methods

Using Web-based survey data, we integrated multidisciplinary risk factors, including behavioral risk factors, disease history factors, environmental factors, and demographic factors, and then preprocessed the integrated data. We trained deep neural network models on a stratified elderly population. We then used these models to extract risk factors for lung cancer in the elderly and to quantify each factor’s degree of influence.
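As a rough illustration of the stratification step, the sketch below splits survey records into the four groups reported in the Results. The record layout and the field names (`age`, `sex`, `smoker`) are assumptions made for this example and are not taken from the survey itself.

```python
from typing import Dict, List

# Hypothetical record layout; the field names are illustrative only.
Record = Dict[str, object]


def stratify(records: List[Record]) -> Dict[str, List[Record]]:
    """Split records into the four groups named in the abstract."""
    elderly = [r for r in records if r["age"] >= 65]
    return {
        "age>=65": elderly,
        "women>=65": [r for r in elderly if r["sex"] == "F"],
        "men>=65": [r for r in elderly if r["sex"] == "M"],
        "whole": list(records),
    }


# Toy records, not real survey data.
sample = [
    {"age": 70, "sex": "M", "smoker": True},
    {"age": 66, "sex": "F", "smoker": False},
    {"age": 40, "sex": "M", "smoker": True},
]
groups = stratify(sample)
print({name: len(rows) for name, rows in groups.items()})
```

Each group would then receive its own model, which is how the abstract describes training on a stratified population.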

Results

The proposed model quantitatively identified risk factors based on 235,673 adults. The proposed deep neural network models of 4 groups (age ≥65 years, women ≥65 years, men ≥65 years, and the whole population) achieved good performance in identifying lung cancer risk factors, with accuracy ranging from 0.927 (95% CI 0.223-0.525; P=.002) to 0.962 (95% CI 0.530-0.751; P=.002) and the area under the curve ranging from 0.913 (95% CI 0.564-0.803) to 0.931 (95% CI 0.499-0.593). Smoking frequency was the leading risk factor for lung cancer in men 65 years and older. Time since quitting and having smoked at least 100 cigarettes in their lifetime were the main risk factors for lung cancer in women 65 years and older. Men 65 years and older had the highest lung cancer incidence among the stratified groups, particularly non–small cell lung cancer incidence. As smoking rates declined, lung cancer incidence decreased more markedly in men than in women.
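For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how accuracy and the area under the receiver operating characteristic curve can be computed from binary labels and predicted scores. The function names, the 0.5 decision threshold, and the toy data are assumptions for illustration; the abstract does not state how the authors computed these values.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of cases where the thresholded score matches the label."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores, not study data.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.4]
acc_value = accuracy(labels, scores)
auc_value = auc(labels, scores)
print(acc_value, auc_value)
```

The pairwise form is quadratic in the number of cases, so it suits small examples; on a dataset the size of this survey a rank-based implementation would be the practical choice.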

Conclusions

This study demonstrated a quantitative method to identify risk factors of lung cancer in the elderly. The proposed models provided intervention indicators to prevent lung cancer, especially in older men. This approach might be used as a risk factor identification tool to apply in other cancers and help physicians make decisions on cancer prevention.
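One generic way to quantify how much a trained model relies on each input is permutation importance: shuffle one factor across respondents and measure how much accuracy drops. The sketch below illustrates that idea only. It is not the authors' method, which the abstract does not specify, and `toy_model`, `packs_per_day`, and `shoe_size` are hypothetical names invented for this example.

```python
import random
from typing import Callable, Dict, List

Row = Dict[str, float]


def toy_model(row: Row) -> int:
    """Hypothetical stand-in for a trained network: flags heavy smokers only."""
    return 1 if row["packs_per_day"] >= 1.0 else 0


def accuracy(model: Callable[[Row], int], rows: List[Row],
             labels: List[int]) -> float:
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model: Callable[[Row], int], rows: List[Row],
                           labels: List[int], feature: str,
                           repeats: int = 20, seed: int = 0) -> float:
    """Mean accuracy drop after shuffling one feature across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(repeats):
        values = [r[feature] for r in rows]
        rng.shuffle(values)
        shuffled = [{**r, feature: v} for r, v in zip(rows, values)]
        drops.append(baseline - accuracy(model, shuffled, labels))
    return sum(drops) / repeats


# Toy data: "shoe_size" is a deliberately irrelevant factor.
rows = [
    {"packs_per_day": 2.0, "shoe_size": 42.0},
    {"packs_per_day": 0.0, "shoe_size": 39.0},
    {"packs_per_day": 1.5, "shoe_size": 44.0},
    {"packs_per_day": 0.2, "shoe_size": 41.0},
]
labels = [1, 0, 1, 0]
importance = {f: permutation_importance(toy_model, rows, labels, f)
              for f in rows[0]}
print(importance)
```

A factor the model ignores scores exactly zero, while a factor the model depends on shows a drop that grows with its influence, which gives a simple ranking of candidate intervention indicators.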


Author and article information

Contributors

Songjing Chen:

ORCID: https://orcid.org/0000-0002-6409-8938

Institute of Medical Information and Library, Chinese Academy of Medical Sciences / Peking Union Medical College, No 3, Yabao Road, Chaoyang District, Beijing, China. Phone: 86 01052328761. Email: chen.songjing@imicams.ac.cn

Journal

Journal ID (nlm-ta): J Med Internet Res

Journal ID (iso-abbrev): J. Med. Internet Res

Journal ID (publisher-id): JMIR

Title: Journal of Medical Internet Research

Publisher: JMIR Publications (Toronto, Canada)

ISSN (Print): 1439-4456

ISSN (Electronic): 1438-8871

Publication date Collection: March 2020

Publication date (Electronic): 17 March 2020

Volume: 22

Issue: 3

Electronic Location Identifier: e17695

Affiliations

[1] Institute of Medical Information and Library, Chinese Academy of Medical Sciences / Peking Union Medical College, Beijing, China

Author notes

Corresponding Author: Songjing Chen, chen.songjing@imicams.ac.cn

Author information

Songjing Chen https://orcid.org/0000-0002-6409-8938

Sizhu Wu https://orcid.org/0000-0002-6758-6259

Article

Publisher ID: v22i3e17695

DOI: 10.2196/17695

PMC ID: 7109611

PubMed ID: 32181751

SO-VID: 27b6750f-669d-4900-b2f0-13de480f3e62

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.

History

Date received: 4 January 2020

Date revision requested: 18 January 2020

Date revision received: 19 January 2020

Date accepted: 22 January 2020
