Diagnosing chronic obstructive pulmonary disease (COPD) is challenging, especially in primary care institutions that lack spirometers. To reduce the rate of missed COPD diagnoses in Northeast China, where the prevalence of COPD is comparatively high, this study aimed to establish efficient primary screening and discriminant models for COPD in this region.
Subjects from Northeast China were enrolled at The First Hospital of China Medical University from December 2017 to April 2019. All participants underwent pulmonary function tests and completed a questionnaire. With disease status (COPD or no COPD) as the outcome for the screening models and disease severity as the outcome for the discriminant models, multivariate linear regression, logistic regression, linear discriminant analysis, K-nearest neighbor, decision tree, and support vector machine models were constructed in R and Python. After their performance was compared, the optimal primary screening and discriminant models were selected.
A total of 232 COPD patients (124 GOLD I–II and 108 GOLD III–IV) and 218 normal controls were enrolled. Eight primary screening models were established. The optimal model was Y = −1.2562 − 0.3891X4 (education level) + 1.7996X5 (dyspnea) + 0.5102X6 (cooking fuel grade) + 1.498X7 (smoking index) + 0.8077X9 (family history) − 0.5552X11 (BMI) + 0.538X13 (cough with sputum) + 2.0328X14 (wheezing) + 1.3378X16 (farmer) + 0.8187X17 (mother's smoking exposure history during pregnancy) − 0.389X18 (kitchen ventilation) + 0.6888X19 (childhood heating). Six discriminant models were established. The optimal model was the decision tree, whose selected variables were dyspnea (X5), cooking fuel grade (X6), second-hand smoking index (X8), BMI (X11), cough (X12), cough with sputum (X13), wheezing (X14), farmer (X16), kitchen ventilation (X18), and childhood heating (X19). Code was written to implement the discriminant model as a computer program.
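As an illustration only, the screening equation above can be evaluated with a few lines of Python. This is a minimal sketch, not the authors' code: the coefficients are copied from the reported model, but the dictionary keys, the function name `screening_score`, and the example answer values are hypothetical. The abstract does not state how each questionnaire item is coded or which cutoff on Y separates COPD from non-COPD, so the sketch returns the raw score and applies no threshold.

```python
# Intercept and coefficients copied from the reported optimal screening model.
# Key names are hypothetical labels chosen for readability.
INTERCEPT = -1.2562
COEFFICIENTS = {
    "X4_education_level": -0.3891,
    "X5_dyspnea": 1.7996,
    "X6_cooking_fuel_grade": 0.5102,
    "X7_smoking_index": 1.498,
    "X9_family_history": 0.8077,
    "X11_bmi": -0.5552,
    "X13_cough_with_sputum": 0.538,
    "X14_wheezing": 2.0328,
    "X16_farmer": 1.3378,
    "X17_maternal_smoking_in_pregnancy": 0.8187,
    "X18_kitchen_ventilation": -0.389,
    "X19_childhood_heating": 0.6888,
}


def screening_score(answers):
    """Return Y for one respondent.

    `answers` maps every key in COEFFICIENTS to its coded numeric value.
    The coding scheme is an assumption; the abstract does not define it.
    """
    missing = sorted(set(COEFFICIENTS) - set(answers))
    if missing:
        raise ValueError(f"missing questionnaire items: {missing}")
    return INTERCEPT + sum(
        weight * answers[name] for name, weight in COEFFICIENTS.items()
    )


if __name__ == "__main__":
    # Hypothetical respondent: every item coded 0 except dyspnea and wheezing.
    example = dict.fromkeys(COEFFICIENTS, 0)
    example["X5_dyspnea"] = 1
    example["X14_wheezing"] = 1
    print(round(screening_score(example), 4))
```

Keeping the coefficients in a single mapping makes it easy to check them against the published equation term by term, and the explicit check for missing items avoids silently scoring an incomplete questionnaire.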