In Japan, accurately estimating the prevalence of schizophrenia in the general population has proven challenging. Previous studies indicate that patients with schizophrenia are more likely to experience poor subjective well-being and various physical, psychiatric, and social comorbidities. These factors may therefore be useful for accurately classifying schizophrenia cases in order to estimate prevalence. Machine learning has had a positive impact on many fields, including epidemiology, owing to its high-precision modeling capability, and it has been applied in research on mental disorders. However, few studies have applied machine learning to the classification of schizophrenia cases using demographic and health-related background variables, especially with data from large-scale web-based surveys.
This study aimed to construct an artificial neural network (ANN) model that can accurately classify schizophrenia cases from large-scale Japanese web-based survey data and to verify the generalizability of the model.
Data were obtained from a large Japanese internet research pooled panel (Rakuten Insight, Inc) in 2021. A total of 223 individuals aged 20-75 years with schizophrenia and 1776 healthy controls were included. Answers to the questions in a web-based survey were formatted as 1 response variable (self-reported diagnosis of schizophrenia) and multiple feature variables (demographics, health-related backgrounds, physical comorbidities, psychiatric comorbidities, and social comorbidities). An ANN was applied to construct a model for classifying schizophrenia cases. Logistic regression (LR) was used as a reference. The performances of the models and algorithms were then compared.
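The formatting step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the field names (`age`, `sleep_medication`, `employment_type`, `schizophrenia_dx`), the category levels, and the use of one-hot encoding for categorical answers are all hypothetical.

```python
# Hypothetical category levels; the real survey's answer options are not given here.
EMPLOYMENT_LEVELS = ["full_time", "part_time", "unemployed"]


def encode_respondent(answers):
    """Turn one respondent's answers into (features, label).

    label is the single response variable: 1 if the respondent self-reports
    a schizophrenia diagnosis, otherwise 0. Categorical answers are one-hot
    encoded so that each level becomes its own 0/1 feature.
    """
    features = [float(answers["age"])]
    features.append(1.0 if answers["sleep_medication"] else 0.0)
    features.extend(
        1.0 if answers["employment_type"] == level else 0.0
        for level in EMPLOYMENT_LEVELS
    )
    label = 1 if answers["schizophrenia_dx"] else 0
    return features, label


# Two made-up respondents, for illustration only.
respondents = [
    {"age": 45, "sleep_medication": True,
     "employment_type": "unemployed", "schizophrenia_dx": True},
    {"age": 30, "sleep_medication": False,
     "employment_type": "full_time", "schizophrenia_dx": False},
]

# X holds one feature vector per respondent; y holds the response variable.
X, y = zip(*(encode_respondent(r) for r in respondents))
```

Feature matrices of this shape can be passed to either an ANN or an LR implementation, which keeps the comparison between the two algorithms on identical inputs.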
The model trained by the ANN performed better than the model trained by LR in terms of area under the receiver operating characteristic curve (AUC; 0.86 vs 0.78), accuracy (0.93 vs 0.91), and specificity (0.96 vs 0.94), whereas the LR model showed better sensitivity (0.63 vs 0.56). When the two algorithms were compared, the ANN had a higher AUC (bootstrapping: 0.847 vs 0.773; cross-validation: 0.81 vs 0.72), whereas LR had higher accuracy (0.894 for LR vs 0.856 for the ANN). Sleep medication use, age, household income, and employment type were the top 4 variables in terms of importance.
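For readers less familiar with these metrics, the sketch below shows how accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve are conventionally computed from predictions. The toy labels and scores are invented for illustration and are not study data; the 0.5 decision threshold is an assumption. The final line uses the sample sizes reported in the methods to show the accuracy a classifier would reach by labeling everyone a control, assuming the evaluation set has the same class ratio as the full sample.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from binary labels (1 = case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of controls correctly excluded
    }


def auc(y_true, scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (not study data): 3 cases, 5 controls, with predicted scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1, 0.7, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
toy_auc = auc(y_true, scores)

# Accuracy of always predicting "control" with 223 cases and 1776 controls.
majority_baseline = 1776 / (223 + 1776)  # about 0.888
```

Because controls outnumber cases roughly 8 to 1, accuracy and specificity are dominated by the control group, which is why sensitivity and the area under the curve are the more informative figures for judging how well cases are identified.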
This study constructed an ANN model to classify schizophrenia cases using web-based survey data. Our model showed high internal validity. The findings are expected to provide evidence for estimating the prevalence of schizophrenia in the Japanese population and to inform future epidemiological studies.