
      Is Open Access

      Enhancing COVID-19 Epidemic Forecasting Accuracy by Combining Real-time and Historical Data From Multiple Internet-Based Sources: Analysis of Social Media Data, Online News Articles, and Search Queries

      research-article


          Abstract

          Background

          The SARS-CoV-2 virus and its variants pose extraordinary challenges for public health worldwide. Timely and accurate forecasting of the COVID-19 epidemic is key to sustaining interventions and policies and to allocating resources efficiently. Internet-based data sources have shown great potential to supplement traditional infectious disease surveillance, and combining different Internet-based data sources has shown greater power to enhance epidemic forecasting accuracy than using a single Internet-based data source. However, existing methods that incorporate multiple Internet-based data sources used only real-time data from these sources as exogenous inputs and did not take all the historical data into account. Moreover, the predictive power of different Internet-based data sources in providing early warning for COVID-19 outbreaks has not been fully explored.

          Objective

          The main aim of our study is to explore whether combining real-time and historical data from multiple Internet-based sources could improve the COVID-19 forecasting accuracy over the existing baseline models. A secondary aim is to explore the COVID-19 forecasting timeliness based on different Internet-based data sources.

          Methods

          We first used core terms and symptom-related keyword-based methods to extract COVID-19–related Internet-based data from December 21, 2019, to February 29, 2020. The Internet-based data we explored included 90,493,912 online news articles, 37,401,900 microblogs, and all the Baidu search query data during that period. We then proposed an autoregressive model with exogenous inputs, incorporating real-time and historical data from multiple Internet-based sources. Our proposed model was compared with baseline models, and all the models were tested during the first wave of COVID-19 epidemics in Hubei province and the rest of mainland China separately. We also used lagged Pearson correlations for COVID-19 forecasting timeliness analysis.
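The abstract does not give the exact model specification, so the following is only a minimal sketch of an autoregressive model with exogenous inputs (ARX) fitted by ordinary least squares. It assumes a daily case series `y` and one or more Internet-based signals `X`, and it uses both the same-day ("real-time") value and the previous `q` days ("historical") of each signal. The function names, the lag orders `p` and `q`, and the least-squares fit are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def build_design(y, X, p, q):
    """Build the ARX regression matrix (hypothetical helper).

    Each row for day t holds: an intercept, y[t-1..t-p] (autoregressive
    terms), and for every exogenous source x[t], x[t-1], ..., x[t-q]
    (real-time value followed by historical lags).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a single signal as one column
    start = max(p, q)   # first day with all required lags available
    rows = []
    for t in range(start, len(y)):
        ar_terms = [y[t - i] for i in range(1, p + 1)]
        exo_terms = [X[t - j, k]
                     for k in range(X.shape[1])
                     for j in range(q + 1)]
        rows.append([1.0] + ar_terms + exo_terms)
    return np.array(rows), y[start:]


def fit_arx(y, X, p=2, q=2):
    """Estimate ARX coefficients by ordinary least squares."""
    design, target = build_design(y, X, p, q)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict_arx(coef, y, X, p=2, q=2):
    """In-sample one-step-ahead fitted values for days max(p, q) onward."""
    design, _ = build_design(y, X, p, q)
    return design @ coef
```

Setting `q=0` reduces this sketch to a model that uses only real-time exogenous data, which is the kind of baseline the abstract contrasts with the proposed approach.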

          Results

          Our proposed model achieved the highest accuracy in all 5 accuracy measures, compared with all the baseline models of both Hubei province and the rest of mainland China. In mainland China, except for Hubei, the COVID-19 epidemic forecasting accuracy differences between our proposed model (model i) and all the other baseline models were statistically significant (model 1, t(198)=-8.722, P<.001; model 2, t(198)=-5.000, P<.001; model 3, t(198)=-1.882, P=.06; model 4, t(198)=-4.644, P<.001; model 5, t(198)=-4.488, P<.001). In Hubei province, our proposed model's forecasting accuracy improved significantly compared with the baseline model using historical new confirmed COVID-19 case counts only (model 1, t(198)=-1.732, P=.09). Our results also showed that Internet-based sources could provide a 2- to 6-day earlier warning for COVID-19 outbreaks.
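The early-warning estimate rests on lagged Pearson correlations. As a rough illustration of that kind of analysis, the sketch below correlates an Internet-based signal on day t with case counts on day t + lag and reports the lag with the highest correlation. The function names and the choice of the maximum-correlation lag as the "lead time" are assumptions for illustration; the abstract does not describe the authors' exact procedure.

```python
import numpy as np


def lagged_pearson(signal, cases, max_lag):
    """Pearson correlation between signal[t] and cases[t + lag], lag = 0..max_lag."""
    signal = np.asarray(signal, dtype=float)
    cases = np.asarray(cases, dtype=float)
    correlations = {}
    for lag in range(max_lag + 1):
        if lag == 0:
            a, b = signal, cases
        else:
            # Drop the last `lag` signal days and the first `lag` case days
            # so that each signal value is paired with cases `lag` days later.
            a, b = signal[:-lag], cases[lag:]
        correlations[lag] = float(np.corrcoef(a, b)[0, 1])
    return correlations


def best_lead(signal, cases, max_lag):
    """Lag (in days) at which the signal correlates most strongly with later cases."""
    correlations = lagged_pearson(signal, cases, max_lag)
    return max(correlations, key=correlations.get)
```

A positive result from `best_lead` would suggest that the signal tends to move that many days ahead of reported cases; it does not by itself establish predictive value.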

          Conclusions

          Our approach incorporating real-time and historical data from multiple Internet-based sources could improve forecasting accuracy for epidemics of COVID-19 and its variants, which may help improve public health agencies' interventions and resource allocation in mitigating and controlling new waves of COVID-19 or other relevant epidemics.


                Author and article information

                Contributors
                Journal
                JMIR Public Health Surveill
                JPH
                JMIR Public Health and Surveillance
                JMIR Publications (Toronto, Canada )
                2369-2960
                June 2022
                16 June 2022
                Volume: 8
                Issue: 6
                Article number: e35266
                Affiliations
                [1] School of Management, Xi'an Jiaotong University, Xi'an, China
                [2] Department of Information Systems, City University of Hong Kong, Hong Kong, China
                [3] National Center for Applied Mathematics Shenzhen, Shenzhen, China
                [4] College of Business, Southern University of Science and Technology, Shenzhen, China
                [5] Department of Information Systems and Intelligent Business, School of Management, Xi'an Jiaotong University, Xi'an, China
                [6] College of Public Health, University of Georgia, Athens, GA, United States
                [7] School of Economics, University of Nottingham Ningbo China, Ningbo, China
                [8] School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
                Author notes
                Corresponding Author: Wei Huang waynehuangwei@ 123456163.com
                Author information
                https://orcid.org/0000-0002-6129-2751
                https://orcid.org/0000-0001-7150-0844
                https://orcid.org/0000-0002-9778-9196
                https://orcid.org/0000-0002-5351-3489
                https://orcid.org/0000-0002-2025-3123
                https://orcid.org/0000-0002-4373-8000
                Article
                v8i6e35266
                10.2196/35266
                9205424
                35507921
                6fec59ae-3ff7-4dd6-9680-e98ee955c052
                ©Jingwei Li, Wei Huang, Choon Ling Sia, Zhuo Chen, Tailai Wu, Qingnan Wang. Originally published in JMIR Public Health and Surveillance (https://publichealth.jmir.org), 16.06.2022.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on https://publichealth.jmir.org, as well as this copyright and license information must be included.

                History
                : 29 November 2021
                : 22 January 2022
                : 12 February 2022
                : 3 May 2022
                Categories
                Original Paper

                sars-cov-2, covid 19, epidemic forecasting, disease surveillance, infectious disease epidemiology, social media, online news, search query, autoregression model
