      Using Web Search Query Data to Monitor Dengue Epidemics: A New Model for Neglected Tropical Disease Surveillance

      research-article

          Abstract

          Background

          A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics.

          Methodology/Principal Findings

          Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003–2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99.
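
The fitting and validation steps described above can be illustrated with a minimal sketch. It assumes an ordinary least-squares line with search fraction as the single predictor and case counts as the response, and it uses the Pearson correlation on a holdout subset as the validation measure. The numbers, the split point, and the function names are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept for the model y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


# Hypothetical series: fraction of searches that are dengue-related,
# and official case counts for the same periods.
search_fraction = [0.10, 0.12, 0.20, 0.35, 0.50, 0.42,
                   0.30, 0.18, 0.11, 0.15, 0.28, 0.45]
case_counts = [100, 130, 210, 340, 520, 410,
               310, 170, 120, 160, 270, 460]

split = 8  # assumed split: first 8 points for training, the rest held out
slope, intercept = fit_line(search_fraction[:split], case_counts[:split])

# Predict the holdout period from search data alone, then compare with cases.
predicted = [slope * x + intercept for x in search_fraction[split:]]
holdout_r = pearson(predicted, case_counts[split:])
print(f"slope={slope:.1f}, intercept={intercept:.1f}, holdout r={holdout_r:.3f}")
```

Keeping the holdout period out of the fit is what makes the reported correlation a check on out-of-sample tracking rather than on in-sample fit.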

          Conclusions/Significance

          Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after a substantial delay, web search query data are available in near real-time. These data represent a valuable complement to traditional dengue surveillance.

          Author Summary

          A variety of obstacles, including bureaucracy and lack of resources, delay the detection and reporting of dengue in many countries where the disease is a major public health threat. Surveillance efforts have turned to modern data sources such as Internet usage data. People often seek health-related information online, and it has been found that the frequency of, for example, influenza-related web searches as a whole rises as the number of people sick with influenza rises. Tools have been developed to help track influenza epidemics by finding patterns in certain web search activity. However, few have evaluated whether this approach would also be effective for other diseases, especially those that affect many people, that have severe consequences, or for which there is no vaccine. In this study, we found that aggregated, anonymized Google search query data were also capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after a long delay, web search query data are available for analysis within a day. Therefore, because they could potentially provide earlier warnings, these data represent a valuable complement to traditional dengue surveillance.

          Most cited references (20)

          Using internet searches for influenza surveillance.

          The Internet is an important source of health information. Thus, the frequency of Internet searches may provide information regarding infectious disease activity. As an example, we examined the relationship between searches for influenza and actual influenza occurrence. Using search queries from the Yahoo! search engine ( http://search.yahoo.com ) from March 2004 through May 2008, we counted daily unique queries originating in the United States that contained influenza-related search terms. Counts were divided by the total number of searches, and the resulting daily fraction of searches was averaged over the week. We estimated linear models, using searches with 1-10-week lead times as explanatory variables to predict the percentage of cultures positive for influenza and deaths attributable to pneumonia and influenza in the United States. With use of the frequency of searches, our models predicted an increase in cultures positive for influenza 1-3 weeks in advance of when they occurred (P < .001), and similar models predicted an increase in mortality attributable to pneumonia and influenza up to 5 weeks in advance (P < .001). Search-term surveillance may provide an additional tool for disease surveillance.
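
The normalization described above (daily counts divided by total searches, then averaged over the week) can be sketched as follows. This is a minimal illustration under assumed inputs: the function name, the counts, and the choice to drop a trailing incomplete week are all hypothetical.

```python
def weekly_search_fraction(term_counts, total_counts, days_per_week=7):
    """Average the daily fraction of matching searches over each full week.

    term_counts[i]  -- searches on day i containing the terms of interest
    total_counts[i] -- all searches on day i
    Days left over after the last full week are ignored (an assumption).
    """
    daily = [t / total for t, total in zip(term_counts, total_counts)]
    weeks = []
    for start in range(0, len(daily) - days_per_week + 1, days_per_week):
        week = daily[start:start + days_per_week]
        weeks.append(sum(week) / days_per_week)
    return weeks


# Hypothetical two weeks of data: the share of matching searches triples.
term_counts = [10] * 7 + [30] * 7
total_counts = [1000] * 14
weekly = weekly_search_fraction(term_counts, total_counts)
print(weekly)
```

Dividing by the total search volume before averaging keeps growth in overall search traffic from being mistaken for growth in disease-related interest.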

            More Diseases Tracked by Using Google Trends

            To the Editor: The idea that populations provide data on their influenza status through information-seeking behavior on the Web has been explored in the United States in recent years ( 1 , 2 ). Two reports showed that queries to the Internet search engines Yahoo and Google could be informative for influenza surveillance ( 2 , 3 ). Ginsberg et al. scanned the Google database and found that the sum of the results of 45 queries that most correlated with influenza incidences provided the best predictor of influenza trends ( 3 ). On the basis of trends of Google queries, these authors put their results into practice by creating a Web page dedicated to influenza surveillance. However, they did not develop the same approach for other diseases. To date, no studies have been published about the relationship of search engine query data with other diseases or in languages other than English. We compared search trends based on a list of Google queries related to 3 infectious diseases (influenza-like illness, gastroenteritis, and chickenpox) with clinical surveillance data from the French Sentinel Network ( 4 ). Queries were constructed through team brainstorming. Each participant listed queries likely to be used for searching information about these diseases on the Web. The query time series from January 2004 through February 2009 for France were downloaded from Google Insights for Search, 1 of the 2 websites with Google Trends that enables downloading search trends from the Google database ( 5 ). Correlations with weekly incidence rates (no. cases/100,000 inhabitants) of the 3 diseases provided by the Sentinel Network were calculated for different lag periods (Pearson coefficient ρ). The highest correlation with influenza-like illness was obtained with the query grippe –aviaire –vaccin, the French words for influenza, avian, and vaccine respectively (ρ = 0.82, p 1 of the terms. 
The second highest correlation was obtained when the keyword gastro (ρ = 0.88, p<0.001) (Appendix Figure, panel B) was used. The highest correlation with chickenpox was obtained with the French word for chickenpox (varicelle) (ρ = 0.78, p<0.001) (Appendix Figure, panel C). A time lag of 0 weeks gave the highest correlations between the best queries for influenza-like illness and acute diarrhea and the incidences of these diseases; the peak of the time series of Google queries occurred at the same time as that of the disease incidences. The best query for chickenpox had a 1-week lag, i.e., was 1 week behind the incidence time series. In conclusion, for each of 3 infectious diseases, 1 well-chosen query was sufficient to provide time series of searches highly correlated with incidence. We have shown the utility of Internet search engine query data for surveillance of acute diarrhea and chickenpox in a non–English-speaking country. Thus, the ability of Internet search-engine query data to predict influenza in the United States presented by Ginsberg et al. ( 3 ) appears to have a broader application for surveillance of other infectious diseases in other countries.

Supplementary Material

Appendix Figure. Time series of search queries plotted along the incidence of 3 diseases (influenza-like illness, gastroenteritis, and chickenpox), 2004-2008. Black lines show trends of search fractions containing the French words for influenza (A), gastroenteritis (B), and chickenpox (C). Red lines show incidence rates for the 3 corresponding diseases (influenza-like illness, acute diarrhea, and chickenpox). Search fractions are scaled between 0 and 100 by Google Insights for Search's internal processes ( 5 ). Incidence rates are expressed in no. cases per 100,000 inhabitants, as provided by the Sentinel Network ( 4 ).
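
The lag analysis described in this letter (Pearson correlations computed for different lag periods) can be sketched as below. This is an illustration only: the series are made up, the function names are hypothetical, and a lag of k weeks is taken to mean that the query series trails the incidence series by k weeks.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def lagged_correlations(queries, incidence, max_lag=3):
    """Correlate incidence[t] with queries[t + lag] for lag = 0..max_lag."""
    n = len(queries)
    return {
        lag: pearson(queries[lag:], incidence[:n - lag])
        for lag in range(max_lag + 1)
    }


# Hypothetical weekly incidence with two peaks, and a query series that
# follows it exactly one week later.
incidence = [1, 2, 5, 9, 5, 2, 1, 1, 2, 6, 10, 6, 2, 1]
queries = [1] + incidence[:-1]

correlations = lagged_correlations(queries, incidence)
best = max(correlations, key=correlations.get)
print(best, {lag: round(r, 2) for lag, r in correlations.items()})
```

In this made-up example the correlation peaks at a lag of 1 week, which corresponds to the chickenpox case described above, where searches ran 1 week behind incidence.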

              Web Queries as a Source for Syndromic Surveillance

              In the field of syndromic surveillance, various sources are exploited for outbreak detection, monitoring and prediction. This paper describes a study on queries submitted to a medical web site, with influenza as a case study. The hypothesis of the work was that queries on influenza and influenza-like illness would provide a basis for the estimation of the timing of the peak and the intensity of the yearly influenza outbreaks that would be as good as the existing laboratory and sentinel surveillance. We calculated the occurrence of various queries related to influenza from search logs submitted to a Swedish medical web site for two influenza seasons. These figures were subsequently used to generate two models, one to estimate the number of laboratory verified influenza cases and one to estimate the proportion of patients with influenza-like illness reported by selected General Practitioners in Sweden. We applied an approach designed for highly correlated data, partial least squares regression. In our work, we found that certain web queries on influenza follow the same pattern as that obtained by the two other surveillance systems for influenza epidemics, and that they have equal power for the estimation of the influenza burden in society. Web queries give a unique access to ill individuals who are not (yet) seeking care. This paper shows the potential of web queries as an accurate, cheap and labour extensive source for syndromic surveillance.

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Negl Trop Dis
                plos
                plosntds
                PLoS Neglected Tropical Diseases
                Public Library of Science (San Francisco, USA )
                1935-2727
                1935-2735
                May 2011
                31 May 2011
                Volume: 5
                Issue: 5
                Article: e1206
                Affiliations
                [1 ]Children's Hospital Informatics Program, Harvard-Massachusetts Institute of Technology Division of Health Sciences and Technology, Boston, Massachusetts, United States of America
                [2 ]Division of Emergency Medicine, Children's Hospital Boston, Boston, Massachusetts, United States of America
                [3 ]Google Inc., Mountain View, California, United States of America
                [4 ]Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, United States of America
                Yale School of Public Health, United States of America
                Author notes

                Conceived and designed the experiments: EHC VS CC JSB. Performed the experiments: EHC VS. Analyzed the data: EHC VS CC JSB. Contributed reagents/materials/analysis tools: EHC VS. Wrote the paper: EHC VS CC JSB.

                Article
                PNTD-D-11-00327
                10.1371/journal.pntd.0001206
                3104029
                21647308
                74593949-e69d-4444-91d4-3a0184823a89
                Chan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                History
                Received: 12 April 2011
                Accepted: 2 May 2011
                Page count
                Pages: 6
                Categories
                Research Article
                Computer Science
                Information Technology
                Medicine
                Epidemiology
                Disease Informatics
                Infectious Disease Epidemiology
                Infectious Diseases
                Neglected Tropical Diseases
                Dengue Fever
                Infectious Disease Modeling
                Public Health

                Infectious disease & Microbiology
