

      Is Open Access

      Data Mining and Content Analysis of the Chinese Social Media Platform Weibo During the Early COVID-19 Outbreak: Retrospective Observational Infoveillance Study

      research-article


          Abstract

          Background

          The coronavirus disease (COVID-19) pandemic, which began in Wuhan, China in December 2019, is rapidly spreading worldwide with over 1.9 million cases as of mid-April 2020. Infoveillance approaches using social media can help characterize disease distribution and public knowledge, attitudes, and behaviors critical to the early stages of an outbreak.

          Objective

          The aim of this study is to conduct a quantitative and qualitative assessment of Chinese social media posts originating in Wuhan City on the Chinese microblogging platform Weibo during the early stages of the COVID-19 outbreak.

          Methods

          Chinese-language messages from Wuhan were collected for 39 days between December 23, 2019, and January 30, 2020, on Weibo. For quantitative analysis, the total daily cases of COVID-19 in Wuhan were obtained from the Chinese National Health Commission, and a linear regression model was used to determine if Weibo COVID-19 posts were predictive of the number of cases reported. Qualitative content analysis and an inductive manual coding approach were used to identify parent classifications of news and user-generated COVID-19 topics.
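A minimal sketch of the kind of simple linear regression described above, assuming one observation per day with a daily post count and a daily reported case count. The function name and the five data points are hypothetical placeholders, not study data, and the abstract does not specify the exact model (for example, any lags or covariates), so this shows only the general technique.

```python
def fit_simple_linear_regression(posts, cases):
    """Ordinary least squares fit of: cases = intercept + slope * posts."""
    if len(posts) != len(cases) or len(posts) < 3:
        raise ValueError("need two equal-length series with at least 3 points")

    n = len(posts)
    mean_x = sum(posts) / n
    mean_y = sum(cases) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in posts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(posts, cases))
    syy = sum((y - mean_y) ** 2 for y in cases)
    if sxx == 0 or syy == 0:
        raise ValueError("both series must vary")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Share of the variance in cases explained by the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(posts, cases))
    r_squared = 1 - ss_res / syy
    return slope, intercept, r_squared


# Hypothetical daily counts, for illustration only (not study data).
daily_posts = [0, 40, 80, 120, 160]
daily_cases = [2, 10, 22, 30, 41]

slope, intercept, r_squared = fit_simple_linear_regression(daily_posts, daily_cases)
print(f"slope={slope:.3f} cases per post, intercept={intercept:.2f}, R^2={r_squared:.4f}")
```

The slope is the estimated change in reported cases per additional post. A significance test such as the reported P value needs the slope's standard error as well; a statistics package (for example `scipy.stats.linregress`) would normally supply it.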

          Results

          A total of 115,299 Weibo posts were collected during the study time frame, an average of 2956 posts per day (minimum 0, maximum 13,587). Quantitative analysis found a positive correlation between the number of Weibo posts and the number of reported cases from Wuhan, with approximately 10 more COVID-19 cases per 40 social media posts (P<.001). This effect size was also larger than what was observed for the rest of China excluding Hubei Province (where Wuhan is the capital city) and held when comparing the number of Weibo posts to the incidence proportion of cases in Hubei Province. Qualitative analysis of 11,893 posts during the first 21 days of the study period with COVID-19-related posts uncovered four parent classifications: Weibo discussions about the causative agent of the disease, changing epidemiological characteristics of the outbreak, public reaction to outbreak control and response measures, and other topics. Generally, these themes also exhibited public uncertainty and changing knowledge and attitudes about COVID-19, including posts exhibiting both protective and higher-risk behaviors.
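The summary figures in the Results can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract; the variable and function names are illustrative, and the slope is read as an association from the fitted model, not as a causal effect.

```python
# Figures taken from the abstract.
total_posts = 115_299
study_days = 39
qualitative_posts = 11_893

# Average daily volume: 115,299 posts over 39 days.
mean_posts_per_day = total_posts / study_days

# "Approximately 10 more cases per 40 posts" corresponds to a slope
# of 0.25 reported cases per additional post.
cases_per_post = 10 / 40


def additional_cases(extra_posts, slope=cases_per_post):
    """Extra reported cases associated with extra_posts under the fitted slope."""
    return slope * extra_posts


# Fraction of all collected posts that entered the qualitative analysis.
qualitative_share = qualitative_posts / total_posts

print(round(mean_posts_per_day), cases_per_post, additional_cases(400))
print(f"{qualitative_share:.1%} of posts were manually coded")
```

Under these assumptions the daily average rounds to the reported 2956 posts, and roughly one in ten collected posts was included in the qualitative coding.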

          Conclusions

          The results of this study provide initial insight into the origins of the COVID-19 outbreak based on quantitative and qualitative analysis of Chinese social media data at the initial epicenter in Wuhan City. Future studies should continue to explore the utility of social media data to predict COVID-19 disease severity, measure public reaction and behavior, and evaluate effectiveness of outbreak communication.


          Most cited references (14)


          A novel coronavirus outbreak of global health concern

          In December, 2019, Wuhan, Hubei province, China, became the centre of an outbreak of pneumonia of unknown cause, which raised intense attention not only within China but internationally. Chinese health authorities did an immediate investigation to characterise and control the disease, including isolation of people suspected to have the disease, close monitoring of contacts, epidemiological and clinical data collection from patients, and development of diagnostic and treatment procedures. By Jan 7, 2020, Chinese scientists had isolated a novel coronavirus (CoV) from patients in Wuhan. The genetic sequence of the 2019 novel coronavirus (2019-nCoV) enabled the rapid development of point-of-care real-time RT-PCR diagnostic tests specific for 2019-nCoV (based on full genome sequence data on the Global Initiative on Sharing All Influenza Data [GISAID] platform). Cases of 2019-nCoV are no longer limited to Wuhan. Nine exported cases of 2019-nCoV infection have been reported in Thailand, Japan, Korea, the USA, Vietnam, and Singapore to date, and further dissemination through air travel is likely. 1, 2, 3, 4, 5 As of Jan 23, 2020, confirmed cases were consecutively reported in 32 provinces, municipalities, and special administrative regions in China, including Hong Kong, Macau, and Taiwan. 3 These cases detected outside Wuhan, together with the detection of infection in at least one household cluster (reported by Jasper Fuk-Woo Chan and colleagues 6 in The Lancet) and the recently documented infections in health-care workers caring for patients with 2019-nCoV indicate human-to-human transmission and thus the risk of much wider spread of the disease. As of Jan 23, 2020, a total of 835 cases with laboratory-confirmed 2019-nCoV infection have been detected in China, of whom 25 have died and 93% remain in hospital (figure). 3
          [Figure: Timeline of early stages of 2019-nCoV outbreak. 2019-nCoV=2019 novel coronavirus.]
In The Lancet, Chaolin Huang and colleagues 7 report clinical features of the first 41 patients admitted to the designated hospital in Wuhan who were confirmed to be infected with 2019-nCoV by Jan 2, 2020. The study findings provide first-hand data about severity of the emerging 2019-nCoV infection. Symptoms resulting from 2019-nCoV infection at the prodromal phase, including fever, dry cough, and malaise, are non-specific. Unlike human coronavirus infections, upper respiratory symptoms are notably infrequent. Intestinal presentations observed with SARS also appear to be uncommon, although two of six cases reported by Chan and colleagues had diarrhoea. 6 Common laboratory findings on admission to hospital include lymphopenia and bilateral ground-glass opacity or consolidation in chest CT scans. These clinical presentations confounded early detection of infected cases, especially against a background of ongoing influenza and circulation of other respiratory viruses. Exposure history to the Huanan Seafood Wholesale market served as an important clue at the early stage, yet its value has decreased as more secondary and tertiary cases have appeared. Of the 41 patients in this cohort, 22 (55%) developed severe dyspnoea and 13 (32%) required admission to an intensive care unit, and six died. 7 Hence, the case-fatality proportion in this cohort is approximately 14·6%, and the overall case fatality proportion appears to be closer to 3% (table). However, both of these estimates should be treated with great caution because not all patients have concluded their illness (ie, recovered or died) and the true number of infections and full disease spectrum are unknown. Importantly, in emerging viral infection outbreaks the case-fatality ratio is often overestimated in the early stages because case detection is highly biased towards the more severe cases.
As further data on the spectrum of mild or asymptomatic infection becomes available, one case of which was documented by Chan and colleagues, 6 the case-fatality ratio is likely to decrease. Nevertheless, the 1918 influenza pandemic is estimated to have had a case-fatality ratio of less than 5% 13 but had an enormous impact due to widespread transmission, so there is no room for complacency.

Table: Characteristics of patients who have been infected with 2019-nCoV, MERS-CoV, and SARS-CoV 7, 8, 10, 11, 12

                              | 2019-nCoV*     | MERS-CoV             | SARS-CoV
Demographic
  Date                        | December, 2019 | June, 2012           | November, 2002
  Location of first detection | Wuhan, China   | Jeddah, Saudi Arabia | Guangdong, China
  Age, years (range)          | 49 (21–76)     | 56 (14–94)           | 39·9 (1–91)
  Male:female sex ratio       | 2·7:1          | 3·3:1                | 1:1·25
  Confirmed cases             | 835†           | 2494                 | 8096
  Mortality                   | 25† (2·9%)     | 858 (37%)            | 744 (10%)
  Health-care workers         | 16‡            | 9·8%                 | 23·1%
Symptoms
  Fever                       | 40 (98%)       | 98%                  | 99–100%
  Dry cough                   | 31 (76%)       | 47%                  | 29–75%
  Dyspnoea                    | 22 (55%)       | 72%                  | 40–42%
  Diarrhoea                   | 1 (3%)         | 26%                  | 20–25%
  Sore throat                 | 0              | 21%                  | 13–25%
Ventilatory support           | 9·8%           | 80%                  | 14–20%

Data are n, age (range), or n (%) unless otherwise stated. 2019-nCoV=2019 novel coronavirus. MERS-CoV=Middle East respiratory syndrome coronavirus. SARS-CoV=severe acute respiratory syndrome coronavirus.
* Demographics and symptoms for 2019-nCoV infection are based on data from the first 41 patients reported by Chaolin Huang and colleagues (admitted before Jan 2, 2020). 8 Case numbers and mortalities are updated up to Jan 21, 2020, as disclosed by the Chinese Health Commission.
† Data as of Jan 23, 2020.
‡ Data as of Jan 21, 2020. 9

As an RNA virus, 2019-nCoV still has the inherent feature of a high mutation rate, although like other coronaviruses the mutation rate might be somewhat lower than other RNA viruses because of its genome-encoded exonuclease.
This aspect provides the possibility for this newly introduced zoonotic viral pathogen to adapt to become more efficiently transmitted from person to person and possibly become more virulent. Two previous coronavirus outbreaks had been reported in the 21st century. The clinical features of 2019-nCoV, in comparison with SARS-CoV and Middle East respiratory syndrome (MERS)-CoV, are summarised in the table. The ongoing 2019-nCoV outbreak has undoubtedly caused the memories of the SARS-CoV outbreak starting 17 years ago to resurface in many people. In November, 2002, clusters of pneumonia of unknown cause were reported in Guangdong province, China, now known as the SARS-CoV outbreak. The number of cases of SARS increased substantially in the next year in China and later spread globally, 14 infecting at least 8096 people and causing 774 deaths. 12 The international spread of SARS-CoV in 2003 was attributed to its strong transmission ability under specific circumstances and the insufficient preparedness and implementation of infection control practices. Chinese public health and scientific capabilities have been greatly transformed since 2003. An efficient system is ready for monitoring and responding to infectious disease outbreaks and the 2019-nCoV pneumonia has been quickly added to the Notifiable Communicable Disease List and given the highest priority by Chinese health authorities. The increasing number of cases and widening geographical spread of the disease raise grave concerns about the future trajectory of the outbreak, especially with the Chinese Lunar New Year quickly approaching. Under normal circumstances, an estimated 3 billion trips would be made in the Spring Festival travel rush this year, with 15 million trips happening in Wuhan. The virus might further spread to other places during this festival period and cause epidemics, especially if it has acquired the ability to efficiently transmit from person to person. 
Consequently, the 2019-nCoV outbreak has led to implementation of extraordinary public health measures to reduce further spread of the virus within China and elsewhere. Although WHO has not recommended any international travelling restrictions so far, 15 the local government in Wuhan announced on Jan 23, 2020, the suspension of public transportation, with closure of airports, railway stations, and highways in the city, to prevent further disease transmission. 16 Further efforts in travel restriction might follow. Active surveillance for new cases and close monitoring of their contacts are being implemented. To improve detection efficiency, front-line clinics, apart from local centres for disease control and prevention, should be armed with validated point-of-care diagnostic kits. Rapid information disclosure is a top priority for disease control and prevention. A daily press release system has been established in China to ensure effective and efficient disclosure of epidemic information. Education campaigns should be launched to promote precautions for travellers, including frequent hand-washing, cough etiquette, and use of personal protection equipment (eg, masks) when visiting public places. Also, the general public should be motivated to report fever and other risk factors for coronavirus infection, including travel history to affected area and close contacts with confirmed or suspected cases. Considering that substantial numbers of patients with SARS and MERS were infected in health-care settings, precautions need to be taken to prevent nosocomial spread of the virus. Unfortunately, 16 health-care workers, some of whom were working in the same ward, have been confirmed to be infected with 2019-nCoV to date, although the routes of transmission and the possible role of so-called super-spreaders remain to be clarified. 
9 Epidemiological studies need to be done to assess risk factors for infection in health-care personnel and quantify potential subclinical or asymptomatic infections. Notably, the transmission of SARS-CoV was eventually halted by public health measures including elimination of nosocomial infections. We need to be wary of the current outbreak turning into a sustained epidemic or even a pandemic. The availability of the virus' genetic sequence and initial data on the epidemiology and clinical consequences of the 2019-nCoV infections are only the first steps to understanding the threat posed by this pathogen. Many important questions remain unanswered, including its origin, extent, and duration of transmission in humans, ability to infect other animal hosts, and the spectrum and pathogenesis of human infections. Characterising viral isolates from successive generations of human infections will be key to updating diagnostics and assessing viral evolution. Beyond supportive care, 17 no specific coronavirus antivirals or vaccines of proven efficacy in humans exist, although clinical trials of both are ongoing for MERS-CoV and one controlled trial of ritonavir-boosted lopinavir monotherapy has been launched for 2019-nCoV (ChiCTR2000029308). Future animal model and clinical studies should focus on assessing the effectiveness and safety of promising antiviral drugs, monoclonal and polyclonal neutralising antibody products, and therapeutics directed against immunopathologic host responses. We have to be aware of the challenge and concerns brought by 2019-nCoV to our community. Every effort should be given to understand and control the disease, and the time to act is now. This online publication has been corrected. The corrected version first appeared at thelancet.com on January 29, 2020
            Is Open Access

            Infodemiology and Infoveillance: Framework for an Emerging Set of Public Health Informatics Methods to Analyze Search, Communication and Publication Behavior on the Internet

            (2009)
            Infodemiology can be defined as the science of distribution and determinants of information in an electronic medium, specifically the Internet, or in a population, with the ultimate aim to inform public health and public policy. Infodemiology data can be collected and analyzed in near real time. Examples for infodemiology applications include: the analysis of queries from Internet search engines to predict disease outbreaks (eg. influenza); monitoring peoples' status updates on microblogs such as Twitter for syndromic surveillance; detecting and quantifying disparities in health information availability; identifying and monitoring of public health relevant publications on the Internet (eg. anti-vaccination sites, but also news articles or expert-curated outbreak reports); automated tools to measure information diffusion and knowledge translation, and tracking the effectiveness of health marketing campaigns. Moreover, analyzing how people search and navigate the Internet for health-related information, as well as how they communicate and share this information, can provide valuable insights into health-related behavior of populations. Seven years after the infodemiology concept was first introduced, this paper revisits the emerging fields of infodemiology and infoveillance and proposes an expanded framework, introducing some basic metrics such as information prevalence, concept occurrence ratios, and information incidence. The framework distinguishes supply-based applications (analyzing what is being published on the Internet, eg. on Web sites, newsgroups, blogs, microblogs and social media) from demand-based methods (search and navigation behavior), and further distinguishes passive from active infoveillance methods. Infodemiology metrics follow population health relevant events or predict them. Thus, these metrics and methods are potentially useful for public health practice and research, and should be further developed and standardized.
              Is Open Access

              Pandemics in the Age of Twitter: Content Analysis of Tweets during the 2009 H1N1 Outbreak

              Background: Surveys are popular methods to measure public perceptions in emergencies but can be costly and time consuming. We suggest and evaluate a complementary “infoveillance” approach using Twitter during the 2009 H1N1 pandemic. Our study aimed to: 1) monitor the use of the terms “H1N1” versus “swine flu” over time; 2) conduct a content analysis of “tweets”; and 3) validate Twitter as a real-time content, sentiment, and public attention trend-tracking tool. Methodology/Principal Findings: Between May 1 and December 31, 2009, we archived over 2 million Twitter posts containing keywords “swine flu,” “swineflu,” and/or “H1N1” using Infovigil, an infoveillance system. Tweets using “H1N1” increased from 8.8% to 40.5% (R² = .788; p<.001), indicating a gradual adoption of World Health Organization-recommended terminology. 5,395 tweets were randomly selected from 9 days, 4 weeks apart and coded using a tri-axial coding scheme. To track tweet content and to test the feasibility of automated coding, we created database queries for keywords and correlated these results with manual coding. Content analysis indicated resource-related posts were most commonly shared (52.6%). 4.5% of cases were identified as misinformation. News websites were the most popular sources (23.2%), while government and health agencies were linked only 1.5% of the time. 7/10 automated queries correlated with manual coding. Several Twitter activity peaks coincided with major news stories. Our results correlated well with H1N1 incidence data. Conclusions: This study illustrates the potential of using social media to conduct “infodemiology” studies for public health. 2009 H1N1-related tweets were primarily used to disseminate information from credible sources, but were also a source of opinions and experiences. Tweets can be used for real-time content analysis and knowledge translation research, allowing health authorities to respond to public concerns.

                Author and article information

                Contributors
                Journal
                JMIR Public Health Surveill
                JPH
                JMIR Public Health and Surveillance
                JMIR Publications (Toronto, Canada)
                2369-2960
                Apr-Jun 2020
                21 April 2020
                Volume: 6
                Issue: 2
                Article: e18700
                Affiliations
                [1] Department of Anesthesiology and Division of Infectious Diseases and Global Public Health, University of California San Diego School of Medicine, La Jolla, CA, United States
                [2] S-3 Research LLC, San Diego, CA, United States
                [3] Department of Healthcare Research and Policy, University of California San Diego Extension, La Jolla, CA, United States
                [4] Global Health Policy Institute, San Diego, CA, United States
                Author notes
                Corresponding Author: Tim Mackey tmackey@ucsd.edu
                Author information
                https://orcid.org/0000-0001-9801-4715
                https://orcid.org/0000-0002-4507-1094
                https://orcid.org/0000-0002-8179-0619
                https://orcid.org/0000-0002-6270-448X
                https://orcid.org/0000-0002-2191-7833
                Article
                v6i2e18700
                DOI: 10.2196/18700
                PMC ID: 7175787
                PubMed ID: 32293582
                95cf02fd-6a66-4113-b51a-cb7683c3febe
                ©Jiawei Ken Li, Qing Xu, Raphael Cuomo, Vidya Purushothaman, Tim Mackey. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 21.04.2020.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.

                History
                : 12 March 2020
                : 7 April 2020
                : 14 April 2020
                : 14 April 2020
                Categories
                Original Paper

                covid-19,coronavirus,infectious disease,social media, surveillance,infoveillance,infodemiology
