
      Is Open Access

      Machine Learning to Detect Self-Reporting of Symptoms, Testing Access, and Recovery Associated With COVID-19 on Twitter: Retrospective Big Data Infoveillance Study

      research-article


          Abstract

          Background

          The coronavirus disease (COVID-19) pandemic is a global health emergency with over 6 million cases worldwide as of the beginning of June 2020. The pandemic is historic in scope and precedent given its emergence in an increasingly digital era. Importantly, there have been concerns about the accuracy of COVID-19 case counts due to issues such as lack of access to testing and difficulty in measuring recoveries.

          Objective

          The aims of this study were to detect and characterize user-generated conversations that could be associated with COVID-19-related symptoms, experiences with access to testing, and mentions of disease recovery using an unsupervised machine learning approach.

          Methods

          Tweets were collected from the Twitter public streaming application programming interface from March 3-20, 2020, filtered for general COVID-19-related keywords and then further filtered for terms that could be related to COVID-19 symptoms as self-reported by users. Tweets were analyzed using an unsupervised machine learning approach called the biterm topic model (BTM), where groups of tweets containing the same word-related themes were separated into topic clusters that included conversations about symptoms, testing, and recovery. Tweets in these clusters were then extracted and manually annotated for content analysis and assessed for their statistical and geographic characteristics.
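The filtering and biterm steps described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the keyword sets, the stopword list, and the function names are hypothetical, chosen only for illustration. It shows the two-stage keyword filter and the extraction of biterms, the unordered pairs of words co-occurring in a short text whose corpus-wide counts a biterm topic model is fitted to. Fitting the topic model itself is out of scope here.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical keyword sets for illustration; the study's actual terms are not listed here.
GENERAL_TERMS = {"covid", "covid19", "coronavirus"}
SYMPTOM_TERMS = {"fever", "cough", "symptoms"}
# Tiny illustrative stopword list (an assumption, not the study's list).
STOPWORDS = {"i", "a", "and", "is", "it", "the", "my", "have"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def two_stage_filter(tweets):
    """Keep tweets matching a general COVID-19 term AND a symptom term."""
    kept = []
    for text in tweets:
        tokens = set(tokenize(text))
        if tokens & GENERAL_TERMS and tokens & SYMPTOM_TERMS:
            kept.append(text)
    return kept


def biterms(text):
    """Return the unordered pairs of distinct non-stopword tokens in one tweet.

    Simplification: repeated words are collapsed before pairing.
    """
    words = sorted({w for w in tokenize(text) if w not in STOPWORDS})
    return list(combinations(words, 2))


def biterm_counts(tweets):
    """Count biterms over the whole corpus, the input a BTM is fitted to."""
    counts = Counter()
    for text in tweets:
        counts.update(biterms(text))
    return counts


if __name__ == "__main__":
    # Invented sample tweets, not data from the study.
    sample = [
        "I have a fever and cough, is it coronavirus?",
        "Coronavirus news update",
        "My cough is back",
    ]
    relevant = two_stage_filter(sample)
    print(relevant)
    print(biterm_counts(relevant).most_common(3))
```

Pooling word pairs across the corpus, rather than modeling each tweet separately, is what makes the approach suited to very short texts such as tweets.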

          Results

          A total of 4,492,954 tweets were collected that contained terms that could be related to COVID-19 symptoms. After using BTM to identify relevant topic clusters and removing duplicate tweets, we identified a total of 3465 (<1%) tweets that included user-generated conversations about experiences that users associated with possible COVID-19 symptoms and other disease experiences. These tweets were grouped into five main categories including first- and secondhand reports of symptoms, symptom reporting concurrent with lack of testing, discussion of recovery, confirmation of negative COVID-19 diagnosis after receiving testing, and users recalling symptoms and questioning whether they might have been previously infected with COVID-19. The co-occurrence of tweets for these themes was statistically significant for users reporting symptoms with a lack of testing and with a discussion of recovery. A total of 63% (n=1112) of the geotagged tweets were located in the United States.
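As a quick arithmetic check on the "<1%" figure in the Results, the share of relevant tweets can be computed directly from the two counts reported above. The helper name `percent` is hypothetical, and the sketch uses only numbers stated in the abstract.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


collected = 4_492_954  # tweets containing possible symptom terms (from the Results)
relevant = 3_465       # tweets retained after BTM clustering and de-duplication

share = percent(relevant, collected)
print(f"{share:.3f}%")  # prints 0.077%
```

The retained set is therefore well under one tenth of one percent of the collected tweets, consistent with the reported "<1%".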

          Conclusions

          This study used unsupervised machine learning for the purposes of characterizing self-reporting of symptoms, experiences with testing, and mentions of recovery related to COVID-19. Many users reported symptoms they thought were related to COVID-19, but they were not able to get tested to confirm their concerns. In the absence of testing availability and confirmation, accurate case estimations for this period of the outbreak may never be known. Future studies should continue to explore the utility of infoveillance approaches to estimate COVID-19 disease severity.


                Author and article information

                Journal: JMIR Public Health and Surveillance (JMIR Public Health Surveill)
                Publisher: JMIR Publications (Toronto, Canada)
                ISSN: 2369-2960
                Issue date: Apr-Jun 2020
                Published: 8 June 2020
                Volume: 6
                Issue: 2
                Article number: e19509
                Affiliations
                [1] Department of Anesthesiology and Division of Global Public Health and Infectious Diseases, School of Medicine, University of California San Diego, La Jolla, CA, United States
                [2] Global Health Policy Institute, San Diego, CA, United States
                [3] S-3 Research LLC, San Diego, CA, United States
                [4] Department of Healthcare Research and Policy, University of California San Diego, San Diego, CA, United States
                [5] Department of Family Medicine and Public Health, School of Medicine, University of California San Diego, La Jolla, CA, United States
                [6] Masters Program in Global Health, Department of Anthropology, University of California San Diego, La Jolla, CA, United States
                [7] Masters Program in Computer Science, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, United States
                Author notes
                Corresponding Author: Tim Mackey, tmackey@ucsd.edu
                Author information
                https://orcid.org/0000-0002-2191-7833
                https://orcid.org/0000-0002-6270-448X
                https://orcid.org/0000-0001-9801-4715
                https://orcid.org/0000-0001-8670-6124
                https://orcid.org/0000-0002-0565-8202
                https://orcid.org/0000-0002-2165-7867
                https://orcid.org/0000-0001-7801-5673
                https://orcid.org/0000-0001-6816-5192
                https://orcid.org/0000-0002-8179-0619
                Article
                Publisher ID: v6i2e19509
                DOI: 10.2196/19509
                PMCID: PMC7282475
                PMID: 32490846
                b53bd331-e9fd-4668-96b3-ecf6a49c63fb
                ©Tim Mackey, Vidya Purushothaman, Jiawei Li, Neal Shah, Matthew Nali, Cortni Bardier, Bryan Liang, Mingxiang Cai, Raphael Cuomo. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 08.06.2020.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.

                History
                21 April 2020
                20 May 2020
                2 June 2020
                3 June 2020
                Categories
                Original Paper

                Keywords: infoveillance, COVID-19, Twitter, machine learning, surveillance
