Given the rapidly changing nature of the coronavirus disease 2019 (COVID-19) pandemic,
real-time monitoring of COVID-19 cases and deaths has been widely embraced.
The pandemic has also been accompanied by an “infodemic,” an overabundance of information
Public response to the pandemic and infodemic is important, but undermeasured.
Real-time analysis of public response could lead to earlier recognition of changing
public priorities, fluctuations in wellness, and uptake of public health measures,
all of which carry implications for individual- and population-level health.
To test this hypothesis, we measured daily changes in the frequency of topics of discussion
across 94,467 COVID-19-related comments on an online public forum in March, 2020.
Reddit is the 19th most popular website in the world with 420 million monthly active
Between March 3 and March 31, 2020, we obtained all comments from the “Daily Discussion
Post” on “r/Coronavirus,” the most popular COVID-19 subreddit with 1.9 million members.
We defined 50 discussion topics (groups of commonly co-occurring words) using a machine
learning-based approach to natural language processing, latent Dirichlet allocation
For each of the 50 topics, we reviewed the ten words and the comments most strongly
associated with it.
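As an illustration of how a topic model groups co-occurring words, the following is a minimal Python sketch of latent Dirichlet allocation fitted by collapsed Gibbs sampling. It is not the authors' implementation (the analyses were run in R). The function names `fit_lda` and `top_words`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy corpus are all assumptions made for illustration.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents.

    Returns (doc_topic, topic_word, vocab); every row of doc_topic and
    topic_word is a probability distribution. Hyperparameters are
    illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic_counts = [[0] * n_topics for _ in docs]
    topic_word_counts = [[0] * n_words for _ in range(n_topics)]
    topic_totals = [0] * n_topics

    # Give every token a random initial topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic_counts[d][z] += 1
            topic_word_counts[z][word_id[w]] += 1
            topic_totals[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic_counts[d][z] -= 1
                topic_word_counts[z][v] -= 1
                topic_totals[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic_counts[d][k] + alpha)
                    * (topic_word_counts[k][v] + beta)
                    / (topic_totals[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic_counts[d][z] += 1
                topic_word_counts[z][v] += 1
                topic_totals[z] += 1

    # Convert smoothed counts to probability distributions.
    doc_topic = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic_counts)
    ]
    topic_word = [
        [(c + beta) / (topic_totals[k] + n_words * beta) for c in row]
        for k, row in enumerate(topic_word_counts)
    ]
    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, k, n=6):
    """Return the n highest-probability words for topic k."""
    ranked = sorted(range(len(vocab)), key=lambda v: topic_word[k][v], reverse=True)
    return [vocab[v] for v in ranked[:n]]


# Hypothetical toy corpus; the study used 94,467 Reddit comments and 50 topics.
docs = [
    "hands wash soap water".split(),
    "wash hands touch soap".split(),
    "soap water hands wash".split(),
    "flight trip cancel travel".split(),
    "travel flight cancel trip".split(),
    "trip cancel flight travel back".split(),
]
doc_topic, topic_word, vocab = fit_lda(docs, n_topics=2)
for k in range(2):
    print(k, top_words(topic_word, vocab, k))
```

Each row of `doc_topic` gives the estimated share of a comment devoted to each topic, which is the kind of per-comment quantity that can then be averaged by day. The sampler is stochastic, so the top words depend on the seed.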
We identified topics that fell into three categories of interest: response to public
health measures, impact on daily life, and sense of pandemic severity. We tracked
daily variations in the average prevalence of topics across all comments. To make
patterns of topic change easier to see, we fit locally estimated scatterplot
smoothing (LOESS) lines. To quantify the degree of change in prevalence, we compared
4-day periods using the two-proportion z-test. We used R version 3.6.1 for all analyses.
All data were publicly available, and the study was considered exempt under University
of Pennsylvania Institutional Review Board guidelines.
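The daily tracking step described above can be sketched as follows. The source does not spell out exactly how prevalence was computed, so this sketch assumes that each comment carries a vector of topic weights and that daily prevalence is the mean weight of a topic across that day's comments. The function name `daily_topic_prevalence` and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date


def daily_topic_prevalence(comments, topic):
    """Mean weight of `topic` across all comments posted on each day.

    `comments` is an iterable of (day, topic_weights) pairs, where
    topic_weights is a per-comment topic distribution (an assumption).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, topic_weights in comments:
        totals[day] += topic_weights[topic]
        counts[day] += 1
    # Return days in chronological order.
    return {day: totals[day] / counts[day] for day in sorted(totals)}


# Hypothetical comments with two-topic weight vectors.
comments = [
    (date(2020, 3, 3), [0.10, 0.90]),
    (date(2020, 3, 3), [0.30, 0.70]),
    (date(2020, 3, 4), [0.05, 0.95]),
]
prevalence = daily_topic_prevalence(comments, topic=0)
for day, value in prevalence.items():
    print(day.isoformat(), round(value, 3))
```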
In the 29 days between March 3 and March 31, we collected 94,467 comments from r/Coronavirus
daily discussion threads, with peak activity between March 15 and 17 (16% of comments).
Of the 50 LDA topics (available by request), ten pertained to the three categories
of interest. Other topics included those related to news sharing, political discussions,
and discussions about the science of COVID-19. Table 1 shows key topic words and representative
comments, and Figure 1 displays the change in topic frequency over time by category.
In the “public health measures” category, for instance, “hand washing” became less
prevalent throughout March (2.7% from March 3 to March 6 vs 1.9% from March 28 to
March 31, p < .001; two-proportion z-test). “Impact on daily life” topics showed “travel”
peaking early and dropping throughout the month (3.2% March 3–March 6 vs 1.0% March
28–March 31, p < .001) and concern regarding “personal finances” increasing (1.5%
March 3–March 6 vs 2.1% March 28–March 31, p = .003). “Sense of pandemic severity”
evolved over the month, with fewer comments comparing COVID-19 with the flu (2.3%
March 3–March 6 vs 1.8% March 28–March 31, p = .04) and mid- to late-month growth
in comments reporting numbers of cases and deaths (2.1% March 12–March 15 vs 2.7%
March 28–March 31, p = .001).
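The period comparisons reported above rely on the two-proportion z-test. A minimal Python sketch of the pooled, two-sided version of that test follows. The per-period comment counts are not given in the text, so the denominators of 10,000 below are hypothetical and were chosen only to reproduce the reported 2.7% and 1.9% proportions for "hand washing". Whether the authors applied a continuity correction is also not stated, and none is applied here.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled two-proportion z-test; returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion under the null hypothesis of equal proportions.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 270 of 10,000 comments (2.7%) vs 190 of 10,000 (1.9%).
z, p = two_proportion_z_test(270, 10_000, 190, 10_000)
print(f"z = {z:.2f}, p = {p:.5f}")
```

With real data, `n1` and `n2` would be the number of comments posted in each 4-day window, so the p-value depends on those volumes as well as on the two proportions.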
Latent Dirichlet Allocation Topics from a Coronavirus Subreddit Throughout March,
2020, with a Collection of Top Words Used to Define the Topic and a Redacted Representative
Redacted representative Reddit comment (to preserve user anonymity)
Public health measures
hands, wash, touch, use, water, soap
“At least get them to wash hands as soon as they get back and wash clothes”
stay, people, away, home, outside, safe
“It’s okay to go for a walk, just try to stay at least 6 feet from others.”
masks, wear, face, n95, use, make
“What type of filter to insert in a cotton mask? Ordering some cotton masks with an
insert to add a filter. Would an air conditioner filter work?”
Daily life impact
Food and supplies
food, grocery, people, store, toilet, buy
“Just went to my local grocery store this morning. The place was packed with folks…
saw a ton of people buying paper towels, toilet paper etc.that aisle was almost empty.”
travel, back, trip, US, flight, cancel
“Going to a wedding in Canada next month. What are the odds travel is banned between
the last weeks of April?”
school, closed, still, public, kids, university
“Gov has closed all K-12 schools in [state] starting Monday until early April.”
work, get, pay, money, need, help
“My work just closed until further notice. I work in food service industry. What are
my options for govenrment financial assistance? I do not have paid sick leave or paid
Sense of pandemic severity
Number of cases and deaths
cases, number, deaths, new, confirmed
“So if these numbers are correct, US is now third in total cases behind China and
Italy, and FIRST in new cases, surpassing Italy. And we are supposed to be ~10 days
Comparison to flu
flu, like, coronavirus, much, bad, worse
“There is no way this virus is as bad as people are saying it is. Do not about 61,000
people die every year from flu?”
Danger to elderly
rate, death, mortality, age, higher, risk
“The case fatality rate in Italy was 1.0%, but with a much more elderly population,
in which coronavirus death rate is much higher”
Figure 1. The change in prevalence over the month of March, 2020, in Reddit comment content
related to (a) public health measures, (b) daily life impact, and (c) sense of pandemic
severity. Lines show locally estimated scatterplot smoothing (LOESS) for the daily
average prevalence of the topic across all comments; shaded grey area represents the
standard error of the LOESS estimation.
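For readers unfamiliar with LOESS, the smoothing can be sketched in Python as a local linear regression with tricube weights. This is a simplified variant and an assumption throughout: the function name `loess`, the `span` of 0.75, and the use of a degree-1 local fit are illustrative choices, the authors' R settings are not stated, and the standard-error band shown in the figure is not computed here.

```python
import math


def loess(xs, ys, span=0.75):
    """Simplified LOESS: local linear fits with tricube weights.

    Returns the smoothed value at every x in xs. `span` is the fraction
    of points used in each local fit (an illustrative default).
    """
    n = len(xs)
    k = min(n, max(2, math.ceil(span * n)))  # points per local fit
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest point.
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0
        # Tricube weights; points at or beyond the bandwidth get weight 0.
        ws = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        us = [x - x0 for x in xs]  # center on x0 so the intercept is the fit
        sw = sum(ws)
        swu = sum(w * u for w, u in zip(ws, us))
        swy = sum(w * y for w, y in zip(ws, ys))
        swuu = sum(w * u * u for w, u in zip(ws, us))
        swuy = sum(w * u * y for w, u, y in zip(ws, us, ys))
        denom = sw * swuu - swu ** 2
        if abs(denom) < 1e-12:
            # Degenerate neighborhood: fall back to a weighted mean.
            fitted.append(swy / sw)
        else:
            # Intercept of the weighted least-squares line at x0.
            fitted.append((swuu * swy - swu * swuy) / denom)
    return fitted


# Hypothetical daily prevalence values for March 3 to March 31 (29 days).
days = list(range(3, 32))
raw = [0.030 - 0.0004 * (d - 3) + (0.002 if d % 2 else -0.002) for d in days]
smooth = loess(days, raw)
print([round(v, 4) for v in smooth[:5]])
```

Because each local fit is linear, perfectly linear input is returned unchanged, while the alternating day-to-day noise in the hypothetical series is damped.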
This analysis indicates that longitudinal topic modeling of Reddit content can identify
patterns of public dialogue and could be used to guide targeted interventions.
For instance, comparisons to the flu were more common in early March than at the end of the month. Early recognition
of this reality could have led to more specific information dissemination campaigns
and earlier public acknowledgement of disease severity. Questions about safely spending
time outdoors peaked in mid-March, representing a missed opportunity for public guidance.
Tracking and responding proactively to common questions, such as what material is
best used for a homemade mask, may minimize the spread of misinformation. Notably
missing from these Reddit topics were discussions of contact tracing, a growing area
of public concern. Limitations of this study include that Reddit users are not representative
of all segments of the population, and that Reddit data are not associated with a geographic
location. Real-time monitoring of online COVID-19 dialogue holds promise for more
dynamically understanding and responding to needs in public health emergencies.