Since WHO first reported it on Jan 5, 2020, over 80 000 cases of a novel coronavirus
disease (COVID-19) have been diagnosed in China, with exportation events to nearly
90 countries, as of March 6, 2020.1
Given the novelty of the causative pathogen (named SARS-CoV-2), scientists have rushed
to fill epidemiological, virological, and clinical knowledge gaps, resulting in over
50 new studies about the virus between January 10 and January 30 alone.2
However, in an era where the immediacy of information has become an expectation of
decision makers and the general public alike, many of these studies have been shared
first in the form of preprint papers—before peer review.
Over the past three decades, preprint servers have become commonplace in the scientific
publication ecosystem, and COVID-19 has prompted a seemingly unprecedented use of
these platforms.3
Although peer review is crucial for the validation of science, the ongoing outbreak
has showcased the speed with which preprints can disseminate information during emergencies.
In this Comment, we used both preprint and peer-reviewed studies that estimated the
transmissibility potential (ie, basic reproduction number [R₀]) of SARS-CoV-2 on or
before Feb 1, 2020, to investigate the role that preprints have had in information
dissemination during the ongoing outbreak. We also analysed the agreement between
preprint and peer-reviewed estimates and propose a consensus-based approach for
evaluating the validity of preprint findings during public health crises. For our
analysis, we collected publicly available data from scientific studies, news reports,
and search trends pertaining to SARS-CoV-2 and its R₀. R₀ is defined as the average
number of secondary infections produced by a single new case in a fully susceptible
population; estimates of R₀ can therefore give decision makers insight into the
epidemic potential of a given outbreak.
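To make the definition concrete, the sketch below treats transmission as an idealised generation-by-generation process in a fully susceptible population: if every case infects R₀ others on average, then starting from one case the expected number of new cases in generation g is R₀ raised to the power g. This simplification is our assumption for illustration only (it ignores depletion of susceptibles and any control measures), and the R₀ values in the loop are arbitrary, not estimates from the studies analysed here.

```python
from __future__ import annotations


def expected_cases_by_generation(r0: float, generations: int,
                                 seed_cases: float = 1.0) -> list[float]:
    """Expected new cases per generation under an idealised branching process.

    Assumes a fully susceptible population (the setting in which R0 is defined),
    so every case produces r0 secondary cases on average.
    """
    if r0 < 0 or generations < 0:
        raise ValueError("r0 and generations must be non-negative")
    cases = [seed_cases]
    for _ in range(generations):
        cases.append(cases[-1] * r0)  # each generation multiplies by r0
    return cases


def grows(r0: float) -> bool:
    """Expected case counts increase across generations only when R0 > 1."""
    return r0 > 1.0


# Hypothetical values below, at, and above the threshold of 1.
for r0 in (0.8, 1.0, 2.5):
    print(r0, grows(r0), expected_cases_by_generation(r0, 4))
```

The threshold behaviour is why R₀ is read as a summary of epidemic potential: values above 1 imply expected growth, values below 1 imply expected decline.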
Relevant news reports were identified through MediaCloud and search trends through
Google Search Trends; both served as proxy indicators for information dissemination.
Relevant scientific studies were identified through a combination of searches executed
with Google Scholar and, to address possible delays in indexing, searches of four
popular public preprint servers (ie, arXiv, bioRxiv, medRxiv, and Social Science Research
Network [SSRN]) that we believe are representative of the relevant preprint literature.
Search terms and specifications for each data source are outlined in the appendix
(p 2). All studies discovered through Google Scholar, arXiv, bioRxiv, medRxiv, and
SSRN were manually checked for relevance to the topic area of interest, and we retained
only studies that reported estimates of the R₀ of SARS-CoV-2 in the body of the text.
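The screening step can be sketched as a merge-and-filter over candidate records pooled from all sources. The study describes this screening as manual, so everything here is a hypothetical stand-in: the record fields, the source labels, and the use of a normalised title as the de-duplication key are our own choices, and the `reports_r0` flag represents the manual check that an R₀ estimate appears in the body of the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    title: str
    source: str       # hypothetical label, eg "Google Scholar" or "medRxiv"
    relevant: bool    # outcome of the manual relevance check
    reports_r0: bool  # True if an R0 estimate appears in the body of the text


def normalise(title: str) -> str:
    """Hypothetical de-duplication key: lower-case title with whitespace collapsed."""
    return " ".join(title.lower().split())


def screen(candidates: list[Candidate]) -> list[Candidate]:
    """Merge hits from all sources, drop duplicates, keep relevant studies with R0."""
    seen: dict[str, Candidate] = {}
    for c in candidates:
        key = normalise(c.title)
        if key not in seen:  # first occurrence wins; later copies are duplicates
            seen[key] = c
    return [c for c in seen.values() if c.relevant and c.reports_r0]


# Hypothetical hits: the same study found twice, plus one without an R0 estimate.
hits = [
    Candidate("Study A of SARS-CoV-2 transmission", "Google Scholar", True, True),
    Candidate("Study A of  SARS-CoV-2 transmission", "medRxiv", True, True),
    Candidate("Study B on clinical features", "bioRxiv", True, False),
]
print([c.title for c in screen(hits)])
```

Searching preprint servers alongside Google Scholar makes duplicates likely, which is why a de-duplication key of some kind is needed before counting studies.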
After this initial data discovery phase, which yielded 11 individual studies, we
manually curated the date of first publication, publication platform, review status
(ie, preprint vs peer-reviewed), and methodological details of each study (appendix
p 3).4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 R₀ estimates were also
extracted from each study for further analysis. When a study contained multiple R₀
estimates, because of preprint revisions after the first version or the use of multiple
approaches in a single study, each estimate was recorded and treated as a separate
entry so as to represent all available knowledge at any given point in time (appendix
p 3). Given that the first known preprint estimates of R₀ were posted to SSRN by us
on Jan 23, we plotted search trend fractions and news report volume between Jan 23
and Feb 1 (appendix p 4). Baseline data for both sources before Jan 23, 2020, showed
negligible search interest and news report volume, and data collected up to Feb 9,
2020, showed diminishing interest and volume after the catchment window (appendix p 4).
To illustrate when each of the 11 relevant studies became available to the public,
indicator bars marking each study's date of publication were overlaid on the search
trend and news report data (appendix p 4). We then plotted each of the 16 R₀ estimates
produced by the 11 studies, including both the mean and the estimate range presented
(eg, 95% CI, 95% credible interval, and so on; appendix p 3). Estimates were plotted
by date of publication and alphabetically therein, offering a side-by-side comparison
of preprint versus peer-reviewed results; averages and 95% CIs were also computed for
both groups (figure).
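The bookkeeping described in this paragraph, in which every estimate (each preprint version and each modelling approach) becomes its own entry and group averages with 95% CIs are then computed for preprints and peer-reviewed studies separately, can be sketched as follows. The Comment does not state how the 95% CIs of the group averages were computed; this sketch assumes a normal approximation (mean ± 1·96 × standard error of the point estimates), and the entries are hypothetical placeholders, not the values in the appendix.

```python
from __future__ import annotations

import math
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    study: str          # hypothetical study label
    version: int        # preprint version the estimate comes from
    approach: int       # index of the modelling approach within the study
    peer_reviewed: bool
    mean: float         # reported point estimate of R0
    low: float          # lower bound of the reported range (CI, credible interval, ...)
    high: float         # upper bound of the reported range


def mean_ci95(values: list[float]) -> tuple[float, float, float]:
    """Average of point estimates with a normal-approximation 95% CI (assumed method)."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))  # needs at least two values
    return m, m - 1.96 * se, m + 1.96 * se


def by_group(entries: list[Estimate]) -> dict[str, tuple[float, float, float]]:
    """Summarise preprint and peer-reviewed entries separately."""
    groups: dict[str, list[float]] = {"preprint": [], "peer-reviewed": []}
    for e in entries:
        groups["peer-reviewed" if e.peer_reviewed else "preprint"].append(e.mean)
    return {name: mean_ci95(vals) for name, vals in groups.items() if len(vals) >= 2}


# Hypothetical entries: two versions of preprint "A" count as two separate entries.
entries = [
    Estimate("A", 1, 1, False, 3.0, 2.0, 4.0),
    Estimate("A", 2, 1, False, 2.8, 2.1, 3.6),
    Estimate("B", 1, 1, False, 3.4, 2.5, 4.5),
    Estimate("C", 1, 1, True, 2.2, 1.4, 3.9),
    Estimate("D", 1, 1, True, 2.6, 2.0, 3.2),
]
print(by_group(entries))
```

Keeping revisions as separate entries, rather than overwriting them, preserves what was publicly known on each date, which is the point of the side-by-side comparison.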
Figure: R₀ mean and range estimates from 11 different studies of COVID-19 as a function of time
For preprints that were revised before publication of the first relevant peer-reviewed
study on Jan 29, the version number is indicated in parentheses as (n). When multiple
R₀ estimates were presented in a single study because of the use of multiple approaches,
the version number is followed by a single decimal place indicating the approach
used (n.n). If a first author published more than one relevant independent study before
Feb 1, the version number is followed immediately by an alphabetical marker ordered
by date of publication (nx). Ranges presented vary by study (eg, 95% CI, 95% credible
interval, and so on) and are presented in the appendix (p 3). R₀=basic reproduction number.
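The labelling convention in the caption can be expressed as a small function. This is a hypothetical helper of ours, not part of the study's materials. The caption does not show a label that combines an approach decimal with an alphabetical marker; where both apply, the sketch assumes the marker sits directly after the version number (as "followed immediately" suggests) and the decimal comes last.

```python
from __future__ import annotations


def figure_label(author: str, version: int, revised: bool = False,
                 approach: int | None = None, letter: str | None = None) -> str:
    """Return a label such as "Author X (2)", "Author X (1.2)", or "Author X (1b)".

    revised:  True for preprints revised before the first peer-reviewed study (Jan 29).
    approach: index of the modelling approach when a study reports several estimates.
    letter:   marker for one of several studies by the same first author, by date.
    """
    if not (revised or approach is not None or letter is not None):
        return author  # no version tag needed
    tag = str(version)
    if letter is not None:
        tag += letter  # (nx): alphabetical marker directly after the version number
    if approach is not None:
        tag += f".{approach}"  # (n.n): one decimal place for the approach
    return f"{author} ({tag})"


print(figure_label("Author X", 1))
print(figure_label("Author X", 2, revised=True))
print(figure_label("Author X", 1, approach=2))
print(figure_label("Author X", 1, letter="b"))
```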
Google Search Trends and MediaCloud data suggested that both general (ie, search)
interest and news media interest in the R₀ associated with COVID-19 peaked before the
publication of relevant peer-reviewed studies during the early stages of the epidemic.
In the selected time frame, search interest peaked on Jan 27, after a sharp increase
between Jan 23 and Jan 25 that immediately followed the publication of five early
preprint studies in bioRxiv, medRxiv, and SSRN, all of which estimated R₀. News media
interest peaked on Jan 28, coinciding with a sixth preprint study published in arXiv
(appendix p 4). The first peer-reviewed estimates were then published by Li and
colleagues in The New England Journal of Medicine on Jan 29 at 17:00 h (eastern
standard time), followed by four additional peer-reviewed studies in Eurosurveillance,
The International Journal of Infectious Diseases, The Lancet, and Journal of Clinical
Medicine up to Feb 1.14, 19 Average R₀ estimates were 3·61 (95% CI 2·77–4·45) across
the preprint group and 2·54 (2·17–2·91) across the peer-reviewed group, with
overlapping 95% CIs despite a wide diversity of modelling methods and data sources
used both within and across groups (appendix p 3). Although the average for the
preprint group was higher than that for the peer-reviewed group, this difference was
driven primarily by two upper-limit outlier estimates, whose R₀ values exceeded the
upper bound of the group's 95% CI (figure).9, 10 Excluding these two estimates, following
a consensus-based approach based on the 95% CIs, yielded an average R₀ estimate of
3·02 (95% CI 2·65–3·39) for the preprint group. Notably, two studies in the
peer-reviewed group had previously been published as preprints.15, 16 Although the
estimates presented by Riou and Althaus remained unchanged after peer review, those
presented by Zhao and colleagues were higher before peer review than afterwards.
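The outlier rule described here (drop estimates whose R₀ lies above the upper bound of the group's 95% CI, then recompute) and the check that two CIs overlap can be sketched as below. The normal-approximation CI is our assumption, since the Comment does not specify how the group CIs were derived; the preprint values are hypothetical, whereas the two intervals in the overlap check are the group CIs reported in the text.

```python
from __future__ import annotations

import math
import statistics


def mean_ci95(values: list[float]) -> tuple[float, float, float]:
    """Mean with a normal-approximation 95% CI (assumed method)."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, m - 1.96 * se, m + 1.96 * se


def drop_upper_outliers(values: list[float]) -> list[float]:
    """Remove estimates lying above the upper bound of the group's 95% CI."""
    _, _, upper = mean_ci95(values)
    return [v for v in values if v <= upper]


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if two closed intervals (low, high) share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


# Hypothetical preprint point estimates, two of them unusually high.
preprint = [2.2, 2.6, 2.9, 3.1, 3.3, 2.5, 2.8, 3.0, 6.5, 6.9]
kept = drop_upper_outliers(preprint)
print(mean_ci95(preprint), mean_ci95(kept))

# Group CIs reported in the text: preprint 2.77-4.45, peer-reviewed 2.17-2.91.
print(intervals_overlap((2.77, 4.45), (2.17, 2.91)))  # True: both contain 2.77-2.91
```

Only the upper tail is trimmed, matching the text's focus on upper-limit outliers; a symmetric rule would also examine estimates below the lower bound.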
Our findings suggest that, because of the speed of their release, preprints, rather
than peer-reviewed literature in the same topic area, might be driving discourse related
to the ongoing COVID-19 outbreak. Although our analysis focused on search trends and
news media data as a measure of general discourse, preprints are likely also
influencing policy-making discussions, given that WHO announced on Jan 26, 2020,
that it would be creating a repository of relevant studies, including those that
have not yet been peer-reviewed.20
Nevertheless, despite the advantages of speedy information delivery, the lack of peer
review can also lead to problems of credibility and misinformation, both intentional
and unintentional. This drawback has been highlighted during the ongoing outbreak,
especially after the high-profile withdrawal from the preprint server bioRxiv of a
virology study that erroneously claimed that SARS-CoV-2 contained HIV “insertions”.21
The very fact that this study was withdrawn showcases the power of open peer review
during emergencies; the withdrawal itself appears to have been prompted by an outcry
from dozens of scientists around the globe who had access to the study because
it was posted on a public server.22
Much of this outcry was documented on Twitter (a microblogging platform) and on longer-form
popular science blogs, suggesting that such fora could serve as rich additional data
sources for future work on the impact of preprints on public discourse.22
However, cases such as this one show the need for caution when acting upon the
findings of any single preprint.
With this in mind, considering multiple studies together, as in our analysis, can
help operationalise the kind of caution that preprints necessitate while still allowing
for important, robust insights before a peer-reviewed study in the same topic area is
published. Here, we used a simple method in which we plotted the ten R₀ estimates that
were posted as preprints before publication of the first peer-reviewed study on Jan 29;
we then took the average of these estimates and excluded the two estimates that
qualified as upper-limit outliers, both on visual inspection and as a function of the
95% CI. Even before outlier exclusion, this simple method yielded average R₀ estimates
similar to those presented by the peer-reviewed studies subsequently published on and
after Jan 29; however, more complex approaches that weight estimates by their
confidence, similar to traditional meta-analytical methods, offer a promising avenue
for future work. Such collective, consensus-based approaches will arguably be easiest
to use when the research of interest is quantitative in nature; because many crucial
epidemiological parameters that inform decision making (eg, incubation period,
generation time, and so on) are quantitative, our proposed approach could also work
well for estimating them.
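A sketch of the simple consensus approach (keep only estimates posted before the first peer-reviewed study on Jan 29, then average them), together with one possible version of the confidence-weighted extension mentioned as future work, follows. The weighting shown is standard fixed-effect inverse-variance pooling, which is our choice of technique and not the method used in this Comment. Recovering a standard error from a reported 95% interval as (high − low) / (2 × 1·96) assumes a roughly symmetric, normal-type interval, which will not hold for every study; the dates and values are hypothetical.

```python
from __future__ import annotations

import math
import statistics
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    posted: date  # date the estimate became public
    mean: float   # point estimate of R0
    low: float    # lower bound of the reported 95% interval
    high: float   # upper bound of the reported 95% interval


def available_before(entries: list[Entry], cutoff: date) -> list[Entry]:
    """Entries that were public before the cutoff (eg, the first peer-reviewed study)."""
    return [e for e in entries if e.posted < cutoff]


def simple_average(entries: list[Entry]) -> float:
    """Unweighted mean of point estimates, as in the simple approach."""
    return statistics.mean(e.mean for e in entries)


def inverse_variance_average(entries: list[Entry]) -> tuple[float, float, float]:
    """Fixed-effect pooled estimate with a 95% CI, weighting each entry by 1/SE^2.

    SE is approximated from the reported interval as (high - low) / (2 * 1.96),
    which assumes a symmetric, normal-type 95% interval.
    """
    weights = [1.0 / ((e.high - e.low) / (2 * 1.96)) ** 2 for e in entries]
    pooled = sum(w * e.mean for w, e in zip(weights, entries)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se


# Hypothetical entries; only those posted before Jan 29 enter the consensus.
entries = [
    Entry(date(2020, 1, 23), 3.0, 2.0, 4.0),
    Entry(date(2020, 1, 25), 2.6, 2.2, 3.0),
    Entry(date(2020, 1, 27), 4.0, 2.0, 6.0),
    Entry(date(2020, 1, 30), 2.2, 1.4, 3.9),
]
early = available_before(entries, date(2020, 1, 29))
print(simple_average(early), inverse_variance_average(early))
```

Weighting pulls the pooled value toward the most precise estimate, so a single imprecise but extreme estimate has less influence than in the unweighted average; the trade-off is that reported intervals must be comparable across very different modelling approaches.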
Our work showcases the powerful role preprints can have during public health crises
because of the timeliness with which they can disseminate new information. Furthermore,
given that two of the preprints included in this analysis were later published in
peer-reviewed outlets, the evidence shows that even prestigious journals now
permit the sharing of important findings before peer review and that the use of preprint
platforms does not jeopardise future peer-reviewed publication.15, 16 Without question,
primacy and peer-reviewed publications are key metrics in individual professional
advancement (eg, academic promotion); nevertheless, the impact of preprints on discourse
and decision making pertaining to the ongoing COVID-19 outbreak suggests that we must
rethink how we reward and recognise community contributions during present and future
public health crises.