Since the identification of SARS-CoV-2 in Wuhan, China, in January 2020 (1), the
origin of the virus has been a topic of intense scientific debate and public speculation.
The two main hypotheses are that the virus emerged from human exposure to an infected
animal [“zoonosis” (2)] or that it emerged in a research-related incident (3). The
investigation into the origin of the virus has been made difficult by the lack of
key evidence from the earliest days of the outbreak—there’s no doubt that greater
transparency on the part of Chinese authorities would be enormously helpful. Nevertheless,
we argue here that there is much important information that can be gleaned from US-based
research institutions, information not yet made available for independent, transparent,
and scientific scrutiny.
When it comes to deciphering the origins of COVID-19, much important information can
be gleaned from US-based research institutions—information that has yet to be made
available for independent, transparent, and scientific scrutiny. Image credit: Dave
Cutler (artist).
The data available within the United States would explicitly include, but are not
limited to, viral sequences gathered and held as part of the PREDICT project and other
funded programs, as well as sequencing data and laboratory notebooks from US laboratories.
We call on US government scientific agencies, most notably the NIH, to support a full,
independent, and transparent investigation of the origins of SARS-CoV-2. This should
take place, for example, within a tightly focused science-based bipartisan Congressional
inquiry with full investigative powers, which would be able to ask important questions—but
avoid misguided witch-hunts governed more by politics than by science.
Essential US Investigations
The US intelligence community (IC) was tasked in 2021 by President Joe Biden (4)
with investigating the origin of the virus. In their summary public statement, the
IC writes that “all agencies assess that two hypotheses are plausible: natural exposure
to an infected animal and a laboratory-associated incident” (4). The IC further writes
that “China’s cooperation most likely would be needed to reach a conclusive assessment
of the origins of COVID-19 [coronavirus disease 2019].” Of course, such cooperation
is highly warranted and should be pursued by the US Government and the US scientific
community. Yet, as outlined below, much could be learned by investigating US-supported
and US-based work that was underway in collaboration with Wuhan-based institutions,
including the Wuhan Institute of Virology (WIV), China. It is still not clear whether
the IC investigated these US-supported and US-based activities. If it did, it has
yet to make any of its findings available to the US scientific community for independent
and transparent analysis and assessment. If, on the other hand, the IC did not investigate
these US-supported and US-based activities, then it has fallen far short of conducting
a comprehensive investigation.
This lack of an independent and transparent US-based scientific investigation has
had four highly adverse consequences. First, public trust in the ability of US scientific
institutions to govern the activities of US science in a responsible manner has been
shaken. Second, the investigation of the origin of SARS-CoV-2 has become politicized
within the US Congress (5); as a result, the inception of an independent and transparent
investigation has been obstructed and delayed. Third, US researchers with deep knowledge
of the possibilities of a laboratory-associated incident have not been enabled to
share their expertise effectively. Fourth, the failure of NIH, one of the main funders
of the US–China collaborative work, to facilitate the investigation into the origins
of SARS-CoV-2 (4) has fostered distrust regarding US biodefense research activities.
Much of the work on SARS-like CoVs performed in Wuhan was part of an active and highly
collaborative US–China scientific research program funded by the US Government (NIH,
Defense Threat Reduction Agency [DTRA], and US Agency for International Development
[USAID]), coordinated by researchers at EcoHealth Alliance (EHA), but involving researchers
at several other US institutions. For this reason, it is important that US institutions
be transparent about any knowledge of the detailed activities that were underway in
Wuhan and in the United States. The evidence may also suggest that research institutions
in other countries were involved, and those too should be asked to submit relevant
information (e.g., with respect to unpublished sequences).
Participating US institutions include the EHA, the University of North Carolina (UNC),
the University of California at Davis (UCD), the NIH, and the USAID. Under a series
of NIH grants and USAID contracts, EHA coordinated the collection of SARS-like bat
CoVs from the field in southwest China and southeast Asia, the sequencing of these
viruses, the archiving of these sequences (involving UCD), and the analysis and manipulation
of these viruses (notably at UNC). A broad spectrum of coronavirus research work was
done not only in Wuhan (including groups at Wuhan University and the Wuhan CDC, as
well as WIV) but also in the United States. The exact details of the fieldwork and
laboratory work of the EHA-WIV-UNC partnership, and the engagement of other institutions
in the United States and China, have not been disclosed for independent analysis. The
precise nature of the experiments that were conducted, including the full array of
viruses collected from the field and the subsequent sequencing and manipulation of
those viruses, remains unknown.
EHA, UNC, NIH, USAID, and other research partners have failed to disclose their activities
to the US scientific community and the US public, instead declaring that they were
not involved in any experiments that could have resulted in the emergence of SARS-CoV-2.
The NIH has specifically stated (6) that there is a significant evolutionary distance
between the published viral sequences and that of SARS-CoV-2 and that the pandemic
virus could not have resulted from the work sponsored by NIH. Of course, this statement
is only as good as the limited data on which it is based, and verification of this
claim is dependent on gaining access to any other unpublished viral sequences that
are deposited in relevant US and Chinese databases (7,8). On May 11, 2022, Acting
NIH Director Lawrence Tabak testified before Congress that several such sequences
in a US database were removed from public view, and that this was done at the request
of both Chinese and US investigators.
Blanket denials from the NIH are no longer good enough. Although the NIH and USAID
have strenuously resisted full disclosure of the details of the EHA-WIV-UNC work program,
several documents leaked to the public or released through the Freedom of Information
Act (FOIA) have raised concerns. These research proposals make clear that the EHA-WIV-UNC
collaboration was involved in the collection of a large number of so-far undocumented
SARS-like viruses and was engaged in their manipulation within biological safety level
(BSL)-2 and BSL-3 laboratory facilities, raising concerns that an airborne virus might
have infected a laboratory worker (9). A variety of scenarios have been discussed
by others, including an infection that involved a natural virus collected from the
field or perhaps an engineered virus manipulated in one of the laboratories (3).
Overlooked Details
Special concerns surround the presence of an unusual furin cleavage site (FCS) in
SARS-CoV-2 (10) that augments the pathogenicity and transmissibility of the virus
relative to related viruses like SARS-CoV-1 (11, 12). SARS-CoV-2 is, to date, the
only identified member of the subgenus sarbecovirus that contains an FCS, although
these are present in other coronaviruses (13, 14). The alignment in Fig. 1 shows a
portion of the spike protein sequence of some of these viruses, illustrating the
unusual nature of the FCS and its apparent insertion in SARS-CoV-2
(15). From the first weeks after the genome sequence of SARS-CoV-2 became available,
researchers have commented on the unexpected presence of the FCS within SARS-CoV-2—the
implication being that SARS-CoV-2 might be a product of laboratory manipulation. In
a review piece arguing against this possibility, it was asserted that the amino acid
sequence of the FCS in SARS-CoV-2 is an unusual, nonstandard sequence for an FCS and
that nobody in a laboratory would design such a novel FCS (13).
Fig. 1.
This alignment of the amino acid sequences of coronavirus spike proteins, in the region
of the S1/S2 junction, illustrates the sequence of SARS-CoV-2 (Wuhan-Hu-1) and some
of its closest relatives. The furin cleavage site (FCS) is indicated (PRRAR'SVAS),
and furin cuts the spike protein between R and S, as indicated by the red arrowhead.
Adapted from Chan & Zhan (15).
In fact, the assertion that the FCS in SARS-CoV-2 has an unusual, nonstandard amino
acid sequence is false. The amino acid sequence of the FCS in SARS-CoV-2 also exists
in the human ENaC α subunit (16), where it is known to be functional and has been
extensively studied (17, 18). The FCS of human ENaC α has the amino acid sequence
RRAR'SVAS (Fig. 2), an eight–amino-acid sequence that is identical to
the FCS of SARS-CoV-2 (16). ENaC is an epithelial sodium channel, expressed on the
apical surface of epithelial cells in the kidney, colon, and airways (19, 20), that
plays a critical role in controlling fluid exchange. The ENaC α subunit has a functional
FCS (17, 18) that is essential for ion channel function (19) and has been characterized
in a variety of species. The FCS sequence of human ENaC α (20) is identical in chimpanzee,
bonobo, orangutan, and gorilla (SI Appendix, Fig. 1), but diverges in all other species,
even primates, except one. (The one species outside humans and the great apes with the
same sequence is Pipistrellus kuhlii, a bat species found in Europe and Western Asia;
other bat species, including Rhinolophus ferrumequinum, have a different FCS sequence
in ENaC α [RKAR'SAAS].)
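The sequence comparison described above can be checked directly from the motifs quoted in the text. The following is a minimal sketch, not the analysis used in the cited work: it uses only the three motif strings given in this article, treats the apostrophe as the furin cut marker, and the dictionary keys and helper names (`residues`, `cut_flank`, `mismatches`) are illustrative assumptions.

```python
# Motifs exactly as quoted in the text; the apostrophe marks the furin cut.
# The dictionary keys and helper names are illustrative, not from the paper.
MOTIFS = {
    "SARS-CoV-2 spike": "PRRAR'SVAS",
    "human ENaC alpha": "RRAR'SVAS",
    "R. ferrumequinum ENaC alpha": "RKAR'SAAS",
}


def residues(motif):
    """Return the motif as a plain residue string, without the cut marker."""
    return motif.replace("'", "")


def cut_flank(motif):
    """Return the residues immediately before and after the furin cut."""
    upstream, downstream = motif.split("'")
    return upstream[-1], downstream[0]


def mismatches(a, b):
    """List (1-based position, residue in a, residue in b) where a and b differ."""
    if len(a) != len(b):
        raise ValueError("motifs must have equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


spike = residues(MOTIFS["SARS-CoV-2 spike"])
enac_human = residues(MOTIFS["human ENaC alpha"])
enac_bat = residues(MOTIFS["R. ferrumequinum ENaC alpha"])

# Does the eight-residue human ENaC motif occur within the spike motif?
spike_contains_enac = enac_human in spike

# Where does the bat ENaC motif quoted in the text differ from the human one?
bat_vs_human = mismatches(enac_human, enac_bat)

print(spike_contains_enac)
print(cut_flank(MOTIFS["SARS-CoV-2 spike"]))
print(bat_vs_human)
```

With these inputs, the human ENaC α motif is found inside the quoted spike motif, the cut falls between R and S, and the Rhinolophus motif differs from the human one at two of eight positions. A comparison of short quoted motifs says nothing about how the sequence arose; it only restates the identity claim in checkable form.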
Fig. 2.
Amino acid alignment of the furin cleavage sites of SARS-CoV-2 spike protein with
(Top) the spike proteins of other viruses that lack the furin cleavage site and (Bottom)
the furin cleavage sites present in the α subunits of human and mouse ENaC. Adapted
from Anand et al. (16).
One consequence of this “molecular mimicry” between the FCS of SARS-CoV-2 spike and
the FCS of human ENaC is competition for host furin in the lumen of the Golgi apparatus,
where the SARS-CoV-2 spike is processed. This results in a decrease in human ENaC
expression (21). A decrease in human ENaC expression compromises airway function and
has been implicated as a contributing factor in the pathogenesis of COVID-19 (22).
Another consequence of this astonishing molecular mimicry is evidenced by apparent
cross-reactivity with human ENaC of antibodies from COVID-19 patients, with the highest
levels of cross-reacting antibodies directed against this epitope being associated
with the most severe disease (23).
We do not know whether the insertion of the FCS was the result of natural evolution
(2, 13)—perhaps via a recombination event in an intermediate mammal or a human (13,
24)—or was the result of a deliberate introduction of the FCS into a SARS-like virus
as part of a laboratory experiment. We do know that the insertion of such FCS sequences
into SARS-like viruses was a specific goal of work proposed by the EHA-WIV-UNC partnership
within a 2018 grant proposal (“DEFUSE”) that was submitted to the US Defense Advanced
Research Projects Agency (DARPA) (25). The 2018 proposal to DARPA was not funded,
but we do not know whether some of the proposed work was subsequently carried out
in 2018 or 2019, perhaps using another source of funding.
We also know that this research team would be familiar with several previous
experiments involving the successful insertion of an FCS sequence into SARS-CoV-1
(26) and other coronaviruses, and that they had extensive experience in the construction
of chimeric SARS-like viruses (27–29). In addition, the research team would have some
familiarity with the FCS
sequence and the FCS-dependent activation mechanism of human ENaC α (19), which was
extensively characterized at UNC (17, 18). For a research team assessing the pandemic
potential of SARS-related coronaviruses, the FCS of human ENaC—an FCS known to be
efficiently cleaved by host furin present in the target location (epithelial cells)
of an important target organ (lung), of the target organism (human)—might be a rational,
if not obvious, choice of FCS to introduce into a virus to alter its infectivity,
in line with other work performed previously.
Of course, the molecular mimicry of ENaC within the SARS-CoV-2 spike protein might
be a mere coincidence, although one with a very low probability. The exact FCS sequence
present in SARS-CoV-2 has recently been introduced into the spike protein of SARS-CoV-1
in the laboratory, in an elegant series of experiments (12, 30), with predictable
consequences in terms of enhanced viral transmissibility and pathogenicity. Obviously,
the creation of such SARS-1/2 “chimeras” is an area of some concern for those responsible
for present and future regulation of this area of biology. [Note that these experiments
in ref. 30 were done in the context of a safe “pseudotyped” virus and thus posed no
danger of producing or releasing a novel pathogen.] These simple experiments show
that the introduction of the 12 nucleotides that constitute the FCS insertion in SARS-CoV-2
would not be difficult to achieve in a lab. It would therefore seem reasonable to
ask that electronic communications and other relevant data from US groups be
made available for scrutiny.
Seeking Transparency
To date, the federal government, including the NIH, has not done enough to promote
public trust and transparency in the science surrounding SARS-CoV-2. A steady trickle
of disquieting information has cast a darkening cloud over the agency. The NIH could
say more about the possible role of its grantees in the emergence of SARS-CoV-2, yet
the agency has failed to reveal to the public the possibility that SARS-CoV-2 emerged
from a research-associated event, even though several researchers raised that concern
on February 1, 2020, in a phone conversation that was documented by email (5). Those
emails were released to the public only through FOIA, and they suggest that the NIH
leadership took an early and active role in promoting the “zoonotic hypothesis” and
the rejection of the laboratory-associated hypothesis (5). The NIH has resisted the
release of important evidence, such as the grant proposals and project reports of
EHA, and has continued to redact materials released under FOIA, including a remarkable
290-page redaction in a recent FOIA release.
Information now held by the research team headed by EHA (7), as well as the communications
of that research team with US research funding agencies, including NIH, USAID, DARPA,
DTRA, and the Department of Homeland Security, could shed considerable light on the
experiments undertaken by the US-funded research team and on the possible relationship,
if any, between those experiments and the emergence of SARS-CoV-2. We do not assert
that laboratory manipulation was involved in the emergence of SARS-CoV-2, although
it is apparent that it could have been. However, we do assert that there has been
no independent and transparent scientific scrutiny to date of the full scope of the
US-based evidence.
The relevant US-based evidence would include the following information: laboratory
notebooks, virus databases, electronic media (emails, other communications), biological
samples, viral sequences gathered and held as part of the PREDICT project (7) and
other funded programs, and interviews of the EHA-led research team by independent
researchers, together with a full record of US agency involvement in funding the research
on SARS-like viruses, especially with regard to projects in collaboration with Wuhan-based
institutions. We suggest that a bipartisan inquiry should also follow up on the tentative
conclusion of the IC (4) that the initial outbreak in Wuhan may have occurred no later
than November 2019 and that therefore the virus was circulating before the cluster
of known clinical cases in December. The IC did not reveal the evidence for this statement,
nor when parts of the US Government or US-based researchers first became aware of
a potential new outbreak. Any available information and knowledge of the earliest
days of the outbreak, including viral sequences (8), could shed considerable light
on the origins question.
We continue to recognize the tremendous value of US–China cooperation in ongoing efforts
to uncover the proximal origins of the pandemic. Much vital information still resides
in China, in the laboratories, hospital samples, and early epidemiological information
not yet available to the scientific community. Yet a US-based investigation need not
wait—there is much to learn from the US institutions that were extensively involved
in research that may have contributed to, or documented the emergence of, the SARS-CoV-2
virus. Only an independent and transparent investigation, perhaps as a bipartisan
Congressional inquiry, will reveal the information that is needed to enable a thorough
scientific process of scrutiny and evaluation.