BACKGROUND
The world is engaged in a battle against the novel coronavirus SARS-CoV-2. By the end of July 2021, more than 200 million cases of COVID-19 and 4.25 million deaths had been reported worldwide, across 212 countries and regions. All of humanity must work together to overcome this crisis. Since the start of the 21st century, humans have experienced three outbreaks of coronavirus pneumonia (SARS, MERS and COVID-19), a strong reminder that coronavirus prevention and treatment deserve sustained attention. Tracing the origin and transmission route of SARS-CoV-2 is important for developing treatment and prevention strategies against future epidemics.
According to the transmission route of a virus, hosts are generally divided into natural hosts, intermediate hosts and final hosts. The intermediate hosts of a virus may include multiple species, which act as vehicles that “transport” the virus from the natural host to the final host. To control further spread of a virus, beyond isolating and treating infected people, identifying and isolating intermediate hosts can block infection at its source. Palm civets may be an intermediate host of SARS-CoV [1], and dromedary camels may be an intermediate host of MERS-CoV [2]; both viruses have been shown to have originated in bats [3–5]. Shi ZL, et al. have reported 96.2% genome sequence similarity between SARS-CoV-2 and the bat coronavirus RaTG13 (bat-CoV-RaTG13) carried by Rhinolophus affinis in Yunnan Province, China. Furthermore, the sequence similarity between the S genes (encoding the spike protein) of SARS-CoV-2 and bat-CoV-RaTG13 is 93.1%, much higher than that between SARS-CoV-2 and other SARS-CoVs [6].
Research on intermediate hosts of SARS-CoV-2 is currently underway, and the animals investigated include pangolins, minks and turtles. Four studies have reported genome sequence similarities between pangolin-CoVs and SARS-CoV-2 of 85.5% to 92.4% [7], 91.02% [8], 90.3% [9] and 90.23% [10]. Two lineages of SARS-CoV-2-related pangolin-CoVs are known: pangolin-CoV GD and pangolin-CoV GX. Researchers have found that although SARS-CoV-2 is closest to bat-CoV-RaTG13 across most of the genome, its receptor-binding domain (RBD) is highly similar to that of pangolin-associated coronaviruses. One study has shown that the pangolin-CoV GD RBD shares 97.4% amino acid sequence similarity with the SARS-CoV-2 RBD, a value higher than that of bat-CoV-RaTG13 (89.2%) [7]. Three studies have supported this result, showing that the RBD is highly conserved between pangolin-CoV GD and SARS-CoV-2, differing by only one amino acid residue [8–10]. Furthermore, pangolin-CoVs and SARS-CoV-2 have the same amino acids at five key residue positions in the RBD, whereas bat-CoV-RaTG13 matches the SARS-CoV-2 sequence at only one of these positions [7,8]. Researchers have also suggested that the amino acid similarity between the pangolin-associated coronavirus RBD and SARS-CoV-2 may be due to selection-driven convergent evolution rather than recombination.
However, the SARS-CoV-2 spike protein carries a distinctive “PRRA” motif insertion at the S1/S2 cleavage site [7,8,10,11] that is not found in bat-CoV-RaTG13 or pangolin-CoVs. Chen J, et al. have suggested that this motif might have been inserted in other intermediate hosts during viral transmission [10]. Therefore, determining whether pangolins are intermediate hosts of SARS-CoV-2 will require many additional experimental samples and further data analysis. Using deep learning algorithms, Zhu H, et al. have found that the infection pattern of mink coronavirus closely resembles that of SARS-CoV-2, suggesting that minks might be an intermediate host of SARS-CoV-2 [12]. Another study has suggested that turtles might be intermediate hosts of SARS-CoV-2 [13].
At present, the intermediate host of SARS-CoV-2 has not been determined, and most researchers believe that more than one intermediate host exists. Others believe that an intermediate host might not be necessary and that the virus could have infected humans directly. Most studies have only analyzed the identity of genomic sequences between potential intermediate hosts and SARS-CoV-2 and the similarity of some protein domains; to our knowledge, no experimental verification has yet been reported.
Here, we selected angiotensin-converting enzyme 2 (ACE2) sequences from species with the closest homology to human ACE2 (hACE2), including primates, Chiroptera, Felidae, Canidae, Cricetidae, Camelidae, and the previously reported Manis javanica and Mustela putorius furo. These species were grouped into families on the basis of sequence alignment and phylogenetic tree analysis, and all ACE2 proteins were homology modeled. Protein-protein docking of the SARS-CoV-2 spike with ACE2 from different species and calculation of the binding free energy were performed to identify potential intermediate hosts or animal species susceptible to SARS-CoV-2. In addition, the spike proteins of the two coronaviruses most similar to SARS-CoV-2 were modeled and docked with hACE2 and the other ACE2 proteins, and the binding free energies were calculated to assess whether these coronaviruses could directly infect humans and other animals. We thus used a new approach to mine for intermediate hosts, systematically analyzing the potential natural and intermediate hosts of SARS-CoV-2 by calculating the binding free energy between the RBD and ACE2. We also provide suggestions for the selection of experimental animals for COVID-19 research.
METHODS
ACE2 homolog BLAST search and sequence alignment
Amino acid sequences were edited in BioEdit and DNAMAN, and sequence alignment was performed with ClustalW. The evolutionary history was inferred with the neighbor-joining method in the MEGA 7 software package. The percentage of replicate trees in which the associated taxa clustered together was determined with a bootstrap test of 1000 replicates. 3D structures were analyzed with PyMOL.
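The neighbor-joining step can be illustrated with a minimal, from-scratch Python sketch of the Saitou–Nei algorithm. It is not MEGA 7's implementation: it assumes a precomputed pairwise distance matrix (the distance model used for the tree is not stated here) and omits bootstrapping, which would repeat the procedure on alignments resampled column-wise with replacement. The function name `neighbor_joining` and the toy matrix are hypothetical.

```python
def neighbor_joining(labels, dist):
    """Build an unrooted neighbor-joining tree and return it as a Newick string.

    labels: taxon names; dist: symmetric matrix (list of lists) of pairwise distances.
    """
    if len(labels) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    nodes = list(labels)
    d = [list(map(float, row)) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each node
        # Find the pair (i, j) that minimizes the Q criterion
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the two joined nodes to their new parent
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        parent = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        # Distances from the new parent to every remaining node
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [parent]
    # Three nodes remain: attach them to a single central node
    a, b, c = 0, 1, 2
    la = (d[a][b] + d[a][c] - d[b][c]) / 2
    lb = (d[a][b] + d[b][c] - d[a][c]) / 2
    lc = (d[a][c] + d[b][c] - d[a][b]) / 2
    return f"({nodes[a]}:{la:g},{nodes[b]}:{lb:g},{nodes[c]}:{lc:g});"


# Toy five-taxon distance matrix (illustrative values, not ACE2 data)
taxa = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(neighbor_joining(taxa, D))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

With real data, the matrix would hold pairwise distances derived from the ClustalW alignment of the ACE2 sequences, under whatever substitution model is chosen.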
The full-length hACE2 sequence (NP_001358344.1) was downloaded from the NCBI protein database and searched against the entire database with BLASTp to identify ACE2 homologs (algorithm parameters: maximum target sequences, 1000; expect threshold, 10). Accession numbers of the 82 chosen ACE2 sequences are listed in Table 2.
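The criteria used to choose 82 sequences from the 1000 BLASTp hits are not stated. The Python sketch below shows one way such a hit list could be triaged, assuming the results were exported in BLAST+ tabular format (`-outfmt 6`, whose default 12 columns are listed in `FIELDS`), for instance from a run along the lines of `blastp -query hACE2.fasta -db nr -remote -evalue 10 -max_target_seqs 1000 -outfmt 6` (file name hypothetical). The function `rank_hits` and its `min_identity` cutoff are illustrative assumptions, not the selection procedure used here.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6)
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def rank_hits(tabular_text, max_evalue=10.0, min_identity=0.0):
    """Return (subject id, % identity, bit score) per subject, best HSP only,
    sorted by decreasing percent identity."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(FIELDS, row))
        pident = float(rec["pident"])
        bits = float(rec["bitscore"])
        if float(rec["evalue"]) > max_evalue or pident < min_identity:
            continue
        sid = rec["sseqid"]
        # Keep only the highest-scoring HSP for each subject sequence
        if sid not in best or bits > best[sid][2]:
            best[sid] = (sid, pident, bits)
    return sorted(best.values(), key=lambda hit: hit[1], reverse=True)
```

For example, `rank_hits(open("ace2_blastp.tsv").read(), min_identity=55.0)` (hypothetical file) would list candidate orthologs from most to least similar, leaving the final taxonomic selection to manual curation.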
Homology modeling and molecular docking
On the basis of the recently reported structure of the SARS-CoV-2 spike RBD-ACE2 complex (PDB code: 6LZG) [14], homology models of each spike RBD and each ACE2 were built. Pairwise sequence alignment and subsequent homology modeling were performed with the bioinformatics module of ICM 3.7.3 modeling software (MolSoft LLC, San Diego, CA) on an Intel i7 4960 processor [15]. Protein-protein docking was performed according to the ICM-Pro manual, and the binding free energy was calculated. The receptors were the ACE2 homologs, and the ligands were the CoV RBDs. The epitopes of both ACE2 and the RBD were selected near the complex interface, with the crystal structure of the SARS-CoV-2 spike RBD-ACE2 complex (PDB code: 6LZG) used as the reference.
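For orientation, the binding free energies reported below can be read against the generic thermodynamic relations sketched here. ICM's own scoring terms are not described in this section, so these are only the conventional definitions, not the exact quantity ICM computes. More negative values mean tighter binding; under the usual 1 M standard-state assumption, a binding free energy maps to a dissociation constant, and comparisons against hACE2 can be expressed as a difference:

```latex
\Delta G_{\mathrm{bind}} = G_{\mathrm{complex}} - \left( G_{\mathrm{ACE2}} + G_{\mathrm{RBD}} \right),
\qquad
K_d = \exp\!\left( \frac{\Delta G_{\mathrm{bind}}}{RT} \right),
\qquad
\Delta\Delta G = \Delta G_{\mathrm{bind}}^{\mathrm{species}} - \Delta G_{\mathrm{bind}}^{\mathrm{human}}
```

A negative ΔΔG thus indicates stronger predicted binding than hACE2. Because docking energies are approximations, K_d values derived this way should be treated as qualitative.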
RESULTS
Bioinformatics analysis of ACE2 proteins
SARS-CoV-2 uses ACE2 as the cellular receptor to invade host cells in a species-dependent manner, as directly reflected in the binding affinity and specificity of the spike RBD for host ACE2. Consequently, ACE2 usage is a crucial determinant of infectivity and host range. We therefore collected 1000 hACE2 homologous protein sequences with BLASTp, chose ACE2 sequences from 82 species, and performed phylogenetic tree analysis (Fig 1). The 82 species mainly belonged to Mammalia, and several were from other classes, such as Aves and Reptilia. The mammalian group included primates, rodents, odd-toed ungulates, artiodactyls, carnivores, lagomorphs and bats. Bats have been proposed to be the natural host of SARS-CoV-2 [16]. To identify possible source hosts, we collected all available ACE2 sequences from Chiroptera (17 in total).
The structure of the hACE2-SARS-CoV-2 spike RBD complex has been solved [14,17] (Fig 2A and B). Seven amino acids at the hACE2 binding interface form eight hydrogen bonds with the spike RBD: Gln24, Asp30, His34, Tyr41 and Gln42 in hACE2 form hydrogen bonds with Gln474, Lys417, Tyr453, Asn501 and Gln498 in the SARS-CoV-2 spike RBD, respectively, with Gln42 forming two hydrogen bonds with Gln498. Moreover, Lys353 and Arg357 in hACE2 form hydrogen bonds with Asn501 and Thr500 in the spike protein, respectively (Fig 2A and B). In addition, Met82 in ACE2 interacts with Phe486 in the spike RBD through hydrophobic interactions. We also analyzed the binding patterns of ACE2 from Rhinolophus sinicus and Mesocricetus auratus with the SARS-CoV-2 spike RBD through docking models; both also form eight hydrogen bonds. According to the sequence comparison, two key amino acids in Rhinolophus sinicus ACE2 differ from the human sequence (Fig 2G): Rhinolophus sinicus ACE2 has Arg24 instead of Gln24 and Ser34 instead of His34, and Arg24 and Ser34 form hydrogen bonds with Ser477 and Gln493, respectively (Fig 2C, D, G and H). Only one key amino acid in Mesocricetus auratus ACE2 differs from the human sequence (Fig 2E–H): it has Gln34 rather than His34, but Gln34 can also form a hydrogen bond with Tyr453. However, Gln24 in Mesocricetus auratus ACE2 forms a hydrogen bond with Asn487 instead of Gln474. The key interactions between ACE2 and spike RBD residues are marked in Fig 2G and H, and a detailed comparison of key amino acids across all 82 ACE2s is shown in Fig 3.
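The residue-level comparison above can be made systematic. The Python sketch below assumes a pairwise alignment of a species' ACE2 against full-length hACE2 (numbered from the initiator methionine); it reads off the species residue at each hACE2 contact position named in the text and reports substitutions. The helper names are hypothetical, and the example uses only residues stated in the text.

```python
# hACE2 interface residues named in the text (hACE2 numbering -> one-letter code)
HACE2_CONTACTS = {24: "Q", 30: "D", 34: "H", 41: "Y", 42: "Q",
                  82: "M", 353: "K", 357: "R"}


def residues_at_human_positions(human_aln, species_aln, positions):
    """Map 1-based hACE2 positions to the aligned species residue ('-' = gap)."""
    found = {}
    pos = 0
    for h, s in zip(human_aln, species_aln):
        if h != "-":  # only human residues advance the hACE2 numbering
            pos += 1
            if pos in positions:
                found[pos] = s
    return found


def contact_substitutions(residues, reference=HACE2_CONTACTS):
    """List (position, human residue, species residue) for differing contacts.

    Positions absent from `residues` are skipped rather than assumed identical.
    """
    return [(pos, reference[pos], residues[pos])
            for pos in sorted(reference)
            if pos in residues and residues[pos] != reference[pos]]


# Substitutions stated in the text for Rhinolophus sinicus and Mesocricetus auratus
print(contact_substitutions({24: "R", 34: "S"}))  # [(24, 'Q', 'R'), (34, 'H', 'S')]
print(contact_substitutions({34: "Q"}))           # [(34, 'H', 'Q')]
```

In practice, `residues_at_human_positions(human_aln, species_aln, set(HACE2_CONTACTS))` would feed `contact_substitutions` directly from each aligned pair.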
Homology modeling and protein-protein docking calculation
All ACE2 protein structures were homology modeled with ICM using the hACE2 structure as the template. The binding free energy was calculated by docking the spike RBD of SARS-CoV-2 or another coronavirus with each ACE2 protein. In most cases, the minimum-energy conformation resembled the crystal structure of the hACE2-SARS-CoV-2 RBD complex. The results are shown in Tables 1 and 2.
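The observation that the lowest-energy pose usually resembled the crystal structure suggests a simple sanity check that could be reported per complex. In this hypothetical Python sketch, each docking pose carries its energy and an interface RMSD to the 6LZG-derived reference; the lowest-energy pose is flagged as native-like if its RMSD falls under a cutoff. The `Pose` class and the 4 Å cutoff are assumptions for illustration; ICM reports its own pose scores.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    energy: float       # binding free energy estimate, kJ/mol (more negative = tighter)
    rmsd_to_ref: float  # interface RMSD to the reference complex pose, in angstroms


def best_pose(poses, native_cutoff=4.0):
    """Return the minimum-energy pose and whether it lies near the reference pose."""
    best = min(poses, key=lambda p: p.energy)
    return best, best.rmsd_to_ref <= native_cutoff


def fraction_native_like(docking_runs, native_cutoff=4.0):
    """Fraction of docking runs whose minimum-energy pose is native-like."""
    hits = sum(best_pose(run, native_cutoff)[1] for run in docking_runs)
    return hits / len(docking_runs)
```

Reporting this fraction alongside the energies would quantify the phrase "in most cases".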
No. | Virus name | RBD similarity to SARS-CoV-2 | Binding free energy with hACE2 (kJ·mol−1) |
---|---|---|---|
1 | SARS-CoV-2 | 100% | −50.1326 |
2 | Pangolin-CoV GD | 97.1% | −48.0341 |
3 | Bat RaTG13 | 89.2% | −44.9803 |
4 | Pangolin-CoV GX | 87.1% | −40.1424 |
5 | SARS-CoV | 74.6% | −49.2229 |
No. | Species name | Similarity to hACE2 | Accession number | Binding free energy with SARS-CoV-2 RBD (kJ·mol−1) | Binding free energy with RaTG13 RBD (kJ·mol−1) | Binding free energy with pangolin-CoV GD RBD (kJ·mol−1) | Cα RMSD to human ACE2 (Å) |
---|---|---|---|---|---|---|---|
1 | Homo sapiens | 100% | NP_001358344.1 | −50.1326 | −44.9803 | −48.0341 | − |
2 | Gorilla gorilla | 99.01% | XP_018874749.1 | −51.5556 | −42.7332 | −44.7128 | 0.198 |
3 | Macaca nemestrina | 95.34% | XP_011733505.1 | −51.5325 | −42.6326 | −44.0687 | 0.193 |
4 | Papio anubis | 95.34% | XP_021788732.1 | −51.5628 | −42.6162 | −44.0165 | 0.193 |
5 | Macaca fascicularis | 95.21% | XP_005593094.1 | −51.5373 | −42.6172 | −44.021 | 0.193 |
6 | Macaca mulatta | 95.21% | ACI04556.1 | −51.5677 | −42.8581 | −44.0623 | 0.193 |
7 | Aotus nancymaae | 92.17% | XP_012290105.1 | −42.8772 | −42.5036 | −41.4306 | 0.237 |
8 | Equus przewalskii | 86.90% | XP_008542995.1 | −48.8959 | −40.1971 | −35.1463 | 0.258 |
9 | Ceratotherium simum | 85.77% | XP_004435206.1 | −48.3243 | −41.1406 | −42.9484 | 0.228 |
10 | Panthera tigris ssp. altaica | 85.70% | XP_007090142.1 | −50.6125 | −40.7855 | −42.2621 | 0.227 |
11 | Puma concolor | 85.59% | XP_025790417.1 | −50.5544 | −40.7563 | −41.496 | 0.226 |
12 | Panthera pardus | 85.47% | XP_019273508.1 | −50.6849 | −41.7507 | −42.4629 | 0.277 |
13 | Ictidomys tridecemlineatus | 85.38% | XP_005316051.3 | −48.8769 | −42.2544 | −44.4516 | 0.278 |
14 | Felis catus | 85.22% | NP_001034545.1 | −48.8741 | −41.5772 | −42.2018 | 0.266 |
15 | Lynx pardinus | 85.22% | VFV30336.1 | −50.6549 | −39.4012 | −40.0497 | 0.243 |
16 | Oryctolagus cuniculus | 85.14% | XP_002719891.1 | −48.5832 | −42.3481 | −44.3818 | 0.240 |
17 | Marmota marmota | 84.88% | XP_015343540.1 | −48.6519 | −43.0272 | −45.7725 | 0.276 |
18 | Urocitellus parryii | 84.76% | XP_026252505.1 | −47.6377 | −41.1093 | −42.6924 | 0.288 |
19 | Marmota flaviventris | 84.76% | XP_027802308.1 | −48.6645 | −41.4861 | −44.3536 | 0.275 |
20 | Manis javanica | 84.76% | XP_017505746.1 | −46.3551 | −43.2112 | −43.5113 | 0.252 |
21 | Chinchilla lanigera | 84.72% | XP_013362428.1 | −43.1693 | −37.1876 | −40.2995 | 0.233 |
22 | Fukomys damarensis | 84.72% | XP_010643477.1 | −42.1498 | −41.4333 | −42.681 | 0.233 |
23 | Jaculus jaculus | 84.63% | XP_004671523.1 | −46.0314 | −44.3856 | −44.9497 | 0.245 |
24 | Heterocephalus glaber | 84.60% | XP_004866157.1 | −42.0874 | −43.38 | −39.5228 | 0.201 |
25 | Octodon degus | 84.47% | XP_023575315.1 | −35.7756 | −37.3313 | −39.9532 | 0.237 |
26 | Mesocricetus auratus | 84.26% | XP_005074266.1 | −50.4353 | −44.7522 | −47.3596 | 0.253 |
27 | Carlito syrichta | 84.10% | XP_008062810.1 | −37.8413 | −37.389 | −33.1841 | 0.286 |
28 | Canis lupus dingo | 84.01% | XP_025292925.1 | −40.7918 | −35.2498 | −36.5903 | 0.224 |
29 | Nyctereutes procyonoides | 84.01% | ABW16956.1 | −43.609 | −37.77 | −37.9879 | 0.223 |
30 | Ursus maritimus | 83.92% | XP_008694637.1 | −45.0617 | −33.7685 | −35.7023 | 0.271 |
31 | Ursus arctos | 83.88% | XP_026333865.1 | −45.0899 | −35.5917 | −37.2856 | 0.270 |
32 | Vulpes vulpes | 83.63% | XP_025842512.1 | −45.4803 | −34.3498 | −38.4228 | 0.224 |
33 | Microtus ochrogaster | 83.63% | XP_005358818.1 | −44.1707 | −41.3675 | −42.0092 | 0.224 |
34 | Canis lupus familiaris | 83.50% | NP_001158732.1 | −40.7225 | −38.8909 | −37.9272 | 0.297 |
35 | Paguma larvata | 83.48% | Q56NL1.1 | −49.3514 | −37.1641 | −37.1826 | 0.275 |
36 | Equus asinus | 83.40% | XP_014713133.1 | −48.0456 | −39.2759 | −35.7862 | 0.286 |
37 | Ailuropoda melanoleuca | 83.38% | XP_002930657.1 | −45.2657 | −36.5644 | −38.2467 | 0.294 |
38 | Crocuta crocuta | 83.35% | KAF0878287.1 | −50.1934 | −37.8297 | −27.2922 | 0.245 |
39 | Vicugna pacos | 83.35% | XP_006212709.1 | −44.6744 | −35.3267 | −34.9159 | 0.263 |
40 | Camelus ferus | 83.23% | XP_006194263.1 | −47.3657 | −38.2449 | −38.8581 | 0.233 |
41 | Phodopus campbelli | 82.87% | ACT66274.1 | −44.875 | −43.316 | −43.7214 | 0.286 |
42 | Mustela putorius | 82.74% | NP_001297119.1 | −45.3724 | −35.8347 | −38.5731 | 0.229 |
43 | Balaenoptera acutorostrata | 82.48% | XP_028020351.1 | −42.7212 | −38.3849 | −38.1349 | 1.312 |
44 | Rattus norvegicus | 82.37% | NP_001012006.1 | −47.2193 | −39.0555 | −42.0742 | 0.273 |
45 | Grammomys surdaster | 82.24% | XP_028617961.1 | −46.6804 | −42.3484 | −45.3621 | 0.250 |
46 | Sus scrofa domesticus | 81.94% | ACT66265.1 | −48.9879 | −40.7439 | −38.6853 | 0.297 |
47 | Mus musculus | 81.86% | NP_001123985.1 | −44.6578 | −38.9799 | −41.2323 | 0.231 |
48 | Capra hircus | 81.74% | NP_001277036.1 | −49.5148 | −47.6838 | −49.4663 | 1.309 |
49 | Ovis aries | 81.74% | XP_011961657.1 | −49.6762 | −43.7355 | −45.024 | 0.532 |
50 | Pteropus alecto | 81.49% | XP_006911709.1 | −47.2126 | −42.4567 | −44.3341 | 1.31 |
51 | Mastomys coucha | 81.38% | XP_031226742.1 | −46.7412 | −39.664 | −42.4614 | 0.267 |
52 | Sus scrofa | 81.37% | NP_001116542.1 | −49.0061 | −41.5093 | −43.5881 | 0.298 |
53 | Rhinolophus pearsonii | 81.37% | ABU54053.1 | −46.2924 | −34.2089 | −36.0739 | 0.306 |
54 | Bos mutus | 81.37% | XP_005903173.1 | −49.4998 | −41.6701 | −35.4578 | 1.31 |
55 | Camelus dromedarius | 80.87% | KAB1253106.1 | −47.28 | −39.7657 | −40.3033 | 0.287 |
56 | Rhinolophus macrotis | 80.87% | ADN93471.1 | −48.9215 | −43.8471 | −42.8564 | 0.307 |
57 | Tupaia chinensis | 80.75% | XP_006164754.1 | −39.509 | −36.6856 | −37.591 | 0.28 |
58 | Miniopterus natalensis | 80.75% | XP_016058453.1 | −43.4486 | −36.8746 | −37.7009 | 0.394 |
59 | Rhinolophus sinicus | 80.62% | ADN93475.1 | −50.4141 | −39.9513 | −42.6029 | 0.313 |
60 | Rhinolophus landeri | 80.62% | ALJ94034.1 | −46.5592 | −38.814 | −41.538 | 0.324 |
61 | Pteropus vampyrus | 80.62% | XP_011361275.1 | −46.333 | −39.0766 | −42.9668 | 0.471 |
62 | Loxodonta africana | 80.50% | XP_023410960.1 | −45.8706 | −38.0833 | −39.9194 | 0.758 |
63 | Rhinolophus alcyone | 80.50% | ALJ94035.1 | −46.4305 | −39.366 | −30.2506 | 0.324 |
64 | Rhinolophus ferrumequinum | 80.50% | ADN93470.1 | −46.4919 | −39.491 | −40.4483 | 0.313 |
65 | Eptesicus fuscus | 80.42% | XP_008153150.1 | −35.0887 | −36.0798 | −31.4181 | 0.559 |
66 | Myotis brandtii | 80.37% | XP_014399782.1 | −46.1067 | −41.6428 | −43.5682 | 0.614 |
67 | Rhinolophus pusillus | 80.35% | ADN93477.1 | −48.041 | −37.6987 | −38.1246 | 0.312 |
68 | Myotis lucifugus | 80.25% | XP_023609437.1 | −44.8588 | −36.6078 | −39.8423 | 0.359 |
69 | Cavia porcellus | 79.54% | ACT66270.1 | −37.9728 | −33.4454 | −35.6179 | 0.275 |
70 | Orycteropus afer | 79.38% | XP_007951028.1 | −46.2635 | −38.5732 | −41.149 | 0.579 |
71 | Myotis davidii | 79.15% | XP_006775273.1 | −46.8656 | −39.3552 | −43.0102 | 0.473 |
72 | Rousettus leschenaultii | 79.13% | ADJ19219.1 | −44.8589 | −37.6318 | −36.5318 | 0.359 |
73 | Dasypus novemcinctus | 79.13% | XP_004449124.1 | −40.5196 | −42.4187 | −44.3923 | 0.934 |
74 | Erinaceus europaeus | 79.01% | XP_007538670.1 | −49.2088 | −41.1565 | −40.376 | 0.278 |
75 | Rousettus aegyptiacus | 78.88% | XP_015974412.1 | −35.4247 | −38.1481 | −40.8802 | 0.352 |
76 | Pipistrellus abramus | 76.45% | ACT66266.1 | −40.3802 | −36.9656 | −38.7944 | 0.615 |
77 | Phascolarctos cinereus | 71.48% | XP_020863153.1 | −36.0763 | −36.9936 | −35.9102 | 0.323 |
78 | Crocodylus porosus | 67.45% | XP_019384827.1 | −40.4653 | −41.7424 | −32.9734 | 0.688 |
79 | Phasianus colchicus | 66.09% | XP_031451919.1 | −36.1372 | −31.6362 | −33.6769 | 1.155 |
80 | Struthio camelus | 65.01% | XP_009667495.1 | −45.8706 | −38.8162 | −35.3141 | 1.142 |
81 | Ophiophagus hannah | 56.91% | ETE61880.1 | −34.6833 | −29.8054 | −31.9762 | 1.112 |
82 | Meleagris gallopavo | 55.50% | XP_019467554.1 | −37.6367 | −38.8765 | −37.4142 | 0.447 |
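The last column of Table 2 reports the RMSD of each ACE2 model relative to hACE2. A common way to compute such a value is Kabsch superposition over Cα atoms of equivalent residues. The NumPy sketch below assumes the two coordinate sets are already paired residue by residue (for example, via the sequence alignment) and is not necessarily how ICM computes its RMSD; `kabsch_rmsd` is a hypothetical helper.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between paired (N, 3) coordinate sets after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("p and q must both have shape (N, 3)")
    # Remove translation by centering both sets on their centroids
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # SVD of the covariance matrix gives the optimal rotation
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Demo: a rotated and translated copy of the same coordinates superposes exactly
rng = np.random.default_rng(0)
q = rng.normal(size=(20, 3))
theta = 0.7
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
p = q @ rz.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(p, q))  # close to 0 (floating-point round-off only)
```

Values near 1.3 Å in the table, as for Capra hircus or Bos mutus, would correspond to visibly larger backbone deviations than the roughly 0.2–0.3 Å typical of most mammalian models.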
As shown in Table 1, among the five closely related SARS-family coronaviruses, the SARS-CoV-2 spike RBD had the strongest calculated affinity for hACE2, in agreement with the high infectivity of SARS-CoV-2. Although the SARS-CoV RBD has the lowest similarity to the SARS-CoV-2 RBD, its calculated binding affinity was the closest to that of SARS-CoV-2, even though slightly different binding modes at the interface are seen in the complex structures [14,17]. For the other three coronaviruses closely related to SARS-CoV-2, greater RBD similarity was associated with a lower binding free energy. We further chose SARS-CoV-2 and its two most similar viruses, bat RaTG13 and pangolin-CoV GD, for protein-protein docking studies.
According to Table 2 and Fig 4, the binding free energy between ACE2 from various animals and the SARS-CoV-2 RBD generally followed a trend in which lower homology to the human sequence was associated with weaker binding, with some exceptions.
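This trend can be quantified with a rank correlation, which tolerates the exceptions noted. The Python sketch below computes Spearman's ρ (the Pearson correlation of average ranks) between sequence similarity and SARS-CoV-2 RBD binding free energy. The five species are taken from Table 2 purely for illustration; the full 82-species value is not computed here. Because stronger binding means a more negative energy, the trend described corresponds to a negative ρ.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Homo sapiens, Felis catus, Canis lupus familiaris, Mus musculus, Ophiophagus hannah
similarity = [100.0, 85.22, 83.50, 81.86, 56.91]               # % similarity to hACE2
dg_sars2 = [-50.1326, -48.8741, -40.7225, -44.6578, -34.6833]  # kJ/mol, Table 2
print(round(spearman(similarity, dg_sars2), 3))  # -0.9
```

Applied to all 82 rows, the same function would give a single summary of how strongly similarity predicts binding across the panel.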
Because primate ACE2s are highly homologous to hACE2, they bound the RBD strongly, in some cases even more strongly than hACE2. As shown in Table 2, ACE2 from Macaca mulatta, Papio anubis, Gorilla gorilla, Macaca fascicularis and Macaca nemestrina bound the SARS-CoV-2 RBD more strongly than did hACE2 (−50.1326 kJ·mol−1), with binding free energies below −51 kJ·mol−1. An exception was Aotus nancymaae ACE2: despite 92.17% sequence similarity to hACE2, its binding toward the SARS-CoV-2 RBD (−42.8772 kJ·mol−1) was markedly weaker than that of hACE2 and also weaker than that of some bird ACE2s.
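To give these energy differences a rough physical scale, the Python sketch below converts a difference in binding free energy into an implied ratio of dissociation constants, assuming ΔG = RT ln K_d at 298.15 K. Docking energies are not calibrated free energies, so the ratios are illustrative only; `affinity_ratio` is a hypothetical helper.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant, kJ/(mol*K)


def affinity_ratio(dg_species, dg_reference, temperature=298.15):
    """Kd(reference) / Kd(species) implied by two binding free energies (kJ/mol).

    Assumes dG = RT ln Kd, so values above 1 mean tighter binding than the reference.
    """
    ddg = dg_species - dg_reference
    return math.exp(-ddg / (R_KJ_PER_MOL_K * temperature))


human = -50.1326  # hACE2 with the SARS-CoV-2 RBD (Table 2)
print(round(affinity_ratio(-51.5556, human), 2))  # Gorilla gorilla: 1.78
print(round(affinity_ratio(-42.8772, human), 3))  # Aotus nancymaae: 0.054
```

Under these assumptions, the roughly 1.4 kJ·mol−1 advantage of gorilla ACE2 corresponds to less than a twofold tighter K_d, whereas the Aotus nancymaae deficit corresponds to roughly 19-fold weaker binding.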
ACE2 from most of the Felidae selected in this study, such as Panthera pardus, Lynx pardinus, Panthera tigris and Puma concolor, bound the SARS-CoV-2 RBD more strongly than did hACE2, with binding free energies below −50.1326 kJ·mol−1. However, domestic cat ACE2 had a slightly higher binding free energy than hACE2 (−48.8741 kJ·mol−1). Notably, ACE2 from Canidae, including domestic dogs, and from Mustela putorius had much higher binding free energies than hACE2, indicating much weaker binding.
As shown in Table 2, ACE2 from some species more distantly related to humans, including Mesocricetus auratus and Crocuta crocuta, also bound the SARS-CoV-2 RBD more strongly than did hACE2. The sequence similarity between rodent ACE2s and hACE2 is mostly 81–86%; rat and mouse ACE2s bound the RBD more weakly than hACE2, whereas golden hamster ACE2 bound it more strongly.
Paguma larvata has been identified as the main intermediate host of SARS-CoV [1]. Our predictions showed that Paguma larvata and Erinaceus europaeus ACE2s had binding ability similar to that of hACE2, suggesting that these two species may be susceptible to SARS-CoV-2. Notably, Erinaceus europaeus ACE2 shares only 79.01% sequence similarity with hACE2, yet its binding ability toward the RBD was very close to that of hACE2.
Rhinolophus pearsonii and Rhinolophus macrotis (Rhinolophidae) share 81.37% and 80.87% sequence similarity with hACE2, respectively, yet their ACE2s bound the RBD with an affinity similar to that of hACE2. Rhinolophus sinicus ACE2 shares 80.62% sequence similarity with hACE2, but it bound the RBD slightly more strongly than hACE2.
A previous study has suggested that SARS-CoV-2, although similar to bat coronaviruses, has a codon usage bias most similar to that of snakes [14]. However, this conclusion remains controversial. We therefore also examined the possibility of non-mammals serving as intermediate hosts. As shown in Table 2, the sequence similarity between non-mammalian (Phasianidae, Struthionidae, Elapidae and Crocodylidae) ACE2s and hACE2 was only 55–67%, and all showed relatively weak binding toward the SARS-CoV-2 RBD. These results indicate that non-mammals (reptiles and birds) are unlikely to be intermediate hosts of SARS-CoV-2.
To compare the natural and intermediate hosts of the bat coronavirus RaTG13 with those of SARS-CoV-2, we docked the RaTG13 spike RBD with ACE2 proteins from different species and calculated the binding free energy (Table 2). Across species, the binding energies of the RaTG13 RBD followed a pattern similar to that of the SARS-CoV-2 RBD, but in almost all species ACE2 bound the RaTG13 RBD more weakly than the SARS-CoV-2 RBD, in agreement with previously reported research [18]. Capra hircus ACE2, which shares 81.74% sequence similarity with hACE2, bound the RaTG13 RBD even more strongly than hACE2 did. Rhinolophus macrotis ACE2, which shares 80.87% sequence similarity with hACE2, bound the RaTG13 RBD with an affinity comparable to that of hACE2; therefore, Rhinolophus macrotis may be an intermediate host of RaTG13. In addition, ACE2s from Mesocricetus auratus, Jaculus jaculus, Ovis aries, Heterocephalus glaber and Phodopus campbelli bound the RaTG13 RBD strongly.
We further analyzed the binding of ACE2 from various animals toward the pangolin-CoV GD RBD (Table 2) and found that ACE2s from Capra hircus, Mesocricetus auratus, Homo sapiens and most primates, Marmota marmota, Ictidomys tridecemlineatus, Oryctolagus cuniculus, Marmota flaviventris, Jaculus jaculus, Phodopus campbelli, Ovis aries, Grammomys surdaster, Pteropus alecto, Sus scrofa, Dasypus novemcinctus and Myotis brandtii bound the pangolin-CoV GD RBD more strongly than did ACE2 from Manis javanica, the putative intermediate host of SARS-CoV-2; this was particularly true of Capra hircus. These species may also be susceptible to pangolin-CoV GD.
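The comparison above amounts to ranking species against a reference species. A minimal Python sketch, using a subset of the pangolin-CoV GD column of Table 2, is shown below. The optional `margin` parameter is a hypothetical addition that guards against over-interpreting differences far smaller than typical docking-score uncertainty, such as the roughly 0.08 kJ·mol−1 gap between Sus scrofa and Manis javanica.

```python
def stronger_than(energies, reference, margin=0.0):
    """Species whose binding free energy is below the reference's by more than `margin`.

    Results are sorted from strongest (most negative) to weakest binding.
    """
    threshold = energies[reference] - margin
    hits = [s for s, e in energies.items() if s != reference and e < threshold]
    return sorted(hits, key=lambda s: energies[s])


# Binding free energies with the pangolin-CoV GD RBD, kJ/mol (subset of Table 2)
gd = {
    "Capra hircus": -49.4663,
    "Homo sapiens": -48.0341,
    "Mesocricetus auratus": -47.3596,
    "Sus scrofa": -43.5881,
    "Manis javanica": -43.5113,
    "Felis catus": -42.2018,
    "Canis lupus familiaris": -37.9272,
}
print(stronger_than(gd, "Manis javanica"))
# ['Capra hircus', 'Homo sapiens', 'Mesocricetus auratus', 'Sus scrofa']
print(stronger_than(gd, "Manis javanica", margin=0.5))
# ['Capra hircus', 'Homo sapiens', 'Mesocricetus auratus']
```

How large a margin is appropriate depends on the scoring function's error, which is not reported here.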
DISCUSSION
The COVID-19 pandemic caused by the novel coronavirus SARS-CoV-2 has spread worldwide. Viruses cannot grow and replicate independently; they can replicate only in the living cells of a host. Previous studies have suggested that bats might be natural hosts of SARS-CoV-2, and that snakes, pangolins, turtles and minks might be potential intermediate hosts [7–9,12,13]. Identifying the intermediate host of SARS-CoV-2 is important because it would allow the source of infection to be cut off and prevent further transmission of the virus to humans. However, confirming an intermediate host requires a rigorous scientific process: (1) a virus that can reproduce continuously in the intermediate host must be isolated; (2) the disease and pathological characteristics caused by the isolated virus must be confirmed in animal models; and (3) the position of the intermediate host in the transmission chain must be established. At present, the intermediate host of SARS-CoV-2 remains uncertain.
Binding of the SARS-CoV-2 spike to hACE2 and the ability to escape host immune attack are prerequisites for cross-species transmission to humans. The interaction between the spike protein and ACE2, the first step in viral invasion of the host, directly determines host range and tissue specificity. To explore animals susceptible to SARS-CoV-2 and its possible intermediate hosts, we selected 82 representative ACE2 sequences from the 1000 sequences with the closest homology to hACE2. Most of these species were mammals, and some were birds and reptiles. The species were grouped into families through sequence alignment and phylogenetic tree analysis, and the ACE2 proteins of all species were homology modeled. The SARS-CoV-2 spike RBD was docked with each ACE2 protein, and the binding free energy was calculated. The results showed that lower homology between a species' ACE2 and hACE2 was generally associated with weaker binding of that ACE2 to the SARS-CoV-2 RBD.
Previous studies have suggested that SARS-CoV-2 might have originated in bats [6,7]. We found that Rhinolophus sinicus ACE2 bound the SARS-CoV-2 RBD slightly more strongly than hACE2 did. This result suggests that Rhinolophus sinicus might be susceptible to SARS-CoV-2 and could even be an intermediate host, in agreement with a previous study [6].
Our results showed that ACE2s from most primates, Crocuta crocuta, Mesocricetus auratus and wild felines bound the SARS-CoV-2 RBD more strongly than hACE2 did, implying that these animals might be intermediate hosts of SARS-CoV-2. Most primates, including Gorilla gorilla, Macaca nemestrina, Macaca fascicularis, Macaca mulatta (rhesus macaque) and Papio anubis, are predicted to be susceptible to SARS-CoV-2, in agreement with a previous study showing that conjunctival inoculation with SARS-CoV-2 can cause mild COVID-19 in rhesus macaques [19]. Although Aotus nancymaae ACE2 is highly homologous to hACE2, it bound the SARS-CoV-2 RBD much more weakly than hACE2 and even more weakly than some bird ACE2s. This difference might be due to the replacement of Tyr41 and Gln42 of hACE2 by His41 and Glu42 in Aotus nancymaae (Fig 3). Tyr41 of hACE2 forms hydrogen bonds with Thr500 and Asn501 of the SARS-CoV-2 RBD, and Gln42 forms hydrogen bonds with Gly446 and Tyr449. Tyr41 and Gln42 are also highly conserved in other species [20]. The substitutions in Aotus nancymaae ACE2 might disrupt these hydrogen bonds and reduce the binding affinity toward the SARS-CoV-2 RBD. In agreement with our conclusions, one study has shown that New World monkey ACE2s carrying His41 and Glu42 have a limited ability to mediate SARS-CoV-2 entry [21]. Tyr41 and Gln42 of ACE2 therefore appear critical to SARS-CoV-2 host range and susceptibility.
On the basis of our findings, most wild felines are likely to be susceptible to SARS-CoV-2. However, because wild felines rarely come into contact with humans, they are unlikely to be intermediate hosts. Domestic cat (Felis catus) ACE2 bound the RBD more weakly than wild feline ACE2s, but because its binding free energy remained close to that of hACE2, cats might be susceptible to SARS-CoV-2. In one study, ACE2 orthologs were ectopically expressed in A549 cells, which were then infected with SARS-CoV-2 to evaluate each ortholog's ability to support infection; cat ACE2 showed a strong ability to mediate viral entry [21]. Another study has found that SARS-CoV-2 replicates efficiently in cats and can spread through the air [22]. These findings indicate that cats are highly susceptible to SARS-CoV-2. Felids whose ACE2 shows higher binding affinity, such as Panthera pardus, Lynx pardinus, Panthera tigris and Puma concolor, might also be susceptible. In contrast, dogs appear to be much less susceptible, in agreement with previous studies [21–23].
ACE2s from animals such as Paguma larvata, Erinaceus europaeus, Bos mutus, Ovis aries, Capra hircus and Sus scrofa had slightly higher (less negative) binding free energies toward the SARS-CoV-2 RBD than hACE2. Because the values were very close, we speculate that these animals might be susceptible to SARS-CoV-2 and could all be potential intermediate hosts.
Previous studies have shown that the RBDs of pangolin-CoV GD and SARS-CoV-2 are highly conserved, differing by only one amino acid, suggesting that Manis javanica could be the intermediate host of SARS-CoV-2 [7,9]. However, our docking results showed that the SARS-CoV-2 RBD bound pangolin ACE2 less strongly than hACE2. Furthermore, analysis of ACE2 binding toward the pangolin-CoV GD RBD revealed that ACE2s from humans and many animals other than Manis javanica, such as Capra hircus, Mesocricetus auratus and Marmota marmota, bound the pangolin-CoV GD RBD more strongly than pangolin ACE2 did, particularly Capra hircus ACE2. This suggests that pangolin-CoV GD may be able to infect species other than pangolins. Although pangolins might not be direct intermediate hosts of SARS-CoV-2, the high homology of the RBD between SARS-CoV-2 and pangolin-CoV GD suggests that pangolin-CoV GD might still be an intermediate virus linking SARS-CoV-2 and its earlier variants. Some studies have also suggested that SARS-CoV-2 might have acquired genetic material from multiple viruses during its evolution, and that pangolins might have been an intermediate host of SARS-CoV-2 [24]. We speculate that pangolin-CoV GD might have accumulated mutations while spreading in its intermediate host, then gained the ability to infect humans and primates.
This study and previous research [21] together indicate that ferret ACE2 has a markedly higher binding free energy (that is, weaker binding) toward the SARS-CoV-2 RBD than hACE2. Interestingly, ferrets are nonetheless susceptible to SARS-CoV-2, although they do not develop severe disease [21,22]. Some previous articles have claimed that snakes might be an intermediate host [25], but this conclusion is poorly supported, because synonymous codon usage bias analysis is not suitable for identifying coronavirus hosts. Our results indicated that ACE2s from reptiles, such as Ophiophagus hannah and Crocodylus porosus, and from birds, such as Phasianus colchicus and Meleagris gallopavo, bound the SARS-CoV-2 spike RBD markedly more weakly than mammalian ACE2s; therefore, these species are unlikely to be intermediate hosts of SARS-CoV-2.
From the perspective of experimental animals, ferrets, guinea pigs and wild murine species are not good models for SARS-CoV-2 infection, because their ACE2s bound the spike RBD much more weakly than hACE2 did. Primates and golden Syrian hamsters are more suitable experimental animals for SARS-CoV-2 infection models. Existing studies have indicated that golden Syrian hamsters are susceptible to SARS-CoV-2 and exhibit pathological features similar to those of mild human infections; the golden Syrian hamster might therefore be a useful animal model for studying SARS-CoV-2 transmission and pathogenesis and for drug and vaccine development [26,27]. Our results also indicated that ACE2s from mice, rats and dogs had weaker affinity for the SARS-CoV-2 RBD, so these animals are not suitable SARS-CoV-2 research models. This conclusion is consistent with a study showing that dogs have low susceptibility to SARS-CoV-2 [22]. Because murine ACE2 binds the SARS-CoV-2 spike poorly, the virus cannot easily enter murine cells and cause symptoms similar to those in humans. Transgenic mice expressing hACE2 have been developed for SARS-CoV-2 research, but problems remain, such as low hACE2 expression, limited tissue distribution and low lethality [28].
Investigations of SARS-CoV-2-susceptible animals in close contact with humans aim not only to find potential intermediate hosts but, more importantly, to block cross-species transmission of SARS-CoV-2 and cut off its bidirectional spread and evolution. In the outbreaks among farmed minks last year, SARS-CoV-2 first passed from humans to minks, then spread and evolved among minks, and it has continued to spread among humans [29]. Recent research has found that wild white-tailed deer in the northeastern United States have been infected with SARS-CoV-2, which has spread among deer herds. This is the first report of wild animals being widely exposed to and spreading SARS-CoV-2, but how the virus reached the deer and whether it will spread to other wild species are unknown [30]. Such cross-species transmission and evolution is very dangerous, because humans might find it difficult to escape the coronavirus and could be affected for a long time, especially when animals in close contact with humans, such as cats and minks, can be infected by SARS-CoV-2.
CONCLUSIONS
Tracing the origin of SARS-CoV-2 is an extremely important but challenging task. Identifying its intermediate and natural hosts is time consuming but necessary to answer a series of questions: how SARS-CoV-2 evolved, how it spread to humans, how it acquired adaptive mutations, how it increased its affinity for the host receptor, and how it evades the host immune response. Without answers to these questions, identifying SARS-CoV-2-susceptible species and preventing the virus from spreading to humans would be very difficult. Our work provides a preliminary prediction of the susceptibility of different species to SARS-CoV-2 by calculating the binding affinity between the RBD and ACE2. The susceptible animals predicted herein are consistent with the animals currently known to be infected by SARS-CoV-2, such as cats, minks, lions and tigers. In addition, most primates, spotted hyenas, golden Syrian hamsters, hedgehogs and sheep might be susceptible to SARS-CoV-2. These results emphasize the need to continue and expand wildlife surveillance to prevent widespread cross-species viral transmission and evolution. Where necessary, rigorous zoonotic disease surveillance plans should be formulated to clarify how the pathogen adapts, evolves and spreads when it invades a new host. In the long term, we must continue to establish and refine strategies for the prevention, control and treatment of zoonotic infectious diseases and prepare for the next potential pandemic.