Low genetic diversity may be an Achilles heel of SARS-CoV-2

Abstract

Scientists worldwide are racing to develop effective vaccines against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of the COVID-19 pandemic. An important and perhaps underappreciated aspect of this endeavor is ensuring that the vaccines being developed confer immunity to all viral lineages in the global population. Toward this end, a seminal study published in PNAS (1) analyzes 27,977 SARS-CoV-2 sequences from 84 countries obtained throughout the course of the pandemic to track and characterize the evolution of the novel coronavirus since its origination. The principal conclusion reached by the authors of this work is that SARS-CoV-2 genetic diversity is remarkably low, almost entirely the product of genetic drift, and should not be expected to impede development of a broadly protective vaccine. Although errors introduced during genome replication are a major source of genetic variation in all virus populations, limiting the fitness costs of accumulated errors is especially critical for coronaviruses, the RNA genomes of which are the largest known. For this reason, coronaviruses evolved nonstructural protein 14 (nsp14), which accompanies viral replicases during RNA synthesis and excises misincorporated ribonucleotides from nascent strands before they can be extended, thus preventing errors from becoming permanent. This error-correcting capacity was unknown among RNA viruses prior to its discovery in SARS-CoV-1 (2, 3), and it contributes to a replication error rate more than 10-fold lower than that of other RNA viruses (4, 5). This activity also likely contributes to the low genetic diversity of SARS-CoV-2, although to our knowledge nsp14 function in the novel coronavirus has yet to be investigated.
For many viruses, surface glycoproteins contain not only elements required for specific binding of cellular receptors, membrane fusion, and virus entry into the host cell but also epitopes recognized by neutralizing antibodies produced as part of an effective adaptive immune response. Hence, tracking genetic variation in the SARS-CoV-2 surface glycoprotein is of paramount importance for determining the likelihood of vaccine effectiveness or immune escape. To put this variation in perspective, Fig. 1 shows a graphical illustration of comparative genetic diversity among surface glycoproteins of select human pathogenic viruses, including SARS-CoV-2, correlated with the availability and effectiveness of respective preventive vaccines.
Fig. 1. Comparative genetic diversity among coronaviruses and select viral pathogens. As indicated by the scale bar, sphere radius reflects average pairwise distances (APD) of viral surface glycoprotein gene sequences among different viruses. Diversities among coronaviruses (for which no vaccines have been developed to date) are indicated in red, and those of other viruses for which effective vaccines are available or unavailable are shown in blue and green, respectively. Since 2005, the average effectiveness of combination influenza seasonal vaccines (influenza A: H1N1, H3N2, influenza B) has been 40%. Accordingly, genetic diversity of influenza A is depicted by blue-green shading to reflect an intermediate level of vaccine effectiveness. Sequences were obtained from public databases and identical sequences were included only once. MEGA7 software was used to calculate APD among gene segments encoding proteins involved in attachment/entry: Spike or Spike-like human coronaviruses (SARS-CoV-2, 229E, NL63, OC43, and HKU1), spike glycoprotein (Ebola), HN (mumps), S (HBV), H (measles), Env (HIV-1), HA (influenza A), and E1 (HCV).
More specifically, HIV-1 Group M subtypes A–D, F–H, J–K, CRF01_AE, and CRF02_AG; HBV serotypes A–H; HCV genotypes 1a–c, 2a–b, 4a, 5a, 6a, 6k, and 6m; and influenza A H1N1 pdm09, seasonal H1N1, H3N2, and H5N1 were included. Majority-rule consensus of unique sequences for HIV-1 (Group M, N, O, and P), HBV, HCV, and influenza A was performed in Seaview v4.7. Total numbers of sequences analyzed: SARS-CoV-2 (21,554), 229E (25), NL63 (52), OC43 (79), HKU1 (38), Ebola (578), mumps (341), HBV (10,271), measles (38), HIV-1 (5,603), influenza A (133), and HCV (439).
Although genetic diversity is only one of many determinants of vaccine efficacy, there is a clear inverse correlation between these two metrics among viral pathogens examined in our analysis. Presumably due to its relatively recent origins, genetic diversity in the SARS-CoV-2 surface glycoprotein, spike, encoded by the S gene, is exceedingly low, even in comparison to other human coronaviruses. Toward the opposite extreme, diversity among influenza A surface glycoproteins is 437-fold greater than that measured in SARS-CoV-2. The relative age of influenza A (dating at least back to the 16th century) is certainly a major factor in this disparity, as is reassortment of genome segments encoding influenza A surface antigens hemagglutinin (HA) and neuraminidase (NA) (6). Indeed, sudden emergence of influenza A virus variants containing HA–NA combinations not previously encountered by contemporaneous human populations caused the pandemics of 1918 (H1N1), 1957 (H2N2), 1968 (H3N2), and 2009 (H1N1pdm09). Although coronavirus genomes are not segmented like those of influenza viruses, they are nevertheless capable of high rates of recombination. Hence, future emergence of new virulent derivatives of SARS-CoV-2 paralleling those observed with influenza A is a possibility that will require global monitoring of both animal and human reservoirs.
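For readers unfamiliar with the metric behind Fig. 1, the following is a minimal sketch of how an average pairwise distance (APD) and a fold difference between two viruses could be computed. It is not the MEGA7 procedure used for the figure: it assumes a simple p-distance (proportion of differing sites) with pairwise deletion of gaps and ambiguous bases, and the function names and toy alignments are hypothetical, invented purely for illustration. Identical sequences are counted once, as described in the figure legend.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: sites where either sequence has a gap ('-') or an
    ambiguous base ('N') are skipped (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def average_pairwise_distance(seqs) -> float:
    """Mean p-distance over all pairs of unique aligned sequences."""
    # Keep each identical sequence only once, preserving input order.
    unique = list(dict.fromkeys(s.upper() for s in seqs))
    if len(unique) < 2:
        return 0.0
    pairs = list(combinations(unique, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy alignments (made up for illustration; not real viral sequences).
low_diversity = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGTACGTAC"]
high_diversity = ["ACGTACGTAC", "TCGAACGTTC", "ACGTTCGAAC", "ACCTACGTAG"]

apd_low = average_pairwise_distance(low_diversity)
apd_high = average_pairwise_distance(high_diversity)
fold = apd_high / apd_low  # analogous to the fold comparison made in the text

print(f"APD (low diversity):  {apd_low:.3f}")
print(f"APD (high diversity): {apd_high:.3f}")
print(f"Fold difference:      {fold:.1f}")
```

A fold difference such as the 437-fold figure quoted above is simply the ratio of two such APD values; the actual numbers depend on the alignment, the distance model, and the sequence sets chosen.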
As differences in biology and epidemiology among these human viral pathogens are considerable, so is the extent of sequence divergence in genes encoding their respective envelope glycoproteins. HIV-1, for example, has fueled the AIDS pandemic for more than 40 y, during which time genetic diversity was acquired through both recombination and propagation of replication errors (7). Similarly, widespread sustained prevalence contributed to genetic diversity in hepatitis B virus (HBV) (8) and hepatitis C virus (HCV) (9), both causative agents of ongoing chronic hepatitis pandemics. Since these viruses cause chronic infections, their evolution is also shaped by immune pressure to a degree not possible with SARS-CoV-2, given the typical short course of COVID-19. However, with respect to our analysis, it is perhaps most important to recognize that the genetic diversities of human coronaviruses (i.e., 229E, NL63, OC43, HKU1, and now SARS-CoV-2), some of which may have been circulating in the population for centuries, are less than or comparable to those measured for mumps, measles, hepatitis B, and Ebola viruses, against which vaccines have been developed that are at least 88% effective (https://www.cdc.gov/vaccines/). The measured and well-supported conclusions of Dearlove et al. (1) markedly contrast with an early study of SARS-CoV-2 evolution that raised alarm at the emergence and spread of a “strain” more “aggressive” than the original (10). It was argued that the novel coronavirus population was divided into S and L “strains” distinguishable by two mutations at genome positions 8,782 (ORF1ab) and 28,144 (ORF8). In an addendum, the authors acknowledged that they provided no evidence supporting any epidemiological conclusion regarding the virulence or pathogenicity of SARS-CoV-2, and that their description of the “L type” as being more “aggressive” was inappropriate. 
That word was omitted from the subsequent print version of the article, each instance being replaced by a variation of “more frequently observed.” Unfortunately, online reports derived from this article were not as self-correcting or restrained, using phrases or titles such as “At least eight strains of the coronavirus are making their way around the globe, creating a trail of death and disease that scientists are tracking by their genetic footprints” (11), “the coronavirus is continuously mutating to overcome the immune system resistance of different populations” (12), and “Coronavirus: Are there two strains and is one more deadly?” (13) to describe and interpret the scientific findings presented in the aforementioned paper. It is hard to argue that these reports accurately portrayed the means, degree, and consequences of low-level accumulation of genetic diversity in SARS-CoV-2 to the public, and we hope such information is relayed more carefully and conscientiously in the future. Despite the remarkable wealth of data currently available, careful temporally and geographically resolved analyses of genetic diversity in large SARS-CoV-2 datasets do not always produce consensus. One recent concern has been the basis for emergence of a mutation encoding a D614G amino acid substitution in the SARS-CoV-2 spike protein. First observed in Germany in late January 2020, this variant is now the dominant form among SARS-CoV-2 viruses worldwide. Korber et al. recently concluded that the ascendency of 614G was not a consequence of genetic drift but instead occurred because the mutation renders the virus more infectious (14). This conclusion was initially based on their observation that the proportion of sequences carrying the D614G mutation progressively increased in every region in Asia, Europe, Oceania, and North America that was well-sampled in the GISAID database (https://www.gisaid.org/). 
Moreover, subsequent analyses showed that pseudotyped virus containing the 614G mutation spread more rapidly in cell culture, probably due to a structural alteration that reduced shedding of the S1 spike protein subunit (14–16). Dearlove et al. (1) acknowledge that emergence of the 614G mutation may constitute an exception to their overarching conclusion that SARS-CoV-2 genetic variation is overwhelmingly due to genetic drift. However, as a caveat to accepting this determination prematurely, they cite a parallel finding that A82V and other mutations in the ebolavirus surface glycoprotein were associated with increased infectivity. In this case, subsequent analysis in cell culture showed that the degree of increased infectivity varied with cell type (17) and no phenotypic differences were observed when mutant viruses were evaluated in animal models (18). Moreover, the authors argue that because the 614G variant has relatively rarely been sampled in China, and there is no evidence for convergent evolution independently producing the same or a similar mutation, the hypothesis that 614G emerged as a consequence of a genetic bottleneck during spread of the virus from Asia to Europe remains viable. It is perhaps even more important to note that the question of whether the 614G mutation increases infectivity has no bearing on the expected efficacy of vaccines currently under development. Indeed, amino acid position 614 is not located within the receptor binding domain, the motif expected to house epitopes most frequently recognized by neutralizing antibodies, and cell culture studies confirm that viruses pseudotyped with 614D or 614G spike variants are neutralized with equal effectiveness (19, 20). Taken together, these results are consistent with the central conclusion of Dearlove et al. (1) that the current state of SARS-CoV-2 genetic diversity should not be expected to impede development of a broadly protective vaccine.
It could be argued that maintaining the ∼30-kb RNA genome of SARS-CoV-2 reduces its tolerance for genetic diversity, rendering the novel coronavirus perhaps more susceptible to control by widespread immunization than might be expected for other RNA viruses. However, it is equally valid to suggest that because SARS-CoV-2 has infected and spread within an immunologically naïve population it has yet to experience the sort of immune pressure that helped shape the evolution of the endemic viruses shown in Fig. 1, and its own capacity to evolve remains unknown. Accordingly, we must continue to be diligent in tracking genetic changes in the novel coronavirus, both to follow their spread and quickly identify antigenic shifts should they occur. Yet, it is equally important to recognize that what we have observed to this point is slow genetic drift characteristic of a virus with a highly stable genome and to keep these and future observations on SARS-CoV-2 genetic diversity in the appropriate perspective, especially when communicating them to the general public.

Most cited references 17

Tracking changes in SARS-CoV-2 Spike: evidence that D614G increases infectivity of the COVID-19 virus

B Korber, W.M. Fischer, S. Gnanakaran … (2020)

Summary A SARS-CoV-2 variant carrying the Spike protein amino acid change D614G has become the most prevalent form in the global pandemic. Dynamic tracking of variant frequencies revealed a recurrent pattern of G614 increase at multiple geographic levels: national, regional and municipal. The shift occurred even in local epidemics where the original D614 form was well established prior to the introduction of the G614 variant. The consistency of this pattern was highly statistically significant, suggesting that the G614 variant may have a fitness advantage. We found that the G614 variant grows to higher titer as pseudotyped virions. In infected individuals G614 is associated with lower RT-PCR cycle thresholds, suggestive of higher upper respiratory tract viral loads, although not with increased disease severity. These findings illuminate changes important for a mechanistic understanding of the virus, and support continuing surveillance of Spike mutations to aid in the development of immunological interventions.

The impact of mutations in SARS-CoV-2 spike on viral infectivity and antigenicity

Qianqian Li, Jiajing Wu, Jianhui Nie … (2020)

Summary The spike protein of SARS-CoV-2 has been undergoing mutations and is highly glycosylated. It is critically important to investigate the biological significance of these mutations. Here we investigated 80 variants and 26 glycosylation site modifications for the infectivity and reactivity to a panel of neutralizing antibodies and sera from convalescent patients. D614G, along with several variants containing both D614G and another amino acid change, were significantly more infectious. Most variants with amino acid change at receptor binding domain were less infectious but variants including A475V, L452R, V483A and F490L became resistant to some neutralizing antibodies. Moreover, the majority of glycosylation deletions were less infectious whilst deletion of both N331 and N343 glycosylation drastically reduced infectivity, revealing the importance of glycosylation for viral infectivity. Interestingly, N234Q was markedly resistant to neutralizing antibodies, whereas N165Q became more sensitive. These findings could be of value in the development of vaccine and therapeutic antibodies.

On the origin and continuing evolution of SARS-CoV-2

Xiaolu Tang, Changcheng Wu, Xiang Li … (2020)

ABSTRACT The SARS-CoV-2 epidemic started in late December 2019 in Wuhan, China, and has since impacted a large portion of China and raised major global concern. Herein, we investigated the extent of molecular divergence between SARS-CoV-2 and other related coronaviruses. Although we found only 4% variability in genomic nucleotides between SARS-CoV-2 and a bat SARS-related coronavirus (SARSr-CoV; RaTG13), the difference at neutral sites was 17%, suggesting the divergence between the two viruses is much larger than previously estimated. Our results suggest that the development of new variations in functional sites in the receptor-binding domain (RBD) of the spike seen in SARS-CoV-2 and viruses from pangolin SARSr-CoVs are likely caused by mutations and natural selection besides recombination. Population genetic analyses of 103 SARS-CoV-2 genomes indicated that these viruses evolved into two major types (designated L and S), that are well defined by two different SNPs that show nearly complete linkage across the viral strains sequenced to date. Although the L type (∼70%) is more prevalent than the S type (∼30%), the S type was found to be the ancestral version. Whereas the L type was more prevalent in the early stages of the outbreak in Wuhan, the frequency of the L type decreased after early January 2020. Human intervention may have placed more severe selective pressure on the L type, which might be more aggressive and spread more quickly. On the other hand, the S type, which is evolutionarily older and less aggressive, might have increased in relative frequency due to relatively weaker selective pressure. These findings strongly support an urgent need for further immediate, comprehensive studies that combine genomic data, epidemiological data, and chart records of the clinical symptoms of patients with coronavirus disease 2019 (COVID-19).

Author and article information

Journal

Journal ID (nlm-ta): Proc Natl Acad Sci U S A

Journal ID (iso-abbrev): Proc Natl Acad Sci U S A

Journal ID (hwp): pnas

Journal ID (pmc): pnas

Journal ID (publisher-id): PNAS

Title: Proceedings of the National Academy of Sciences of the United States of America

Publisher: National Academy of Sciences

ISSN (Print): 0027-8424

ISSN (Electronic): 1091-6490

Publication date (Print): 6 October 2020

Publication date (Electronic): 21 September 2020

Publication date PMC-release: 21 September 2020

Volume: 117

Issue: 40

Pages: 24614-24616

Affiliations

[1] HIV Dynamics & Replication Program, Center for Cancer Research, National Cancer Institute at Frederick, Frederick, MD 21702;

[2] Microbiology and Immunology Department, Georgetown University, Washington, DC 20007

Author notes

¹To whom correspondence may be addressed. Email: kearneym@mail.nih.gov.

Author contributions: M.F.K. designed research; J.W.R., A.A.C., and S.C.P. analyzed data; and J.W.R., A.A.C., M.G.K., S.C.P., and M.F.K. wrote the paper.