INTRODUCTION
Crimean-Congo hemorrhagic fever virus (CCHFV) has the most extensive geographic range among medically important tick-borne viruses and is endemic to 30 countries in western Asia, southeast Europe, the Middle East, and Africa [1]. An estimated 3 billion people are at risk of CCHFV infection [2]. Human cases of Crimean-Congo hemorrhagic fever (CCHF) can result in severe disease, including hemorrhage, multi-organ failure, shock, and death, with associated case fatality rates as high as 30% [1,3,4]. Consequently, CCHFV poses a high risk to public health and has been classified as a priority pathogen for research and development by the World Health Organization, and as a Biodefense Category A pathogen by the United States National Institutes of Health [5]. Despite decades of vaccine development research for CCHFV, no approved vaccine is widely available for human use. The only vaccine in human use, an inactivated mouse brain preparation, is not licensed by the FDA and, because of safety concerns associated with this preparation, is used only in Bulgaria [1].
CCHFV contains a tri-segmented, negative sense, single-stranded RNA genome, and is classified within the family Nairoviridae, genus Orthonairovirus [6]. The three segments, known as the small (S), medium (M), and large (L) segments, encode the nucleoprotein (NP), glycoprotein precursor (GPC), and RNA-dependent RNA polymerase (RdRp), respectively. Vaccine development for CCHFV has focused primarily on the use of the GPC or the NP in various platform technologies, and most vaccine candidates encoding the glycoproteins have demonstrated protection against viral challenge. CCHFV is the most genetically diverse arbovirus, and the M segment demonstrates the largest nucleotide diversity among the three viral segments, at 31% [1]. Despite the substantial divergence of this segment, the M segment has been the most explored for antibody and T-cell epitope mapping, epitope prediction, and vaccine development for CCHFV, because it encodes the structural glycoproteins present on the virion surface that induce both cellular and humoral immune responses. The CCHFV GPC undergoes the most extensive cleavage and processing among viruses in the order Bunyavirales, thus forming the two structural glycoproteins GN and GC, and the three nonstructural glycoproteins mucin-like domain (MLD), GP38, and NSM [7]. The complex processing of the GPC involves N-linked glycosylation, formation of disulfide bonds among many cysteine residues, and extensive O-linked glycosylation [8]. The complicated processing of the GPC, which requires the nonstructural proteins for the proper maturation of the structural glycoproteins, has led to the use of the whole GPC as a common vaccine antigen.
Correlates of protection against CCHFV have yet to be defined [9]. However, T cells are known to play an important role in CCHFV immunity and to be necessary for survival after CCHFV infection [10,11]. Depletion of CD4+ and/or CD8+ T-cells exacerbates morbidity and mortality during acute CCHFV infection; vaccination using the whole GPC can robustly activate T-cells; and human survivors of CCHF have long-lived CD8+ T-cell responses [10–16]. T-cell immunogenicity is not evenly distributed across the GPC, because certain regions generate stronger recall responses than others. These recall responses have also been shown to vary depending on the CCHFV strain of the stimulating peptide pool, wherein peptides from a strain heterologous to the immunizing strain show significantly diminished recall responses [15]. This pattern is also seen with antibody binding assays using GPC peptides [17]. Therefore, specific humoral or cellular responses might be induced by vaccination with specific GPC regions.
The size, complex processing, and uneven distribution of immunogenicity across the CCHFV GPC make it a strong candidate for development of vaccines by using a multi-epitope antigen rather than the whole GPC [18–21]. Previous research has demonstrated the feasibility of a multi-epitope vaccine development strategy for other members of Bunyavirales with complex glycoprotein processing; a multi-epitope DNA vaccine generated with conserved epitopes selected from an alignment of the Hantaan virus, Seoul virus, and Puumala virus glycoproteins has been found to induce both humoral and cellular immunity against all three viruses in mice [22]. Ideally, this strategy for the development of a CCHFV multi-epitope antigen would provide protection against diverse CCHF viruses, thus addressing a shortcoming of previous CCHFV vaccine candidates.
Previous studies attempting to develop multi-epitope vaccines for CCHFV have been limited to a “string of beads” approach linking several individual epitopes from the NP, structural glycoproteins, or RdRp that are recognized only by specific major histocompatibility complex (MHC) alleles. In contrast, we sought to demonstrate that immunoinformatic predictions of cytotoxic T-lymphocyte (CTL) and helper T-lymphocyte (HTL) epitopes in the CCHFV GPC can be combined with regions reported in the literature to elicit T-cell recall responses, to produce a multi-epitope antigen composed of large GPC regions that include numerous predicted epitopes recognized by a variety of MHC alleles. We hypothesized that, in future studies, this type of multi-epitope antigen might generate the robust T- and B-cell responses required for protection against lethal CCHFV challenge [10,12,14–16,23]. Regions of the GPC with multiple predicted epitopes were selected for generation of a model multi-epitope antigen, and the residue homology of each region was compared with the homology of whole GPC proteins from M segment sequences representing the widespread geographical and ecological distribution of the virus. Finally, a multi-epitope antigen was constructed in silico, and the subcellular localization and antigenicity of the protein were predicted with bioinformatic servers. This work provides new information on potential targets for CCHFV vaccine development.
METHODS
Evaluation of CCHFV GPC residue diversity across clades
The amino acid sequence of the CCHFV GPC from strain Turkey200406546, designated throughout this work as Turkey2004 (Accession #KY362519), was used as a reference sequence for alignments; calculation of percentage identity and percentage similarity; and prediction of CTL and HTL epitopes. This strain of CCHFV has not previously been used for in silico analysis of epitope prediction, despite the high annual incidence of CCHFV cases in Turkey [1]. Fifty GPC sequences were selected to represent the widespread geographical and ecological distribution of the virus, with sequences from each CCHFV clade, and from ticks, animals, and clinical cases. A phylogenetic tree was constructed with Geneious Tree Builder with default parameters, and clades were assigned on the basis of previous publications [1,24]. Each full-length GPC sequence was aligned to the Turkey2004 sequence, and individual alignments were made for each GPC protein (MLD, GP38, GN, NSM, and GC) in Geneious Prime software (version 2021.1.1). The percentage identity and percentage similarity of each selected sequence to the Turkey2004 sequence were determined with William Pearson’s lalign program run through the Swiss Institute of Bioinformatics ExPASy Bioinformatics Resource Portal (now available through the European Molecular Biology Laboratory’s European Bioinformatics Institute [EMBL-EBI] at https://www.ebi.ac.uk/Tools/psa/lalign/). Sequence identity considers only the residues that strictly match between two sequences, whereas sequence similarity considers both the residues that exactly match and the similarity of the physicochemical properties of the residues that differ.
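The distinction between identity and similarity can be illustrated with a minimal Python sketch. It assumes two sequences that are already aligned (equal length, gaps written as `-`) and uses a small, hypothetical grouping of residues by physicochemical class. lalign derives similarity from a substitution matrix rather than from fixed classes, so this sketch illustrates the concept only and would not reproduce the values reported in this work.

```python
# Hypothetical physicochemical classes; NOT the scoring scheme used by lalign.
SIMILAR_GROUPS = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("FWY"),   # aromatic
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("STNQ"),  # polar, uncharged
    set("AG"),    # small
]


def residues_similar(a, b):
    """Return True if a and b are identical or share one of the classes above."""
    if a == b:
        return True
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(aligned_ref, aligned_query):
    """Return (percent identity, percent similarity) over all aligned columns.

    Gapped columns count toward the denominator but never toward identity
    or similarity (an assumption of this sketch).
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("sequences must be aligned to the same length")
    n = len(aligned_ref)
    if n == 0:
        raise ValueError("alignment is empty")
    identical = 0
    similar = 0
    for a, b in zip(aligned_ref, aligned_query):
        if a == "-" or b == "-":
            continue
        if a == b:
            identical += 1
        if residues_similar(a, b):
            similar += 1
    return 100.0 * identical / n, 100.0 * similar / n
```

By construction, similarity is always at least as high as identity, because every identical column also counts as similar.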
CTL epitope prediction
The translation of the GPC of CCHFV strain Turkey2004 was used for bioinformatic server predictions to identify epitopes likely to be presented by human MHC class I molecules to CD8+ cytotoxic T-lymphocytes. CTL epitopes were predicted with the NetCTL 1.2 Server (https://services.healthtech.dtu.dk/service.php?NetCTL-1.2) powered by the Department of Bio and Health Informatics at the Technical University of Denmark. The NetCTL 1.2 Server was used to predict binding of 9-mer CTL peptides to 12 MHC class I supertypes (A1, A2, A3, A24, A26, B7, B8, B27, B39, B44, B58, and B62) with neural networks. Predicted peptides were selected on the basis of an inclusion criterion of a combined score >1.0 from prediction of MHC class I binding, proteasomal C-terminal cleavage, and TAP transport efficiency. A combined score of >1.0 has 70% sensitivity and 98.5% specificity in CTL ligand prediction accuracy. Peptides were excluded if they included known cleavage sites on the GPC polyprotein.
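The CTL selection rule described above (combined score >1.0, followed by exclusion of peptides containing a cleavage site) might be sketched as follows. The record layout, field names, and any positions passed in are hypothetical. A cleavage site is modeled here as the 1-based position of the residue immediately N-terminal to the cleaved bond, which is an assumption of this sketch and not a NetCTL convention.

```python
from dataclasses import dataclass

COMBINED_SCORE_CUTOFF = 1.0  # inclusion threshold stated in the text


@dataclass(frozen=True)
class CtlPrediction:
    start: int             # 1-based position of the first peptide residue in the GPC
    peptide: str           # 9-mer sequence
    supertype: str         # e.g. "A2"
    combined_score: float  # combined binding / cleavage / TAP score


def spans_cleavage_site(start, length, cleavage_sites):
    """True if a cleaved bond (between residues p and p+1) lies inside the peptide."""
    end = start + length - 1
    return any(start <= p < end for p in cleavage_sites)


def select_ctl_peptides(predictions, cleavage_sites):
    """Keep predictions above the score cutoff that do not span a cleavage site."""
    kept = []
    for pred in predictions:
        if pred.combined_score <= COMBINED_SCORE_CUTOFF:
            continue  # fails the inclusion criterion
        if spans_cleavage_site(pred.start, len(pred.peptide), cleavage_sites):
            continue  # peptide would be split by GPC processing
        kept.append(pred)
    return kept
```

A peptide that merely ends at a cleavage site is retained under this definition, because the cleaved bond lies outside the peptide.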
Helper T-lymphocyte (HTL) epitope prediction
The translation of the GPC of CCHFV strain Turkey2004 was used for bioinformatic server predictions to identify epitopes likely to be presented by human MHC class II molecules to CD4+ helper T-lymphocytes. HTL epitopes were predicted with the NetMHCII 2.3 Server (https://services.healthtech.dtu.dk/service.php?NetMHCII-2.3) powered by the Department of Bio and Health Informatics at the Technical University of Denmark. The NetMHCII 2.3 Server was used to predict binding of 15-mer HTL peptides to 25 HLA-DR, 20 HLA-DQ, and 9 HLA-DP alleles with artificial neural networks. Predicted peptides were selected if they met both the inclusion criteria of (1) a strong binder threshold of <2.00% rank to a set of 1,000,000 random natural peptides and (2) a predicted IC50 value <50 nM. Peptides were excluded if they included known cleavage sites on the GPC polyprotein.
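The two HTL inclusion criteria must both be satisfied, which a short Python sketch can make explicit. The dictionary keys (`peptide`, `allele`, `rank`, `ic50`) are hypothetical names chosen for illustration and do not correspond to the NetMHCII 2.3 output format.

```python
RANK_CUTOFF_PERCENT = 2.0  # strong-binder threshold stated in the text
IC50_CUTOFF_NM = 50.0      # predicted affinity threshold stated in the text


def is_selected_htl(rank_percent, ic50_nm):
    """Both criteria must hold: %rank < 2.00 and predicted IC50 < 50 nM."""
    return rank_percent < RANK_CUTOFF_PERCENT and ic50_nm < IC50_CUTOFF_NM


def select_htl_peptides(rows):
    """Filter rows (dicts with hypothetical keys 'rank' and 'ic50')."""
    return [row for row in rows if is_selected_htl(row["rank"], row["ic50"])]
```

Requiring both conditions is stricter than either alone: a peptide with a favorable rank for a given allele is still rejected if its predicted affinity is weaker than 50 nM.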
Alignment of peptides to the CCHFV GPC and selection of multi-epitope regions
All predicted CTL and HTL peptides that met the above inclusion criteria were aligned to the Turkey2004 GPC sequence with Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/). Groups of peptides were assigned numerical values for the number of peptides that overlapped a given residue in the GPC alignment, and peptides were graphed in GraphPad Prism (Version 8) with these assigned values. The only two specific T-cell epitopes that have been experimentally demonstrated from convalescent CCHF cases [13] were included in a different graphing panel for evaluation. Regions with the highest number of overlapping predicted peptides and regions including the two human T-cell epitopes were selected as multi-epitope regions.
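The per-residue counting used for graphing can be sketched in Python as follows. The peptide positions are assumed to be known as 1-based start coordinates on the GPC. The thresholding helper is a hypothetical simplification: it returns every run of residues whose count meets a cutoff, whereas the regions in this work were chosen from the graphed values together with the published human epitopes, not by a fixed cutoff.

```python
def coverage_per_residue(gpc_length, peptides):
    """Count how many peptides overlap each residue.

    peptides: iterable of (start, length) with 1-based start positions.
    Returns a list in which index i holds the count for residue i + 1.
    """
    counts = [0] * gpc_length
    for start, length in peptides:
        for pos in range(start, start + length):
            if 1 <= pos <= gpc_length:
                counts[pos - 1] += 1
    return counts


def regions_above(counts, min_count):
    """Return 1-based inclusive (start, end) runs with count >= min_count."""
    regions = []
    run_start = None
    for pos, count in enumerate(counts, start=1):
        if count >= min_count and run_start is None:
            run_start = pos
        elif count < min_count and run_start is not None:
            regions.append((run_start, pos - 1))
            run_start = None
    if run_start is not None:
        regions.append((run_start, len(counts)))
    return regions
```

For example, three peptides of length 3 starting at residues 1, 2, and 8 of a 10-residue sequence give counts of 2 at residues 2 and 3, so a cutoff of 2 returns the single region (2, 3).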
Evaluation of conservation of selected multi-epitope regions across CCHFV clades
Sequence conservation was assessed at the residue level for both percentage identity and percentage similarity. Each of the 11 selected Turkey2004 multi-epitope regions was compared individually with the same region of the 50 representative CCHFV sequences described above, with William Pearson’s lalign program. The average percentage identity and percentage similarity of each multi-epitope region across the 50 sequences were compared with the corresponding averages for the GPC protein from which the multi-epitope region originated.
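The comparison of a region with its parent protein can be sketched in Python. Note the substitution: in this work each region was aligned separately with lalign, whereas the sketch below slices one region out of an existing pairwise alignment by reference coordinates. Function names and inputs are hypothetical.

```python
from statistics import mean


def region_identity(aligned_ref, aligned_query, start, end):
    """Percent identity over reference residues start..end (1-based, inclusive)."""
    ref_pos = 0
    compared = 0
    identical = 0
    for a, b in zip(aligned_ref, aligned_query):
        if a == "-":
            continue  # insertion relative to the reference: no reference coordinate
        ref_pos += 1
        if start <= ref_pos <= end:
            compared += 1
            if a == b:
                identical += 1
    if compared == 0:
        raise ValueError("region lies outside the aligned reference")
    return 100.0 * identical / compared


def compare_conservation(region_identities, protein_identities):
    """Return (region mean, protein mean, region at least as conserved?)."""
    region_mean = mean(region_identities)
    protein_mean = mean(protein_identities)
    return region_mean, protein_mean, region_mean >= protein_mean
```

In practice, `region_identities` and `protein_identities` would each hold one value per comparison sequence (50 in this work), and the same comparison would be repeated with similarity values.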
Construction of the multi-epitope antigen and antigenicity prediction
The 11 multi-epitope regions were joined in silico with a flexible glycine-glycine-glycine-serine linker. A start codon was placed at the N-terminus, and a six residue polyhistidine tag was added before a stop codon at the C-terminus. A signal peptide was not included at the N-terminus of the antigen, so that the antigen would not be directed through the secretory pathway for processing, as the GPC is. The final model multi-epitope antigen, termed EPItope Construct (EPIC), was 853 residues in length including the linkers and tag. Subcellular localization and the residues important for localization of EPIC were predicted with the DeepLoc-1.0 server run through the Department of Bio and Health Informatics at the Technical University of Denmark (https://services.healthtech.dtu.dk/service.php?DeepLoc-1.0). The antigenicity of EPIC and individual GPC proteins was predicted with two independent servers, ANTIGENpro (http://scratch.proteomics.ics.uci.edu/) and VaxiJen (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html).
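The assembly step can be sketched at the protein level in Python. The sketch assumes that the start codon contributes an N-terminal methionine, that the linker is placed only between adjacent regions, and that the tag follows the last region directly. The region strings in any call are placeholders supplied by the user, not the GPC-01 through GPC-11 sequences.

```python
LINKER = "GGGS"    # glycine-glycine-glycine-serine linker described in the text
HIS_TAG = "H" * 6  # six-residue polyhistidine tag


def build_construct(regions, linker=LINKER, tag=HIS_TAG):
    """Join regions with the linker, prepend Met, and append the tag.

    Assumption of this sketch: no linker before the first region or
    between the last region and the tag.
    """
    if not regions:
        raise ValueError("at least one region is required")
    return "M" + linker.join(regions) + tag


def construct_length(region_lengths, linker=LINKER, tag=HIS_TAG):
    """Length implied by the same layout, computed from region lengths alone."""
    if not region_lengths:
        raise ValueError("at least one region is required")
    return 1 + sum(region_lengths) + len(linker) * (len(region_lengths) - 1) + len(tag)
```

For instance, `build_construct(["AAA", "CC"])` yields `MAAAGGGSCCHHHHHH`, and `construct_length([3, 2])` gives the matching length of 16.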
RESULTS
Evaluation of CCHFV GPC protein diversity across clades
Fifty GPC sequences from ticks, animals, and clinical cases were selected for evaluation with representative sequences from each clade of CCHFV, spanning the widespread geographical distribution of the virus (Fig 1). The percentage identity and similarity of each CCHFV GPC protein, MLD, GP38, GN, NSM, and GC, from each sequence to the Turkey2004 sequence was calculated in William Pearson’s lalign program. The Turkey2004 sequence aligned with clade V-Europe 1, and unsurprisingly, the representative sequences from this clade showed the highest homology to the Turkey2004 sequence. Sequences from clades I-IV and VI displayed greater differences in percentage identity and similarity to the Turkey2004 sequence. A gradient in the homology of the GPC was observed, with low homology at the N-terminus and increasing homology toward the C-terminus of the GPC. The structural glycoproteins GN and GC displayed the highest average percentage identity and similarity across the six clades. The N-terminal nonstructural proteins MLD and GP38 displayed the lowest average percentage identity and similarity across the six clades.
CTL and HTL epitope prediction and epitope alignment to the GPC
The NetCTL 1.2 Server predicted 256 CTL peptides that met the initial inclusion criteria, and the NetMHCII 2.3 Server predicted 837 HTL peptides that met the initial inclusion criteria (Table 1). These 1093 predicted epitopes were aligned to the full-length GPC in Clustal Omega. After alignment, 4 CTL ligands and 17 HTL ligands were excluded because of their location at known GPC cleavage sites, thus leaving 252 CTL ligands and 820 HTL ligands for consideration (Table 1). The number of peptides that aligned to each individual protein was expected to be proportional to the fraction of the GPC that the protein occupies. The individual GPC proteins MLD (249 residues), GP38 (272 residues), GN (288 residues), NSM (233 residues), and GC (647 residues) composed 14.7%, 16.1%, 17.1%, 13.8%, and 38.3% of the polyprotein, respectively. The peptides that aligned to the individual proteins showed broadly similar proportions of 19.1%, 16.2%, 13.1%, 14.9%, and 34.8% for MLD, GP38, GN, NSM, and GC, respectively (Table 1). Interestingly, the nonstructural proteins MLD, GP38, and NSM each accounted for a higher proportion of the predicted peptides than of the polyprotein length, although the difference for GP38 was marginal (16.2% versus 16.1%). Conversely, the structural glycoproteins GN and GC accounted for a lower proportion of the predicted peptides than of the polyprotein length. The pattern of the predicted epitopes aligned to the GPC closely followed graphs of antibody reactivity data and T-cell recall responses across the CCHFV GPC [15,17], thus providing confidence in this prediction method for identifying potentially immunogenic regions of the GPC.
GPC Region | CTL epitopes | HLA-DR epitopes | HLA-DQ epitopes | HLA-DP epitopes | Total predicted epitopes (% of all predicted) |
---|---|---|---|---|---|
MLD | 19 | 39 | 140 | 11 | 209 (19.1%) |
MLD-GP38* | 0 | 0 | 0 | 0 | 0 (0%) |
GP38 | 41 | 94 | 17 | 25 | 177 (16.2%) |
GP38-GN* | 1 | 0 | 0 | 0 | 1 (0.1%) |
GN | 54 | 42 | 10 | 37 | 143 (13.1%) |
GN-NSM* | 2 | 5 | 5 | 0 | 12 (1.1%) |
NSM | 40 | 43 | 66 | 14 | 163 (14.9%) |
NSM-GC* | 1 | 3 | 4 | 0 | 8 (0.7%) |
GC | 98 | 130 | 100 | 52 | 380 (34.8%) |
Total | 256 | 356 | 342 | 139 | 1093 (100%) |
Total number of CTL and HTL epitopes predicted by the NetCTL 1.2 and NetMHCII 2.3 servers. Epitopes were aligned to the Turkey2004 GPC sequence to determine their corresponding GPC region. Epitopes including cleavage sites (indicated with *) were excluded from further consideration.
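The proportions reported for Table 1 can be recomputed with a short Python check. All numbers are taken from the text and Table 1; only the variable names are introduced here.

```python
# Protein lengths (residues) and predicted epitope totals, as reported in the text.
PROTEIN_LENGTHS = {"MLD": 249, "GP38": 272, "GN": 288, "NSM": 233, "GC": 647}
PREDICTED_EPITOPES = {"MLD": 209, "GP38": 177, "GN": 143, "NSM": 163, "GC": 380}
TOTAL_PREDICTED = 1093  # includes the 21 peptides located at cleavage sites


def percent(part, whole):
    """Percentage rounded to one decimal place, as in the text."""
    return round(100.0 * part / whole, 1)


gpc_length = sum(PROTEIN_LENGTHS.values())
length_share = {name: percent(n, gpc_length) for name, n in PROTEIN_LENGTHS.items()}
epitope_share = {name: percent(n, TOTAL_PREDICTED) for name, n in PREDICTED_EPITOPES.items()}

# Proteins whose share of predicted epitopes exceeds their share of the polyprotein.
enriched = [name for name in PROTEIN_LENGTHS if epitope_share[name] > length_share[name]]
```

With these inputs, `enriched` contains MLD, GP38, and NSM, with GP38 exceeding its length share by only 0.1 percentage points.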
Identification of multi-epitope regions
The 1072 CTL and HTL ligands that met the inclusion criteria were quantified and graphed by assigning numerical values for the number of predicted peptides that overlapped a given residue in the GPC sequence. This graphing method enabled comparison to experimentally demonstrated CD8+ T-cell epitopes (Fig 2A) [13] and identification of regions with the most predicted epitopes (Fig 2B). Eleven regions of the GPC, referred to as GPC-01 through GPC-11, which had the largest number of overlapping predicted epitopes and included the two experimentally demonstrated human T-cell epitopes, were chosen for further evaluation (Fig 2C). The locations of the 11 multi-epitope regions are as follows, on the basis of the residue numbers of the Turkey2004 GPC sequence: residues 96–172 (GPC-01), residues 205–244 (GPC-02), residues 253–288 (GPC-03), residues 303–399 (GPC-04), residues 643–742 (GPC-05), residues 923–1007 (GPC-06), residues 1043–1105 (GPC-07), residues 1270–1325 (GPC-08), residues 1335–1435 (GPC-09), residues 1472–1529 (GPC-10), and residues 1572–1663 (GPC-11). These 11 regions included 812 (75.7%) of the 1072 retained CTL and HTL ligands and 805 residues (47.7%) of the GPC sequence, with regions selected from all five proteins of the GPC (Table 2).
Epitope Region | Length (AA) | GPC Protein | CTL epitopes | HLA-DR epitopes | HLA-DQ epitopes | HLA-DP epitopes | Total predicted epitopes |
---|---|---|---|---|---|---|---|
GPC-01 | 77 | MLD | 6 | 11 | 62 | 0 | 79 |
GPC-02 | 40 | MLD | 4 | 15 | 63 | 0 | 82 |
GPC-03 | 36 | GP38 | 9 | 19 | 0 | 7 | 35 |
GPC-04 | 97 | GP38 | 17 | 52 | 14 | 18 | 101 |
GPC-05 | 100 | GN | 32 | 9 | 7 | 35 | 83 |
GPC-06 | 85 | NSM | 15 | 36 | 45 | 10 | 106 |
GPC-07 | 63 | GC | 11 | 16 | 23 | 0 | 50 |
GPC-08 | 56 | GC | 8 | 2 | 10 | 0 | 20 |
GPC-09 | 101 | GC | 14 | 22 | 16 | 0 | 52 |
GPC-10 | 58 | GC | 11 | 39 | 44 | 0 | 94 |
GPC-11 | 92 | GC | 21 | 43 | 0 | 46 | 110 |
Total | 805 |  | 148 | 264 | 284 | 116 | 812 |
Eleven multi-epitope regions were selected from alignment of the predicted epitopes to the GPC to generate the multi-epitope antigen. The length in residues of each region, the GPC protein from which the region originated, and the number and type of predicted epitopes included in each selected region are shown. A total of 812 epitopes (75.7% of total predicted epitopes) were included within the selected regions.
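The totals for the selected regions can be checked against Table 2 with a few lines of Python. The region lengths and epitope counts are taken from Table 2, the GPC length is the sum of the five protein lengths given in the Results, and the denominator of 1072 is the number of ligands retained after exclusion of cleavage-site peptides.

```python
# (length in residues, predicted epitopes) per region, from Table 2.
REGIONS = {
    "GPC-01": (77, 79), "GPC-02": (40, 82), "GPC-03": (36, 35),
    "GPC-04": (97, 101), "GPC-05": (100, 83), "GPC-06": (85, 106),
    "GPC-07": (63, 50), "GPC-08": (56, 20), "GPC-09": (101, 52),
    "GPC-10": (58, 94), "GPC-11": (92, 110),
}
GPC_LENGTH = 249 + 272 + 288 + 233 + 647  # MLD + GP38 + GN + NSM + GC
RETAINED_LIGANDS = 1072                   # 252 CTL + 820 HTL ligands

total_residues = sum(length for length, _ in REGIONS.values())
total_epitopes = sum(count for _, count in REGIONS.values())

residue_percent = round(100.0 * total_residues / GPC_LENGTH, 1)
epitope_percent = round(100.0 * total_epitopes / RETAINED_LIGANDS, 1)
```

The check makes the denominators explicit: the residue percentage is relative to the full-length GPC, whereas the epitope percentage is relative to the retained ligands and not to all 1093 initial predictions.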
Evaluation of conservation of selected multi-epitope regions across CCHFV clades
Multiple studies have demonstrated variability in immune responses to homologous or heterologous CCHFV strains [15,17,25–30]. Thus, assessment of the residue conservation of the selected GPC regions among various CCHFV sequences was necessary. Each selected multi-epitope region was evaluated for residue identity across 50 different CCHFV sequences (Fig 3). As expected from the evaluation of identity across the whole CCHFV proteins, the multi-epitope regions originating from the MLD had the lowest average percentage identity, whereas multi-epitope regions originating from the GC had the highest average percentage identity. The average percentage identity of each region was compared with that of its originating GPC protein; 5/11 regions (GPC-04, GPC-05, GPC-06, GPC-10, and GPC-11) were less conserved than their respective whole proteins, and 6/11 regions (GPC-01, GPC-02, GPC-03, GPC-07, GPC-08, and GPC-09) had conservation equal to or greater than that of their respective whole proteins.
Construction of EPIC and antigenicity prediction
To construct a model multi-epitope antigen for CCHFV, we generated EPIC in silico with all 11 selected GPC regions. These regions included three transmembrane domains in GPC-05, GPC-06, and GPC-11, originating from GN, NSM, and GC, respectively. We hypothesized that the lack of a signal peptide in the protein, and the inclusion of the transmembrane domain from the structural glycoprotein GC, would result in localization of EPIC to the cell membrane, rather than resulting in processing of EPIC through the secretory pathway in a manner similar to the full-length GPC. Localization prediction by the DeepLoc 1.0 server predicted that EPIC had a 78.7% likelihood of localizing to the cell membrane (Fig 4A). The residues most important for the predicted subcellular localization of EPIC (Fig 4B) corresponded to the three transmembrane domains included in the construct (Fig 4C). These results suggested that the inclusion of the transmembrane domains, but not signal peptide, from the GPC might result in a different predicted subcellular localization from the GPC, and the localization of future multi-epitope antigens can be modified by inclusion of different GPC regions.
The antigenicity of EPIC and the GPC proteins was predicted with two independent bioinformatic servers. ANTIGENpro predicted that EPIC was more likely to be antigenic than any of the GPC proteins alone, with a predicted value of 0.94 of 1 (MLD: 0.91; GP38: 0.56; GN: 0.73; NSM: 0.25; GC: 0.87). The VaxiJen server, which uses a threshold of 0.4 for indicating protein antigenicity, predicted that EPIC (0.49) would be less antigenic than the MLD (0.50), NSM (0.58), and GC (0.57) proteins, but more antigenic than GP38 (0.42) and GN (0.45). These results suggest that EPIC has antigenic potential; therefore, multi-epitope antigens such as EPIC should be evaluated in vitro and in vivo in future studies.
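The two sets of scores can be placed side by side with a small Python sketch. The scores and the VaxiJen threshold of 0.4 are those reported above; the helper name is hypothetical, and treating a score exactly equal to the threshold as antigenic is an assumption that does not affect any value here.

```python
VAXIJEN_THRESHOLD = 0.4  # threshold stated in the text

ANTIGENPRO = {"EPIC": 0.94, "MLD": 0.91, "GP38": 0.56, "GN": 0.73, "NSM": 0.25, "GC": 0.87}
VAXIJEN = {"EPIC": 0.49, "MLD": 0.50, "GP38": 0.42, "GN": 0.45, "NSM": 0.58, "GC": 0.57}


def rank_by_score(scores):
    """Return protein names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


antigenpro_order = rank_by_score(ANTIGENPRO)
vaxijen_order = rank_by_score(VAXIJEN)

# Boundary handling (>=) is an assumption of this sketch.
above_vaxijen_threshold = {name: score >= VAXIJEN_THRESHOLD for name, score in VAXIJEN.items()}
```

Under these values, every protein exceeds the VaxiJen threshold, and NSM ranks last with ANTIGENpro but first with VaxiJen, which illustrates how strongly the two servers can disagree on individual proteins.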
DISCUSSION
CCHF is a widespread and medically important tick-borne viral disease, yet no vaccine is widely available for human use [3]. Vaccine development strategies for CCHFV have evaluated numerous platform and antigen combinations, and the CCHFV DNA vaccines currently being explored primarily encode the full-length GPC or other whole protein-encoding sections from the GPC, including GP38, GN, and GC, owing to the complex processing of the glycoproteins [14–16,31,32]. The size and complex processing of the CCHFV GPC have been suggested to hinder the use of GPC as a vaccine antigen, and therefore the use of a multi-epitope antigen might provide improvements over current CCHFV DNA vaccine candidates [18–21]. Bioinformatic analyses for generation of a CCHFV multi-epitope vaccine have previously evaluated NP, GPC, or RdRp, for prediction of T- and B-cell epitopes, but those studies have been limited to evaluation of a single strain of CCHFV. The results presented herein are the first to provide bioinformatic epitope prediction with the Turkey2004 strain of the CCHFV GPC, evaluation of multi-epitope region homology among sequences from all CCHFV clades, and information on potential targets within the GPC for vaccine development.
Previous studies have discussed the diversity of the CCHFV M segment. However, this is the first study comparing the Turkey2004 GPC sequence to other CCHFV isolates spanning the widespread geographical distribution of CCHFV (Fig 1). Unsurprisingly, the Turkey2004 GPC displayed high similarity and identity to other isolates within Clade V-Europe 1, but higher diversity with respect to other clades. The Turkey2004 GPC sequence displayed higher homology to isolates from Asia and East Africa than to strains from West Africa, Central Africa, and the Mediterranean. This finding may be explained by migratory bird patterns, which influence the spread of Hyalomma ticks, the primary vector and reservoir of CCHFV [1,33]. These results highlight the importance of choosing specific regions of the GPC for vaccine development, given that certain regions of the GPC demonstrate higher cross-clade homology than others.
The bioinformatic analyses performed herein predicted CTL and HTL epitopes with the amino acid sequence of the GPC from strain Turkey2004, and were followed by the evaluation of residue conservation of selected GPC regions across CCHFV clades. A total of 1093 T-cell epitopes were predicted across the Turkey2004 GPC sequence (Table 1). Alignment of these predicted epitopes to the GPC revealed 21 predicted GPC epitopes that were located at experimentally demonstrated GPC cleavage sites and were excluded from consideration (Table 1). All included epitopes were quantified and graphed to identify multi-epitope regions (Fig 2). Previous studies have shown that immunogenicity is not evenly distributed across the GPC, and certain regions of the GPC generate stronger antibody binding and T-cell recall responses than others [12,15–17]. Antibody epitope mapping of pooled Turkish or South African CCHFV convalescent sera to linear peptide pools generated from the Turkey-Kelkit06 strain of the CCHFV GPC has demonstrated the highest reactivity within the nonstructural glycoproteins, specifically the C-terminus of MLD, N-terminus of GN, and middle of NSM [17]. Reactivity has also been observed at the C-terminus of the structural glycoprotein GC [17]. Similarly, peptides generated from the MLD and NSM glycoproteins generate robust T-cell recall responses in GPC DNA vaccinated mice and Cynomolgus macaques [15,16]. In contrast, the N-terminus of GC generates stronger T-cell recall responses than the C-terminus in animals; however, the two specific human T-cell epitopes identified with PBMCs from convalescent patients are found in the middle of GC [13]. Herein, the in silico analyses of CTL and HTL epitopes within the Turkey2004 GPC demonstrated an uneven distribution across the GPC, with certain regions displaying more predicted epitopes than others. These results are consistent with the published results of splenocyte stimulation with peptide pools spanning the GPC, wherein splenocytes are not uniformly stimulated across the GPC. Additionally, the graphing pattern of the predicted T-cell epitopes was similar to the results of linear antibody epitope mapping of pooled Turkish sera to peptides generated against the CCHFV strain Turkey-Kelkit06 [17]. Together, the results from the reactivity of T-cells [15,16], antibody binding [17], and prediction of epitopes across the CCHFV GPC (in the present study) [18,20,21] provide information on the regions of the GPC that are predicted to be the most immunogenic and may serve as potential targets for vaccine development: the nonstructural glycoproteins and GC.
To generate a model multi-epitope antigen in silico, we selected 11 regions of the GPC that contained the largest number of overlapping epitopes and included previously published T-cell epitopes [13] for further evaluation (Table 2). The multi-epitope regions selected from each of the GPC proteins contained 812 (75.7%) of all predicted CTL and HTL ligands and included 805 residues (47.7%) of the GPC sequence (Table 2). Interestingly, more epitopes were predicted in the nonstructural proteins than the structural glycoproteins, relative to protein size. This finding is similar to data showing that splenocyte stimulation with peptides homologous to the vaccine strain, but not heterologous peptides, generates greater recall responses to nonstructural proteins than the structural glycoproteins [15]. We hypothesized that multi-epitope regions would demonstrate higher cross-clade homology than their whole originating GPC proteins, thereby theoretically eliminating the problem of diminished T-cell recall responses from heterologous strains. However, only 6/11 regions had conservation equal to or greater than that of their respective whole proteins (Fig 3). The MLD is the most diverse protein of the GPC (Fig 1) and was the only GPC protein in which both multi-epitope regions demonstrated residue identity and similarity across 50 CCHFV sequences that were greater than or equal to those of the whole protein (Fig 3). These results suggest that a multi-epitope antigen derived strictly from the regions with the most predicted epitopes may not be able to overcome the substantial diversity of CCHFV among strains. In combination with published data showing variability in immune reactivity between strains [15,17], these data suggest that, given the widespread distribution of CCHFV, future work should focus on GPC regions with the highest residue conservation to generate a vaccine candidate that provides heterologous protection across CCHFV clades. Alternatively, the development of regionally strain-specific vaccines is worthy of consideration.
The model multi-epitope antigen EPIC was constructed in silico by connecting each selected multi-epitope region with flexible linkers. The final multi-epitope antigen was predicted to localize to the cell membrane in mammalian cells, and the included transmembrane domains were important for localization prediction (Fig 4). The protein was predicted to be antigenic by two prediction servers. These results suggest that a multi-epitope antigen for CCHFV with favorable antigenic properties can be designed in silico. This information can be used in downstream studies to modify the subcellular localization, and therefore the processing and immunogenicity of multi-epitope antigens, by inclusion of various domains of the GPC. Because a multi-epitope vaccine for Hantaan virus, Seoul virus, and Puumala virus has successfully induced both cellular and humoral immunity to each virus in mice [22], future in vitro and in vivo experiments using EPIC or other CCHFV multi-epitope antigens should be undertaken to evaluate the immunogenicity and efficacy of this type of vaccine development strategy.
The bioinformatic servers used herein predict epitopes on the basis of linear amino acid sequences for protein cleavage and binding of peptides to MHC complexes. The in silico analysis is designed to provide information about epitopes that may bind MHC complexes; however, because the complex nature of this biological process is difficult to predict, at least some or possibly many of the predicted epitopes included within EPIC are likely not to be immunogenic. Sequence variability can affect the prediction of epitopes, thus leading to vastly different prediction results. These results from epitope prediction with the Turkey2004 strain of the CCHFV GPC could be combined with epitope prediction with other CCHFV strains to provide a broader understanding of how CCHFV sequence diversity may affect immunogenicity. Given the wide range of sequence variability within some GPC proteins, epitope predictions may be improved by completing the analyses with multiple CCHFV GPC sequences from diverse strains, then choosing regions on the basis of homologously predicted peptides, rather than assessing conservation after regions are chosen from epitope predictions from a single sequence of GPC.
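The multi-strain strategy suggested above could be sketched in Python as follows. The sketch assumes that epitope predictions for each strain have already been converted to per-column counts in one shared multiple alignment; the strain labels, the input layout, and the cutoff are all hypothetical.

```python
def shared_positions(coverage_by_strain, min_count=1):
    """Return 1-based alignment columns covered in every strain.

    coverage_by_strain: dict mapping a strain label to a list of per-column
    predicted-epitope counts, all in the same alignment coordinates.
    """
    lengths = {len(counts) for counts in coverage_by_strain.values()}
    if len(lengths) != 1:
        raise ValueError("all strains must share one alignment length")
    n_columns = lengths.pop()
    return [
        col + 1
        for col in range(n_columns)
        if all(counts[col] >= min_count for counts in coverage_by_strain.values())
    ]
```

Columns returned by this function are those in which at least `min_count` peptides are predicted for every strain, so regions built from them would be selected on shared predictions instead of on a single reference sequence.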
In conclusion, the results presented herein demonstrate that a model multi-epitope antigen based on the Turkey2004 strain of the CCHFV GPC can be designed in silico by combining bioinformatic epitope predictions and T-cell immunogenicity data from the literature. The predicted antigenicity of EPIC suggests that multi-epitope antigen-based vaccines should be evaluated as a future vaccine development strategy for CCHFV.