Original Research
SARS-CoV-2 Epitope Presentation by Class II HLA Genotypes Common in North American Populations: A Proposed Computational Approach for Vaccine Efficacy Evaluation
Laura Leclair1, Constantin Polychronakos2,3
Published online: September 2022
1Department of Microbiology and Immunology, McGill University
2Department of Pediatrics (cross-appointment in Human Genetics), McGill University
3Research Institute of the McGill University Health Centre
Corresponding Author: Laura Leclair, email laura.leclair@umontreal.ca
DOI: 10.26443/mjm.v21i1.907
Abstract
Background: Human Leukocyte Antigen (HLA) gene polymorphisms between ethnic groups have been shown to play a role in the heterogeneity of response to SARS-CoV-2, in terms of COVID-19 disease severity and susceptibility, in addition to socioeconomic factors. It was predicted that this finding may extend to vaccine responsiveness.
Purpose: To the best of our knowledge, this study was the first that aimed to predict and evaluate the effectiveness of four COVID-19 vaccines across North American ethnic groups, in terms of their ability to trigger CD4+ T cell help, based on class II HLA allele frequencies.
Methods: Various databases, including the Immune Epitope Database (IEDB), were used in this computational approach. The numbers of high-affinity peptide-HLA pairs between the most common HLA II haplotypes and the SARS-CoV-2 peptides in various vaccine types were retrieved and compared between ethnicities. From these, the efficiency of antigen presentation to CD4+ T cells, a crucial component of vaccination for cellular immunity and support in antibody generation, was evaluated.
Results: Multiple discrepancies in predicted vaccine effectiveness were highlighted for ethnic minorities relative to the Caucasian group, which is overrepresented in vaccine clinical trials. Recommendations were issued as to which vaccine types could be most effective for particular ethnicities.
Conclusion: There exists a genetic basis for differential responses to vaccines among ethnic groups in North America. However, given the multifactorial nature of vaccine responsiveness and limitations of computational methods, this study offers future research directions to undertake before the findings can be transferred to clinical and public health settings.
Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for causing Coronavirus disease 2019 (COVID-19) with infections ranging from asymptomatic to fatal. Structural proteins encoded within its genome are spike (S), envelope (E), membrane (M) and nucleocapsid (N). (1) SARS-CoV-2-specific antibodies, CD4+ T cells, and CD8+ T cells are at the core of the adaptive immune response against SARS-CoV-2. The focus of this study was on CD4+ T cells only, as they play a crucial role in natural infection and vaccination in terms of T cell help in antibody generation. They are also critical in maintaining long-term memory B cells and humoral immunity. (2)
The four main types of COVID-19 vaccines contain, respectively, inactivated whole virus, the spike protein, the receptor-binding domain (RBD) of the spike protein, or both the spike and nucleocapsid proteins. (3-5) Most vaccine developers focus exclusively on the spike protein as it is the main target of neutralizing antibodies. (2) Yet, it has been shown that the generation of antibodies against spike relies on CD4+ T cells recognizing any protein encoded by the SARS-CoV-2 genome. (6) Grifoni et al. demonstrated via an ex vivo T cell assay that a SARS-CoV-2-specific CD4+ T cell response, correlated with antibody generation, may be triggered against almost all SARS-CoV-2 proteins. Thus, they recommended including more structural proteins in vaccines to better represent the infection observed in COVID-19. (7) This may amplify CD4+ T cell responses, in turn reinforcing humoral immunity through T cell help in generating antibodies.
CD4+ T cells recognize SARS-CoV-2 peptide-Human Leukocyte Antigen II (HLA II) complexes on the surface of antigen presenting cells to become activated and differentiate into their effector subsets. (8) The major HLA class II genes are HLA-DPA, HLA-DPB, HLA-DRA, HLA-DRB, HLA-DQA, and HLA-DQB. They encode highly polymorphic proteins, whose allele frequencies vary between ethnic groups, based on historical exposure to different pathogens. Each allele has a different binding affinity and can only bind a portion of the processed viral peptides, affecting the efficiency of antigen presentation. (9) Consequently, HLA alleles can confer either susceptibility to or protection against infection and severe disease, as shown by epidemiological analyses and small-scale cohort studies of HLA-typed individuals. (10-16) Heterogeneity in infection susceptibility and disease outcome can also be predicted by computational HLA and SARS-CoV-2 peptide association studies. Barquera et al. observed in a computational approach that the frequencies of the strongest and weakest binders to various coronaviruses differ among populations from different geographic regions worldwide. (17) In the case of SARS-CoV-2, Copley et al. predicted similar protective CD4+ cellular immunity potential, based on HLA II haplotypes, across 25 human populations of different ethnicities in the United States. (18)
Other computational studies aiming to design the most effective multi-epitope COVID-19 vaccines predicted heterogeneity in vaccine response based on ethnicity. He et al. found that their designed vaccine may confer less protection for people of African descent based on HLA coverage. (19) Remarkably, Liu et al. designed vaccines predicted to have high coverage for Asian, Black, and White ancestry individuals. (20)
Ethnic minorities in North America are disproportionately affected by COVID-19, both in terms of risk of infection and mortality. (21) Socioeconomic factors have been extensively reported to explain such disparities, but the role of genetic factors remains understudied. (22-25) Further, participants from ethnic minorities are unjustifiably underrepresented in vaccine clinical trials. (26)
The rate of reported cases of COVID-19 on Indigenous reserves is 183% higher than in the general Canadian population. (27) Since geographically isolated Indigenous communities are not protected by the herd immunity of urban centers and may have a higher susceptibility to COVID-19, assessing vaccine effectiveness is critical.
Real-world, post-hoc vaccine effectiveness studies that have started to emerge mainly focus on the general population. While elderly or healthcare workers, for example, have been part of distinct subgroup analyses, ethnic groups have not. (28)
The underrepresentation of ethnic minorities in clinical trials may persist in vaccine effectiveness studies. Computational tools have proven extremely useful in SARS-CoV-2 vaccine development and may provide a solution by predicting vaccine effectiveness for special populations. (29)
A particularly relevant example is Liu et al.'s analysis of the Moderna, Pfizer-BioNTech, and AstraZeneca vaccines. They showed that African Americans and Asians could have a slightly increased risk of vaccine ineffectiveness in silico, but the findings remain to be confirmed clinically. (30)
Hence, the literature provides evidence that HLA polymorphism and varying allele frequencies contribute to COVID-19 heterogeneity. Different HLA haplotypes could therefore also influence responsiveness to vaccines through presentation of different HLA II immunopeptidomes to CD4+ T cells. The purpose of this study is to predict and evaluate vaccine effectiveness in terms of the ability to trigger the CD4+ T cell help required in antibody generation, based on the HLA II haplotypes common in ethnically diverse North American populations.
Methods
Retrieving allele frequencies
The Allele Frequency Net Database (AFND) stores gene frequencies in the form of alleles, haplotypes, or genotypes from worldwide populations. (31) The "HLA classical allele frequency search" tool was used to retrieve the most common HLA II alleles in Canada and the United States (i.e., North American populations) for the HLA-DPA1, -DPB1, -DQA1, -DQB1, -DRB1 class II loci (Supplemental Material 1).
Grouping by principal component analysis (PCA) and computing weighted averages
To determine if the raw data for the individual populations in each ethnic group sample could be merged based on variance, PCAs were performed using R software 4.0.3 (R Core Team, Vienna, Austria) for each ethnic group separately. (32) From the resulting PCA biplots, pools or groups of populations were created based on the Euclidean distance in a plot of the first and the second component. The weighted averages of the allele frequencies based on sample sizes of the populations were computed for the various pools or groups. The weighted average frequencies of the most common alleles were summed to reach a minimum of 70% total coverage of the population, in line with the principle of herd immunity. In certain cases, multiple combinations were made, when alleles were particularly common. Finally, a merged list of alleles for all ethnic groups was created (Supplemental Material 2, 3).
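The pooling and coverage steps described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the function names, the toy allele frequencies, and the choice to treat an allele missing from a population as having frequency 0 are all assumptions made for illustration, and the sketch assumes the 70% threshold is applied one locus at a time.

```python
from collections import defaultdict


def weighted_average_frequencies(populations):
    """Sample-size-weighted mean frequency of each allele across a pool.

    populations: list of (sample_size, {allele: frequency}) tuples.
    Alleles absent from a population count as frequency 0 (assumption).
    """
    total_n = sum(n for n, _ in populations)
    sums = defaultdict(float)
    for n, freqs in populations:
        for allele, freq in freqs.items():
            sums[allele] += n * freq
    return {allele: s / total_n for allele, s in sums.items()}


def alleles_for_coverage(avg_freqs, minimum=0.70):
    """Add alleles in decreasing frequency until their sum reaches `minimum`."""
    selected, covered = [], 0.0
    ranked = sorted(avg_freqs.items(), key=lambda kv: kv[1], reverse=True)
    for allele, freq in ranked:
        if covered >= minimum:
            break
        selected.append(allele)
        covered += freq
    return selected, covered


# Hypothetical pool of two population samples for a single locus.
pool = [
    (100, {"DRB1*07:01": 0.40, "DRB1*15:01": 0.30,
           "DRB1*03:01": 0.20, "DRB1*04:01": 0.10}),
    (300, {"DRB1*07:01": 0.20, "DRB1*15:01": 0.52,
           "DRB1*03:01": 0.20, "DRB1*04:01": 0.08}),
]
averages = weighted_average_frequencies(pool)
selected, covered = alleles_for_coverage(averages)
print(selected, round(covered, 3))
```

In this toy pool, the larger sample dominates the average, so DRB1*15:01 is ranked first and two alleles suffice to pass the 70% threshold.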
Haplotype associations
HLA-DP and -DQ must be considered as haplotypes in the Immune Epitope Database (IEDB), while only the beta chain of HLA-DR alleles may be specified because the alpha chain is invariable. Thus, haplotypes were retrieved from AFND and the literature (Supplemental Material 1). (31, 33-39)
Retrieving COVID-19 vaccine contents and protein sequences
Four vaccine types (whole virus, spike, RBD and spike and nucleocapsid) were selected for analysis and their protein sequences were retrieved in FASTA format from the NCBI GenBank (Accession number: MN908947.3) (Supplemental Material 4). (40)
Using IEDB's MHC II binding prediction tool
Using the bioinformatics tool Split FASTA, (41) the various protein sequences were separated into peptides of 15 amino acids in length, with an overlap of 10 amino acids, to reduce redundancy in the results. (42) The protein sequences and the HLA haplotypes were entered into IEDB's MHC II Binding Prediction Tool to predict peptide-HLA binding affinities and immunogenicity. (39, 42) The prediction method selected was "IEDB recommended 2.22" to ensure that the best predictor or algorithm for each allele was used. To allow comparison of the binding affinities obtained from different algorithms, IEDB outputs a percentile rank for each peptide-HLA "hit." Lower percentile ranks suggest higher peptide-HLA affinity (Supplemental Material 4). (43)
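The peptide windowing described above (15 residues long, 10 residues shared between consecutive peptides, hence a step of 5) can be sketched in Python as follows. This is an illustration only, not the Split FASTA tool: the function name and the toy sequence are hypothetical, and dropping trailing residues that do not fill a complete window is an assumption about behaviour that the text does not specify.

```python
def split_into_peptides(sequence, length=15, overlap=10):
    """Return overlapping peptides of `length` residues.

    Consecutive peptides share `overlap` residues, so each window starts
    `length - overlap` residues after the previous one. Trailing residues
    that do not fill a whole window are dropped (assumption).
    """
    step = length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than length")
    last_start = len(sequence) - length
    return [sequence[i:i + length] for i in range(0, last_start + 1, step)]


# Toy 25-residue sequence, not a SARS-CoV-2 protein.
toy_sequence = "ACDEFGHIKLMNPQRSTVWYACDEF"
peptides = split_into_peptides(toy_sequence)
for peptide in peptides:
    print(peptide)
```

For the 25-residue toy sequence, windows start at positions 0, 5, and 10, giving three 15-mers in which each peptide repeats the last 10 residues of the one before it.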
Calculating weighted counts, total counts, and plotting count histograms
Weighted counts were computed by multiplying the number of peptide-HLA hits (the score) for each allele or haplotype by its frequency. For each ethnic group, the weighted counts were then summed for each type of vaccine, according to its protein contents, to arrive at total counts of peptide-HLA hits for both the top 1st and top 10th percentiles. The top 10th percentile data for all vaccine types except RBD (considered less relevant than the other types) were distributed along percentile intervals. Count histograms were plotted from the sum of weighted counts along each interval (Supplemental Material 4).
This is a descriptive report that presents estimates of the HLA II and SARS-CoV-2 proteome affinities based on pre-existing data, without attempting to compare them statistically. Thus, statistical analyses were not performed.
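The weighted-count calculation can be sketched in Python as below. The haplotype names, frequencies, and percentile ranks are invented for illustration, the function names are hypothetical, and treating the percentile cut-offs as inclusive (rank <= 1 or rank <= 10) is an assumption not stated in the text.

```python
def count_hits(percentile_ranks, threshold):
    """Number of peptide-HLA pairs whose percentile rank is within threshold."""
    return sum(1 for rank in percentile_ranks if rank <= threshold)


def total_weighted_count(ranks_by_haplotype, frequencies, threshold):
    """Sum over haplotypes of (number of hits) x (haplotype frequency)."""
    return sum(
        count_hits(ranks, threshold) * frequencies[haplotype]
        for haplotype, ranks in ranks_by_haplotype.items()
    )


# Hypothetical percentile ranks for one ethnic group and one vaccine type.
ranks_by_haplotype = {
    "DRB1*07:01": [0.5, 3.2, 8.0, 25.0],
    "DRB1*15:01": [0.9, 0.2, 12.0],
}
frequencies = {"DRB1*07:01": 0.25, "DRB1*15:01": 0.5}

top_1st = total_weighted_count(ranks_by_haplotype, frequencies, threshold=1.0)
top_10th = total_weighted_count(ranks_by_haplotype, frequencies, threshold=10.0)
print(top_1st, top_10th)
```

Because each hit is weighted by the frequency of the haplotype that presents it, the totals are generally non-integer, which is why the total counts in this study are reported as decimals.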
Results
Population definitions
HLA allele frequencies in population samples from all sources were subjected to PCA to determine how they could be pooled. It was not necessarily possible to pool together all samples from different sources named after the same ethnicity. For example, since the Indigenous Canada population samples appeared as two distinct clusters in the PCA biplot, they were separated into two groups, Indigenous Canada #1 and #2 (Table 1).
Table 1: Total counts of peptide-Human Leukocyte Antigen (HLA) hits for all vaccine types

| Group | Whole virus, top 10th percentile | Whole virus, top 1st percentile | Spike, top 10th percentile | Spike, top 1st percentile | RBD, top 10th percentile | RBD, top 1st percentile | Spike and nucleocapsid, top 10th percentile | Spike and nucleocapsid, top 1st percentile |
|---|---|---|---|---|---|---|---|---|
| Indigenous Canada #1 | 113 | 6.10‡ | 15.6 | 0.980‡ | 1.95 | 0.00‡ | 19.2 | 0.980‡ |
| Indigenous Canada #2 | 142 | 14.3 | 19.1 | 1.95 | 3.80 | 0.697 | 21.7 | 2.09 |
| Indigenous US #1 | 95.2 | 5.29‡ | 10.6 | 0.315‡ | 1.78 | 0.00‡ | 13.3 | 0.426‡ |
| Indigenous US #2 | 177 | 18.7 | 22.8 | 2.97 | 5.24 | 0.779 | 25.5 | 2.97 |
| Indigenous US + Canada #1 | 177 | 18.6 | 22.7 | 2.97 | 5.23 | 0.778 | 25.5 | 2.97 |
| Indigenous US + Canada #2 | 95.2 | 5.29‡ | 10.6 | 0.315‡ | 1.78 | 0.00‡ | 13.3 | 0.426‡ |
| African American #1* | 63.9* | 5.55* | 8.06* | 0.635* | 1.59* | 0.173* | 9.59* | 0.668* |
| African American #2 | 137 | 10.7 | 14.7 | 0.579‡ | 2.34 | 0.544 | 17.1 | 0.886‡ |
| Caucasian† | 167† | 19.4† | 18.6† | 2.23† | 4.59† | 0.813† | 20.7† | 2.23† |
| Polynesian | 131 | 9.85 | 16.7 | 1.89 | 2.87 | 0.987 | 19.9 | 1.89 |
| Asian (General) | 111 | 9.14 | 16.2 | 1.28 | 1.80 | 0.102‡ | 19.5 | 1.46 |
| South Asian | 115 | 11.0 | 17.4 | 1.36 | 3.93 | 0.290 | 20.3 | 1.59 |
| East Asian | 109 | 9.42 | 15.9 | 1.34 | 2.88 | 0.417 | 18.2 | 1.66 |
| Southeast Asian #1 | 125 | 13.9 | 18.7 | 1.91 | 4.66 | 0.655 | 21.0 | 2.16 |
| Southeast Asian #2 | 130 | 11.8 | 16.4 | 1.05 | 2.20 | 0.766 | 18.8 | 1.05 |
| Hispanic #1 | 108 | 8.48‡ | 15.7 | 1.01 | 3.01 | 0.171 | 18.3 | 1.25 |
| Hispanic #2 | 96.4 | 5.32‡ | 10.7 | 0.302‡ | 1.72 | 0.00‡ | 13.5 | 0.411‡ |
| Mestizo #1 | 198 | 25.9 | 21.9 | 3.24 | 6.69 | 1.30 | 22.5 | 3.24 |
| Mestizo #2 | 113 | 8.69 | 16.2 | 1.07 | 3.10 | 0.149 | 18.9 | 1.27 |
| Mixed | 167 | 17.0 | 19.2 | 1.92 | 3.98 | 0.654 | 21.8 | 2.14 |
| Arab | 108 | 9.30 | 14.0 | 1.27 | 3.10 | 0.264 | 17.1 | 1.71 |

RBD = receptor-binding domain
* Underestimate
† Reference group
‡ Discrepancy relative to Caucasian reference group (for top 1st percentile only)
Certain minority ethnic groups were predicted to have fewer peptide-HLA hits within the top 1st percentile
Results for the top 10th and top 1st percentile peptide-HLA hits were tabulated (Table 1). The reference group used to highlight major discrepancies was the Caucasian group, as it is the most studied ethnicity. As expected, it was one of the top-scoring groups across all four vaccine types, showing high total counts of predicted peptide-HLA hits. Overall, the biggest discrepancies between ethnicities occurred within the top 1st percentile (i.e., the strongest peptide-HLA bindings). In fact, the bottom five scores for each type of vaccine were lower than the Caucasian score by at least a factor of 2, and up to a factor of 8 in the RBD vaccine.
An underestimate of the true total counts arose for the African American #1 group because the coverage of the population only amounted to 40% instead of the set minimum of 70%, due to the unavailability of some data. Since African American #1 has a sample size of around 4,900 compared with approximately 480,000 for African American #2, the latter group was deemed more representative.
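As an arithmetic check on the fold differences quoted above, the following Python sketch divides the Caucasian reference score by the scores of a few other groups, using RBD top 1st percentile values copied from Table 1. The function name `fold_lower` is hypothetical, and reporting a zero score as an infinite fold difference is an assumption made for illustration, since the text does not state how zero scores were treated.

```python
# RBD, top 1st percentile total counts copied from Table 1.
rbd_top_1st = {
    "Caucasian": 0.813,
    "Asian (General)": 0.102,
    "Mestizo #2": 0.149,
    "Hispanic #1": 0.171,
    "Indigenous Canada #1": 0.0,
}


def fold_lower(reference_score, score):
    """How many times lower `score` is than the reference score."""
    if score == 0:
        return float("inf")  # assumption: a zero score has no finite ratio
    return reference_score / score


reference = rbd_top_1st["Caucasian"]
folds = {
    group: fold_lower(reference, score)
    for group, score in rbd_top_1st.items()
    if group != "Caucasian"
}
for group, fold in folds.items():
    print(f"{group}: {fold:.2f}-fold lower than the reference")
```

Among the non-zero scores shown here, Asian (General) comes out at roughly 8-fold lower than the reference, in line with the upper bound quoted for the RBD vaccine.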
Increased counts of peptide-HLA hits among the lower percentiles may suggest better antigen presentation
Although an ethnic group may have scored low within the top 1st percentile, it may have additional peptide-HLA hits with sufficient binding affinities for successful antigen presentation, for example, between the 1st and 5th percentiles. Thus, the distributions of peptide-HLA hits across the top 10th percentile were plotted in count histograms (Figures 1-3) for whole virus, spike, and spike and nucleocapsid vaccines, for the groups with the discrepancies identified in Table 1 (refer to Supplemental Material 4 for the rest of the data and RBD vaccine).
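The distribution of weighted counts along percentile intervals can be sketched in Python as follows. The interval width of one percentile, the half-open intervals (0-1], (1-2], and so on, the function name, and the toy data are all assumptions for illustration; the intervals actually used for Figures 1-3 are not specified in this section.

```python
import math


def weighted_histogram(ranks_by_haplotype, frequencies, upper=10, width=1):
    """Sum haplotype frequencies of hits falling in each percentile interval.

    Bin k covers ranks in (k * width, (k + 1) * width]; ranks above `upper`
    are ignored. Each hit contributes the frequency of its haplotype.
    """
    n_bins = upper // width
    bins = [0.0] * n_bins
    for haplotype, ranks in ranks_by_haplotype.items():
        for rank in ranks:
            if rank > upper:
                continue
            index = max(0, math.ceil(rank / width) - 1)
            bins[index] += frequencies[haplotype]
    return bins


# Hypothetical percentile ranks and haplotype frequencies.
ranks_by_haplotype = {
    "DRB1*07:01": [0.5, 3.2, 8.0, 25.0],
    "DRB1*15:01": [0.9, 0.2, 12.0],
}
frequencies = {"DRB1*07:01": 0.25, "DRB1*15:01": 0.5}

histogram = weighted_histogram(ranks_by_haplotype, frequencies)
for k, value in enumerate(histogram):
    print(f"({k}-{k + 1}] percentile: {value}")
```

The first bin corresponds to the top 1st percentile total, and the sum over all ten bins corresponds to the top 10th percentile total, so the histogram shows how the remaining hits are spread between the two cut-offs.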