Introduction Glycosylation is a ubiquitous post-translational protein modification that modulates the structure and function of polypeptide components of glycoproteins [1], [2]. N-glycan structures are essential for multicellular life [3]. Mutations in genes involved in modification of glycan antennae are common and can lead to severe or fatal diseases [4]. Variation in protein glycosylation also has physiological significance, with immunoglobulin G (IgG) being a well-documented example. Each heavy chain of IgG carries a single covalently attached bi-antennary N-glycan at the highly conserved asparagine 297 residue in each of the CH2 domains of the Fc region of the molecule. The attached oligosaccharides are structurally important for the stability of the antibody and its effector functions [5]. In addition, some 15–20% of normal IgG molecules have complex bi-antennary oligosaccharides in the variable regions of light or heavy chains [6], [7]. 36 different glycans (Figure 1) can be attached to the conserved Asn297 of the IgG heavy chain [8], [9], leading to hundreds of different IgG isomers that can be generated from this single glycosylation site. 10.1371/journal.pgen.1003225.g001 Figure 1 Structures of glycans separated by HILIC-UPLC analysis of the IgG glycome. Glycosylation of IgG has important regulatory functions. The absence of galactose residues in association with rheumatoid arthritis was reported nearly 30 years ago [10]. The addition of sialic acid dramatically changes the physiological role of IgGs, converting them from pro-inflammatory to anti-inflammatory agents [11], [12]. Addition of fucose to the glycan core interferes with the binding of IgG to FcγRIIIa and greatly diminishes its capacity for antibody dependent cell-mediated cytotoxicity (ADCC) [13], [14]. Structural analysis of the IgG-Fc/FcγRIIIa complex has demonstrated that specific glycans on FcγRIIIa are also essential for this effect of core-fucose [15] and that removal of core fucose from IgG glycans increases clinical efficacy of monoclonal antibodies, enhancing their therapeutic effect through ADCC mediated killing [16]–[18]. New high-throughput technologies, such as high/ultra performance liquid chromatography (HPLC/UPLC), MALDI-TOF mass spectrometry (MS) and capillary electrophoresis (CE), allow us to quantitate N-linked glycans from individual human plasma proteins. Recently, we performed the first population-based study to demonstrate physiological variation in IgG glycosylation in three European founder populations [19]. Using UPLC, we showed exceptionally high individual variability in glycosylation of a single protein - human IgG - and substantial heritability of the observed measurements [19]. In parallel, we quantitated IgG N-glycans in another European population (Leiden Longevity Study – LLS) by mass spectrometry. In this study, we combined those high-throughput glycomics measurements with high-throughput genomics to perform the first genome wide association (GWA) study of the human IgG N-glycome. Results Genome-wide association study and meta-analysis We separated a single protein (IgG) from human plasma and quantitated its N-linked glycans using two state-of-the-art technologies (UPLC and MALDI-TOF MS). Their comparative advantages in GWA studies were difficult to predict prior to the conducted analyses, so both were used - one in each available cohort. We performed 77 quantitative measurements of IgG N-glycosylation using ultra performance liquid chromatography (UPLC) in 2247 individuals from four European discovery populations (CROATIA-Vis, CROATIA-Korcula, ORCADES, NSPHS). In parallel, we measured IgG N-glycans using MALDI-TOF mass spectrometry (MS) in 1848 individuals from another European population (Leiden Longevity Study (LLS)). Descriptions of these population cohorts are found in Table S11. Aiming to identify genetic loci involved in IgG glycosylation, we performed a GWA study in both cohorts. Associations at 9 loci reached genome-wide significance (P = 0.6 with top associated SNP; nHits: number of SNPs with GW-significant association; nTraits: number of IgG glycosylation traits associated with the region at GW-significant level; * effect size is in z-score units after adjustment for sex, age and first 3 principal components. + Description of the traits provided in Table S1; $ the SNP effect in opposite direction to most significant trait. The most statistically significant association was observed in a region on chromosome 3 containing the gene ST6GAL1 (Table 1, Figure S1A). ST6GAL1 codes for the enzyme sialyltransferase 6 which adds sialic acid to various glycoproteins including IgG glycans (Figure 2), and is therefore a highly biologically plausible candidate. In this region of about 70 kilobases (kb) we identified 37genome-wide significant SNPs associated with 14 different IgG glycosylation traits, generally reflecting sialylation of different glycan structures (Table 1). The strongest association was observed for the percentage of monosialylation of fucosylated digalactosylated structures in total IgG glycans (IGP29, see Figure 1 and Table S1 for notation), for which a SNP rs11710456 explained 17%, 16%, 18% and 3% of the trait variation for CROATIA-Vis, CROATIA-Korcula, ORCADES and NSPHS respectively (meta-analysis p = 6.12×10−75). NSPHS had a very small sample size in this analysis (N = 179) and may not provide an accurate portrayal of the variance explained in this particular population (estimated as 3%). Although the allele frequency is similar between all populations, in the forest plot (Figure S1A) although NSPHS does overlap with the other populations, the 95% CI is much larger. It is also possible that there are population-specific genetic and/or environmental differences in NSPHS that are affecting the amount of variance explained by this SNP. After analysis conditioning on the top SNP (rs11710456) in this region, the SNP rs7652995 still reached genome-wide significance (p = 4.15×10−13). After adjusting for this additional SNP, the association peak was completely removed. This suggests that there are several genetic factors underlying this association. Conditional analysis of all other significant and suggestive regions resulted in the complete removal of the association peak. 10.1371/journal.pgen.1003225.g002 Figure 2 A summary of changes to IgG N-glycan structures that were associated with 16 loci identified through GWA study. We also identified 28 SNPs showing genome-wide significant associations with 11 IgG glycosylation traits (2.70×10−11 95%, and SNP exclusions criteria were Hardy-Weinberg equilibrium p value 0.85) was studied. All studied SNPs had imputation quality of 0.3 or greater. Genome-wide association analysis In the discovery populations, genome-wide association analysis was firstly performed for each population and then combined using an inverse-variance weighted meta-analysis for all traits. Each trait was adjusted for sex, age and the first 3 principal components obtained from the population-specific identity-by-state (IBS) derived distances matrix. The residuals were transformed to ensure their normal distribution using quantile normalisation. Sex-specific analyses were adjusted for age and principal components only. The residuals expressed as z-scores were used for association analysis. The “mmscore” function of ProbABEL [87] was used for the association test under an additive model. This score test for family based association takes into account relationship structure and allowed unbiased estimations of SNP allelic effect when relatedness is present between examinees. The relationship matrix used in this analysis was generated by the “ibs” function of GenABEL (using weight = “freq” option), which uses genomic data to estimate the realized pair-wise kinship coefficient. All lambda values for the population-specific analyses were below 1.05 (Table S4), showing that this method efficiently accounts for family structure. Inverse-variance weighted meta-analysis was performed using the MetABEL package (http://www.genabel.org) for R. SNPs with poor imputation quality (R2 1∶256) and (c) histological reports of skin biopsy examination consistent with SLE was performed. Lastly, SLE cases were also identified through the Lupus Society of Trinidad and Tobago (90% of those patients were also identified through one of the two main public hospitals). For each case, randomly chosen households in the same neighbourhood were sampled by the field team to obtain (where possible) two controls, matched with the case for sex and for 20-year age group. Cases and controls were interviewed at home or in the project office by using a custom made questionnaire. The case definition of SLE was based on American Rheumatism Association (ARA) criteria [91], applied to medical records (available for more than 90% of cases), and to the medical history given by the patient. Informed consent for blood sampling and the use of the sample for genetic studies including estimation of admixture was obtained from each participant. Initial case ascertainment identified 264 possible cases of SLE. Of these, 72 (27%) were excluded either on the basis of their names or because their medical history did not meet ARA criteria for the diagnosis of SLE. Of the remaining 192 individuals, 54 had incomplete addresses or were not resident in northern Trinidad, four were too ill to be interviewed, eight were aged less than 18 years and two refused to participate. For 80% (99/124) of cases, two matched controls were obtained: the response rate from those invited to participate as controls was 70%. The total sample consisted of 124 cases and 219 controls aged over 20 years who completed the questionnaire. Blood samples were obtained from 122 cases and 219 controls and DNA was successfully extracted from 93% (317/341) of these. IgG glycans were successfully measured in 303 individuals. Age at sampling was not available for 17 individuals and 2 individuals were lost due to the ID mismatch. To test predictive power of selected glycan trait, we fitted logistic regression models (including and excluding the glycan) and used predRisk function of PredictABEL package for R to evaluate the predictive ability. Supporting Information Figure S1 Forrest plots for associations of glycan traits measured by UPLC and genetic polymorphisms. (PPT) Click here for additional data file. Table S1 The description of 23 quantitative IgG glycosylation traits measured by UPLC and 54 derived traits. (XLS) Click here for additional data file. Table S2 Descriptive statistics of glycan traits in discovery cohorts. (XLS) Click here for additional data file. Table S3 Summary data for all single-nucleotide polymorphisms and traits with suggestive associations (p<1×10-5) with glycans measured by UPLC. (XLS) Click here for additional data file. Table S4 Population-specific and pooled genomic control (GC) factors for associations with UPLC glycan traits. (XLS) Click here for additional data file. Table S5 Description of five glycan traits measured by MS and their descriptive statistic in the replication cohort. (XLS) Click here for additional data file. Table S6 Summary data for all single-nucleotide polymorphisms with replicated in the LLS cohort. (XLS) Click here for additional data file. Table S7 IgG glycans in 5 heterozygous Ikzf1 knockout mice and 5 wild-type controls. (XLS) Click here for additional data file. Table S8 Data for all IgG N-glycans measured in 101 Afro-Caribbean cases with SLE and 183 controls (Extended Table 4 from the main manuscript). (XLS) Click here for additional data file. Table S9 Effects of corticosteroids on IgG glycans. (XLS) Click here for additional data file. Table S10 Principal component analysis of IgG glycosylation traits. (XLS) Click here for additional data file. Table S11 Description of the analysed populations. (XLS) Click here for additional data file.