      Galactosylation of IgA1 Is Associated with Common Variation in C1GALT1


          Abstract

          IgA nephropathy (IgAN), an important cause of kidney failure, is characterized by glomerular IgA deposition and is associated with changes in O-glycosylation of the IgA1 molecule. Here, we sought to identify genetic factors contributing to levels of galactose-deficient IgA1 (Gd-IgA1) in white and Chinese populations. Gd-IgA1 levels were elevated in IgAN patients compared with ethnically matched healthy subjects and correlated with evidence of disease progression. White patients with IgAN exhibited significantly higher Gd-IgA1 levels than did Chinese patients. Among individuals without IgAN, Gd-IgA1 levels did not correlate with kidney function. Gd-IgA1 level heritability (h²), estimated by comparing midparental and offspring Gd-IgA1 levels, was 0.39. Genome-wide association analysis by linear regression identified alleles at a single locus spanning the C1GALT1 gene that strongly associated with Gd-IgA1 level (β=0.26; P=2.35×10⁻⁹). This association was replicated in a genome-wide association study of separate cohorts comprising 308 patients with membranous GN from the UK (P<1.00×10⁻⁶) and 622 controls with normal kidney function from the UK (P<1.00×10⁻¹⁰), and in a candidate gene study of 704 Chinese patients with IgAN (P<1.00×10⁻⁵). The same extended haplotype associated with elevated Gd-IgA1 levels in all cohorts studied. C1GALT1 encodes a galactosyltransferase enzyme that is important in O-galactosylation of glycoproteins. These findings demonstrate that common variation at C1GALT1 influences Gd-IgA1 level in the population, which independently associates with risk of progressive IgAN, and that the pathogenic importance of changes in IgA1 O-glycosylation may vary between white and Chinese patients with IgAN.
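
          As a rough illustration of the midparent-offspring comparison mentioned in the abstract, the sketch below estimates heritability as the least-squares slope of offspring trait values on midparent values. The function name and the toy numbers are illustrative assumptions, not the study's data or code, and the abstract does not say how Gd-IgA1 levels were transformed or adjusted before the comparison.

```python
from statistics import mean


def midparent_offspring_h2(fathers, mothers, offspring):
    """Slope of offspring trait on midparent trait (a simple h2 estimate).

    Assumes one offspring value per family and equal-length inputs.
    """
    midparents = [(f + m) / 2.0 for f, m in zip(fathers, mothers)]
    mp_mean = mean(midparents)
    off_mean = mean(offspring)
    # Ordinary least-squares slope: cov(midparent, offspring) / var(midparent)
    cov = sum((x - mp_mean) * (y - off_mean) for x, y in zip(midparents, offspring))
    var = sum((x - mp_mean) ** 2 for x in midparents)
    return cov / var


# Made-up trait values for four families (hypothetical, for illustration only).
toy_fathers = [1.0, 2.0, 3.0, 4.0]
toy_mothers = [1.0, 2.0, 3.0, 4.0]
toy_offspring = [0.39 * x + 1.0 for x in (1.0, 2.0, 3.0, 4.0)]

h2_estimate = midparent_offspring_h2(toy_fathers, toy_mothers, toy_offspring)
print(f"estimated h2 = {h2_estimate:.2f}")
```

          The toy offspring values are built to lie exactly on a line of slope 0.39, so the estimate simply recovers that slope; real family data would scatter around the regression line.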

          Most cited references (35)


          Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors

          Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) has become the dominant technique for mapping transcription factor (TF) binding regions genome-wide. We performed an integrative analysis centered around 457 ChIP-seq data sets on 119 human TFs generated by the ENCODE Consortium. We identified highly enriched sequence motifs in most data sets, revealing new motifs and validating known ones. The motif sites (TF binding sites) are highly conserved evolutionarily and show distinct footprints upon DNase I digestion. We frequently detected secondary motifs in addition to the canonical motifs of the TFs, indicating tethered binding and cobinding between multiple TFs. We observed significant position and orientation preferences between many cobinding TFs. Genes specifically expressed in a cell line are often associated with a greater occurrence of nearby TF binding in that cell line. We observed cell-line–specific secondary motifs that mediate the binding of the histone deacetylase HDAC2 and the enhancer-binding protein EP300. TF binding sites are located in GC-rich, nucleosome-depleted, and DNase I sensitive regions, flanked by well-positioned nucleosomes, and many of these features show cell type specificity. The GC-richness may be beneficial for regulating TF binding because, when unoccupied by a TF, these regions are occupied by nucleosomes in vivo. We present the results of our analysis in a TF-centric web repository Factorbook ( http://factorbook.org ) and will continually update this repository as more ENCODE data are generated.

            Discovery of new risk loci for IgA nephropathy implicates genes involved in immunity against intestinal pathogens

            We performed a genome-wide association study (GWAS) of IgA nephropathy (IgAN), the most common form of glomerulonephritis, with discovery and follow-up in 20,612 individuals of European and East Asian ancestry. We identified six novel genome-wide significant associations, four in ITGAM-ITGAX, VAV3 and CARD9 and two new independent signals at HLA-DQB1 and DEFA. We replicated the nine previously reported signals, including known SNPs in the HLA-DQB1 and DEFA loci. The cumulative burden of risk alleles is strongly associated with age at disease onset. Most loci are either directly associated with risk of inflammatory bowel disease (IBD) or maintenance of the intestinal epithelial barrier and response to mucosal pathogens. The geo-spatial distribution of risk alleles is highly suggestive of multi-locus adaptation and the genetic risk correlates strongly with variation in local pathogens, particularly helminth diversity, suggesting a possible role for host-intestinal pathogen interactions in shaping the genetic landscape of IgAN.

              Geographic Differences in Genetic Susceptibility to IgA Nephropathy: GWAS Replication Study and Geospatial Risk Analysis

Introduction

IgA nephropathy (IgAN) is a common kidney disease with a complex genetic determination. This disorder is diagnosed based on detection of mesangial proliferation and glomerular deposits of IgA1. Most frequently, IgAN has a progressive course and 20–50% of cases develop end-stage renal disease (ESRD) within 20 years of follow-up [1]. The disease has been detected among all ethnicities worldwide, but displays a striking geographic variation. It is the most common cause of kidney failure in East Asian countries, has intermediate prevalence in European and US populations but is rarely reported in populations of African ancestry. The diagnosis of IgAN requires a kidney biopsy, complicating accurate determination of heritability and population prevalence of disease. Autopsy and donor biopsy series suggest a prevalence of up to 1.3% in Finland [2] and 3.7% in Japan [3]. Familial aggregation of IgAN has also been recognized throughout the world [4], [5], [6], [7], [8], [9], [10], [11] and up to 14% of cases may be familial [8]. Moreover, family members frequently have aberrant glycosylation of the hinge region of circulating IgA1, a defect with an estimated heritability of 40–50% [12], [13]. These data suggest a strong genetic contribution to disease.

We recently completed a large-scale genome-wide association study (GWAS) involving a cohort of 3,144 sporadic IgAN cases [14]. The discovery phase samples (1,194 cases and 902 controls) were recruited in Beijing, China and comprised individuals of Han Chinese ancestry. The most associated SNPs were then followed up in additional cohorts of Han Chinese and Europeans (1,950 cases and 1,920 controls). In the combined analysis, we discovered 5 novel susceptibility loci with consistent effects across individual cohorts.
These include 3 distinct intervals in the MHC-II region on chromosome 6p21, with the strongest signal encompassing the HLA DQB1/DQA1/DRB1 locus (abbreviated as DQB1/DRB1 hereafter). Imputation of classical alleles suggested that this signal was partially conveyed by a strong protective effect of the DRB1*1501-DQB1*0602 haplotype. The second signal on Chr. 6p21 encompassed a ∼100 Kb region containing TAP2, TAP1, PSMB8, and PSMB9 genes (TAP2/PSMB9 locus) and the third signal on Chr. 6p21 contained the HLA DPA1/DPB1/DPB2 genes (DPA1/DPB2 locus). Independence of these three regions on Chr. 6p21 was demonstrated by their localization within distinct LD blocks as well as genome-wide significant associations after rigorous conditional analyses. We also detected significant association within the Complement factor H (CFH) gene cluster on Chr. 1q32, where alleles tagging a common deletion in the CFHR3 and CFHR1 genes imparted a significant protective effect (CFHR3/R1 locus). Finally, a fifth signal centered on the HORMAD2 gene on Chr. 22q12 and containing multiple genes demonstrated significant association with risk of IgAN (HORMAD2 locus). These five loci individually conferred a moderate risk of disease (OR 1.25–1.59), but together explained 4–5% of the variation in risk across the populations examined.

To follow up these studies and better assess the risk imparted by susceptibility alleles in diverse populations, we performed a replication study in eight independent case-control cohorts and performed a meta-analysis of all available genetic data including the original GWAS, totaling 10,755 individuals. The expanded sample size allowed us to formally assess locus heterogeneity, identify new independent risk variants by conditional analyses and search for first-order genetic interactions. Finally, we refined a genetic risk score for IgAN and analyzed differences in the distributions of the IgAN susceptibility alleles among the major world populations.
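
The meta-analysis and heterogeneity assessment mentioned above can be illustrated with a minimal inverse-variance fixed-effects sketch that also reports Cochran's Q and the I² statistic. The function name and the per-cohort inputs are hypothetical and are not taken from the study; the random-effects model used in the study is not reproduced here.

```python
import math


def fixed_effects_meta(log_ors, standard_errors):
    """Inverse-variance fixed-effects pooling of per-cohort log odds ratios.

    Returns the pooled OR, a two-sided normal-approximation P-value,
    Cochran's Q and I2 (as a percentage).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I2: share of variation attributed to heterogeneity, floored at zero
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"or": math.exp(pooled), "p": p_value, "q": q, "i2": i2}


# Hypothetical cohorts with identical effects, so no heterogeneity is expected.
example = fixed_effects_meta([math.log(0.78)] * 3, [0.10, 0.10, 0.10])
print(example)
```

With identical per-cohort effects, Q is essentially zero and I² is reported as 0%; cohorts with diverging effects would push both upward.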
Results

Replication Study

For replication we examined eight cohorts (five European, two East Asian, and one African-American cohort, totaling 2,228 cases and 2,561 controls, described in Table S1). While each individual cohort at best had 40–50% power to replicate original GWAS findings, the combined replication cohort (2,228 cases and 2,561 controls) provided essentially 100% power for replication across the range of allele frequencies and odds ratios initially observed (Table S2). We genotyped the two top-scoring SNPs for the CFHR3/R1, TAP2/PSMB9, DPA1/DPB2, and HORMAD2 loci, but four SNPs were included for the DQB1/DRB1 locus to test for independent alleles at this interval by conditional analysis. After a standard assessment of genotype quality control, we performed association testing within each cohort using the standard Cochran-Armitage trend test (Table S3). We also tested for heterogeneity of associations and performed a meta-analysis under both fixed and random effects models (Table 1).

Table 1 (doi:10.1371/journal.pgen.1002765.t001). Replication Study Results and Combined Meta-Analysis.
Replication Study: N = 4,789 across 8 cohorts (2,228 cases/2,561 controls). FE: fixed effects; RE#: random effects.

Chr  Location (kb)  SNP (minor allele)  FE OR  FE P-value  RE# OR  RE# P-value  I²   Q-test      Locus
1    194,918        rs3766404 (C)       0.78   2.5×10⁻⁴    0.78    4.2×10⁻⁴     0%   0.84 (NS)   CFHR3/R1
1    194,953        rs6677604 (A)       0.78   3.1×10⁻⁵    0.78    5.5×10⁻⁵     0%   0.48 (NS)   CFHR3/R1
6    32,768         rs9275224 (A)       0.75   3.6×10⁻¹¹   0.75    7.1×10⁻¹¹    0%   0.67 (NS)   DQB1/DRB1
6    32,778         rs2856717 (T)       0.86   1.1×10⁻³    0.86    1.8×10⁻³     0%   0.71 (NS)   DQB1/DRB1
6    32,779         rs9275424 (G)       1.22   5.0×10⁻⁵    1.22    8.7×10⁻⁵     19%  0.27 (NS)   DQB1/DRB1
6    32,789         rs9275596 (C)       0.75   5.3×10⁻⁹    0.75    9.5×10⁻⁹     0%   0.60 (NS)   DQB1/DRB1
6    32,917         rs9357155 (A)       0.96   5.8×10⁻¹    0.97    9.4×10⁻²     54%  0.025*      TAP2/PSMB9
6    32,919         rs2071543 (A)       0.91   1.7×10⁻¹    0.92    1.2×10⁻¹     43%  0.08 (NS)   TAP2/PSMB9
6    33,194         rs1883414 (T)       0.87   3.1×10⁻³    0.87    5.0×10⁻³     0%   0.96 (NS)   DPA1/DPB2
6    33,205         rs3129269 (T)       0.89   1.1×10⁻²    0.89    1.7×10⁻²     0%   0.75 (NS)   DPA1/DPB2
22   28,824         rs2412971 (A)       0.81   1.1×10⁻⁶    0.81    2.1×10⁻⁶     24%  0.23 (NS)   HORMAD2
22   28,859         rs2412973 (A)       0.81   6.9×10⁻⁷    0.81    1.2×10⁻⁶     29%  0.19 (NS)   HORMAD2

Replication and GWAS: N = 10,755 across 12 cohorts (5,372 cases/5,383 controls).

Chr  SNP (minor allele)  FE OR  FE P-value  RE# OR  RE# P-value  I²   Q-test        Locus
1    rs3766404 (C)       0.78   7.9×10⁻⁸    0.78    1.3×10⁻⁷     6%   0.39 (NS)     CFHR3/R1
1    rs6677604 (A)       0.74   2.1×10⁻¹³   0.74    4.6×10⁻¹³    21%  0.23 (NS)     CFHR3/R1
6    rs9275224 (A)       0.72   8.5×10⁻³⁰   0.72    2.8×10⁻²⁹    0%   0.69 (NS)     DQB1/DRB1
6    rs2856717 (T)       0.77   6.6×10⁻¹⁶   0.78    7.3×10⁻¹⁶    29%  0.16 (NS)     DQB1/DRB1
6    rs9275424 (G)       1.28   2.6×10⁻¹⁴   1.26    4.6×10⁻¹⁴    30%  0.14 (NS)     DQB1/DRB1
6    rs9275596 (C)       0.67   5.0×10⁻³²   0.67    3.1×10⁻³²    43%  0.05 (NS)     DQB1/DRB1
6    rs9357155 (A)       0.79   1.1×10⁻⁸    0.87    2.6×10⁻¹¹    70%  1.0×10⁻⁴ **   TAP2/PSMB9
6    rs2071543 (A)       0.78   5.7×10⁻¹⁰   0.84    4.0×10⁻¹¹    61%  2.0×10⁻³ **   TAP2/PSMB9
6    rs1883414 (T)       0.82   3.0×10⁻¹⁰   0.82    5.9×10⁻¹⁰    0%   0.86 (NS)     DPA1/DPB2
6    rs3129269 (T)       0.83   2.5×10⁻⁹    0.83    4.6×10⁻⁹     0%   0.51 (NS)     DPA1/DPB2
22   rs2412971 (A)       0.80   4.0×10⁻¹⁵   0.80    9.5×10⁻¹⁵    12%  0.33 (NS)     HORMAD2
22   rs2412973 (A)       0.80   9.9×10⁻¹⁵   0.80    2.3×10⁻¹⁴    16%  0.29 (NS)     HORMAD2

Combined association results for 12 SNPs representing 5 independent regions that reached genome-wide significance in the original GWAS. The combined effect estimates (per allele odds ratios) in the replication cohorts were all direction-consistent with the ones in the original GWAS cohorts.
Significant heterogeneity was noted only for the second HLA locus represented by rs9357155 and rs2071543. Q-test: P-value for the Cochran's Q statistic for heterogeneity; NS: heterogeneity test not significant; *: heterogeneity P 75% to high level of heterogeneity; OR: Additive (per-allele) Odds Ratio; #: Han and Eskin random effects model.

Four of the five original GWAS loci displayed significant replication with direction-consistent ORs and no heterogeneity comparable to the original findings (Table 1). The strongest replication was at the DQB1/DRB1 locus and achieved genome-wide significance in the replication cohort (fixed effects OR 0.75, P-value 4×10⁻¹¹). The CFHR3/R1 locus on Chr. 1q32, the HORMAD2 locus on Chr. 22q12, and the DPA1/DPB2 locus on Chr. 6p21 were also robustly replicated (fixed effects p-values 3×10⁻³–7×10⁻⁷), with minimal between-cohort heterogeneity (I² 1%) are tested for association. Nonetheless, these 3 independent haplotypes in the DQB1/DRB1 locus still did not explain associations in other Chr. 6p21 regions (TAP2/PSMB9 and DPA1/DPB2 loci, respectively represented by rs9357155 and rs1883414), and a fully adjusted model that included all independently associated SNPs continued to support the original GWAS findings of three discrete genome-wide significant intervals on Chr. 6p21 (Table 4).

Table 4 (doi:10.1371/journal.pgen.1002765.t004). The best predictive model for IgAN based on all the genotyped SNPs and their pairwise interaction terms.

Best Predictive Model

Predictor (Reference Allele)   Coefficient (β)  OR (95% CI)       P-value     Chr.   Annotation of Genes in the Region
rs6677604 (A)                  −0.49371         0.61 (0.53–0.71)  2.2×10⁻¹¹   1q32   CFH, CFHR1, CFHR3
rs9275224 (A)                  −0.31307         0.73 (0.67–0.80)  2.5×10⁻¹¹   6p21   HLA-DQB1, -DQA1, -DRB1 (variant 1)
rs2856717 (T)                  0.42265          1.53 (1.31–1.78)  8.2×10⁻⁸    6p21   HLA-DQB1, -DQA1, -DRB1 (variant 2)
rs9275596 (C)                  −0.51157         0.60 (0.52–0.69)  5.9×10⁻¹³   6p21   HLA-DQB1, -DQA1, -DRB1 (variant 3)
rs9357155 (A)                  −0.28621         0.75 (0.69–0.82)  3.8×10⁻¹⁰   6p21   HLA-DOB, PSMB8, PSMB9, TAP1, TAP2
rs1883414 (T)                  −0.1805          0.83 (0.78–0.90)  4.8×10⁻⁷    6p21   HLA-DPB2, -DPB1, -DPA1
rs2412971 (A)                  −0.28592         0.75 (0.70–0.81)  2.3×10⁻¹⁵   22q12  HORMAD2, MTMR3, LIF, OSM, GATSL3, SF3A1
rs6677604 (A) * rs2412971 (A)  0.23171          1.26 (1.12–1.43)  2.2×10⁻⁴    –      1q32 by 22q12 interaction term

This model represents the solution of a stepwise logistic regression algorithm (BIC-based stepwise model selection). The coefficients from this model are used to refine the risk score for IgAN.

First-Order Interaction Screen Reveals Significant Interaction between CFHR3/R1 and HORMAD2 Loci

We tested for interaction among the 7 risk-contributing SNPs by examining all possible pairwise interactions (Table S6). We detected strong evidence for a multiplicative interaction (defined as departure from additivity on the log-odds scale) between the CFHR3/R1 (rs6677604) and the HORMAD2 loci (rs2412971). In this interaction, the rs2412971-A allele has a strong and consistent protective effect among all genotypic subgroups, but its effects are reversed among homozygotes for the rs6677604-A allele, which closely tags a CFHR3/R1 deletion (Figure 1, Table S6). The significance of this interaction (p = 2.5×10⁻⁴) exceeds a Bonferroni-corrected threshold for 21 tests, and is most discernible among the European cohorts (p = 1.4×10⁻³), where both SNPs have higher minor allele frequencies.
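
The count of 21 tests follows from the number of unordered SNP pairs among 7 SNPs. The short sketch below reproduces that arithmetic and compares the reported interaction P-value with a Bonferroni threshold; the family-wise alpha of 0.05 is an assumption, since the text does not state the level used.

```python
from math import comb

N_SNPS = 7      # risk-contributing SNPs screened for pairwise interaction
ALPHA = 0.05    # assumed family-wise error rate (not stated in the text)

n_pairwise_tests = comb(N_SNPS, 2)              # 7 choose 2 = 21 pairs
bonferroni_threshold = ALPHA / n_pairwise_tests

p_interaction = 2.5e-4                          # reported for rs6677604 x rs2412971
passes_bonferroni = p_interaction < bonferroni_threshold

print(n_pairwise_tests, f"{bonferroni_threshold:.2e}", passes_bonferroni)
```

Under that assumed alpha, the threshold is roughly 2.4×10⁻³, about one order of magnitude above the reported interaction P-value.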
The 4-df genotypic interaction test was also significant for these two loci (p = 6.4×10⁻³), but the 1-df multiplicative interaction model provided a better fit.

Figure 1 (doi:10.1371/journal.pgen.1002765.g001). Multiplicative interaction between Chr. 22q12 (rs2412971) and Chr. 1q32 (rs6677604) loci. The allelic effects of rs2412971-A by genotype class of rs9275596 (top signal in the HLA, no interaction) and rs6677604 (top signal at the CFHR1/R3 locus on Chr. 1q32, significant interaction). The protective effect of rs2412971-A allele is reversed in homozygotes for the rs6677604-A allele, which tags a deletion in CFHR3/R1. The allelic effects are expressed on the log-odds scale and correspond to beta coefficients of the logistic regression model. Error bars correspond to 95% confidence intervals.

Improved Prediction of Genetic Risk with a Refined Risk Score

The original IgAN risk score model was based on the genotypes of the top scoring SNPs at the 5 independent loci discovered in the GWAS [14]. We refined this risk score by incorporating the newly discovered independent effects of rs9275224 and rs2856717 and the interaction between the CFHR3/R1 and the HORMAD2 loci. A stepwise regression algorithm in the entire cohort defined a new risk score that retained the 7 SNPs exhibiting an independent effect as well as the rs6677604 * rs2412971 interaction term (Table 4). When compared with the original GWAS model, the newly refined score was more strongly associated with disease risk and explained a greater proportion of the disease variance in both the replication and the original GWAS dataset (Table 5). Moreover, the refined risk score was a highly significant predictor of disease in each individual replication cohort (Table S7). In all datasets combined, the new risk score explained 4.7% of disease variance and was 13 orders of magnitude more significant than the original score.
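
A score of this kind can be sketched as a weighted sum of allele counts using the Table 4 coefficients plus the interaction term. The coding below is an assumption made for illustration: each genotype is taken as the count (0, 1 or 2) of the reference allele listed in Table 4, the interaction is taken as the product of the two allele counts, and the standardization step applied to the published score is not shown. The names `TABLE4_BETAS` and `raw_risk_score` are hypothetical.

```python
# Coefficients (beta) copied from Table 4; keys are SNP IDs plus the scored allele.
TABLE4_BETAS = {
    "rs6677604_A": -0.49371,
    "rs9275224_A": -0.31307,
    "rs2856717_T": 0.42265,
    "rs9275596_C": -0.51157,
    "rs9357155_A": -0.28621,
    "rs1883414_T": -0.1805,
    "rs2412971_A": -0.28592,
}
# Coefficient of the rs6677604 (A) * rs2412971 (A) interaction term (Table 4).
INTERACTION_BETA = 0.23171


def raw_risk_score(allele_counts):
    """Unstandardized log-odds score from allele counts (0, 1 or 2 per SNP).

    Assumes additive coding and a product-of-counts interaction term.
    """
    missing = set(TABLE4_BETAS) - set(allele_counts)
    if missing:
        raise ValueError(f"missing genotypes: {sorted(missing)}")
    score = sum(beta * allele_counts[snp] for snp, beta in TABLE4_BETAS.items())
    score += (INTERACTION_BETA
              * allele_counts["rs6677604_A"]
              * allele_counts["rs2412971_A"])
    return score


# Hypothetical individual carrying two copies of both interacting alleles.
genotype = dict.fromkeys(TABLE4_BETAS, 0)
genotype["rs6677604_A"] = 2
genotype["rs2412971_A"] = 2
print(round(raw_risk_score(genotype), 5))
```

Requiring all seven genotypes mirrors the restriction of the score comparison to individuals with complete genotypes at the scored loci.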
In this model, a one standard deviation increase in the score was associated with a nearly 50% increase in the odds of disease (OR = 1.47, 95% CI: 1.42–1.54, P = 1.2×10⁻⁷²). This translates into nearly a 5-fold increase in risk between individuals from the opposing extremes of the risk score distribution (with tails defined by ≥2 standard deviations from the mean).

Table 5 (doi:10.1371/journal.pgen.1002765.t005). The comparison of the original and the newly refined IgAN risk score.

                                    Original Risk Score                  Newly Refined Risk Score
Cohort                     N#       R²*   C**   OR***  P-value****      R²*   C**   OR***  P-value****
Original GWAS Cohorts      5,631    5.0%  0.61  1.51   3.1×10⁻⁴⁶        5.7%  0.62  1.56   4.1×10⁻⁵²
Replication Cohorts        4,422    2.2%  0.58  1.29   5.4×10⁻¹⁷        3.2%  0.59  1.36   3.3×10⁻²⁴
Asian Cohorts Combined     4,582    4.5%  0.60  1.53   3.0×10⁻³⁴        5.0%  0.61  1.52   2.6×10⁻³⁸
European Cohorts Combined  5,386    2.6%  0.58  1.34   3.7×10⁻²⁴        3.6%  0.59  1.42   6.7×10⁻³³
All Cohorts Combined       10,053   3.8%  0.60  1.42   6.2×10⁻⁶³        4.7%  0.61  1.47   1.2×10⁻⁷⁶

The expanded version of this table can be found in supplemental material (Table S7). #: Number of analyzed individuals with 100% non-missing genotypes across all 7 scored loci. *: R²: Nagelkerke R square (expressed as percentage). **: C-statistic: area under the ROC curve. ***: Odds ratio per one standard deviation of the standardized risk score. ****: Wald's test for risk score as a quantitative predictor of disease status.

Geospatial Modeling of Genetic Risk Reveals New Geographic Patterns in Disease Prevalence

Similar to the GWAS study, we detected pronounced differences in the distributions of risk alleles among the three different ethnicities studied: for each of these seven risk loci, the frequency of the risk alleles was highest in East Asians and lowest in African-Americans (Figure S1). These differences were also reflected in highly significant disparities in the risk score distributions by ethnicity (Figure 2).
Motivated by these observations, we examined global geographic variation in the genetic risk for IgAN by applying the newly refined IgAN risk score in 6,319 healthy individuals across 85 worldwide populations. We observed marked differences in the genetic risk across the world. Overall, the mean standardized risk score was lowest for Africans, intermediate for Middle Easterners and Europeans, and highest for East Asians and Native Americans (Figure 3 and Figure S2). Accordingly, the risk increased sharply with eastward distance from the prime meridian (Pearson's r = 0.27, p = 3.5×10⁻¹⁰⁸). The same geospatial pattern was detected if we included only native populations of HGDP and HapMap-III (Figure S3), demonstrating that the findings are not biased by inclusion of control populations from the genetic association study. These data are consistent with the known East-West gradient in prevalence of IgAN, suggesting that genetic risk predicts prevalence.

Figure 2 (doi:10.1371/journal.pgen.1002765.g002). Differences in the distribution of the 7-SNP genetic risk score by ethnicity. Only healthy control participants of the replication studies that were fully genotyped at all 7 loci were used in this analysis. Similar to the GWAS study, the risk score distributions were significantly different by ethnicity (ANOVA p = 2.1×10⁻³⁸). The corresponding differences in the distribution of risk alleles are depicted in Figure S1.

Figure 3 (doi:10.1371/journal.pgen.1002765.g003). Worldwide geospatial risk analysis. Surface interpolation of the standardized risk score over Africa and Eurasia (main), and Americas (inset). Symbols represent the locations of sampled populations: HGDP (circles), HapMap-III (diamonds), and healthy controls from this study (triangles).

Unexpectedly, higher resolution analysis of the European continent revealed an additional increase in the risk from South to North (Pearson's r = 0.11, p = 1.3×10⁻⁹).
For example, northwestern Russians and northern inhabitants of Orkney Islands (Scotland) have the highest risk scores when compared with the rest of the European continent (Tables S8 and S9). To confirm these findings and test whether North-South variation in genetic risk is also reflected in differences in IgAN occurrence, we obtained genetic data from additional European populations (Belgian, British, Finnish, Swedish and Icelandic) and compared genetic risk scores with the incidence and point prevalence of IgAN among end-stage renal disease (IgAN-ESRD) populations across Europe (Table S10). As predicted by the genetic risk score, our analysis confirmed a strong North-South cline of both incidence and prevalence across the European continent (Figure 4). Notably, this analysis includes only patients with end-stage IgAN, on dialysis or after kidney transplantation, thus it underestimates the true incidence and population prevalence of IgAN. Because the point prevalence of IgAN-ESRD (Figure 4b) can be confounded by differential survival on renal replacement therapy and differences in kidney biopsy practice by country, we also examined IgAN-ESRD prevalence expressed as a percentage of all ESRD (Figure 4c), and ESRD due to biopsy-diagnosed primary glomerulonephritis (Figure 4d). Regardless of the metric used to quantify differences in IgAN occurrence, regression of the genetic risk score and the prevalence data on the average latitude resulted in positive correlations and parallel trends.

Figure 4 (doi:10.1371/journal.pgen.1002765.g004). Correlation of average country latitude with country-specific genetic risk and IgAN–attributable ESRD across the European continent. The South to North latitude is indicated on the X-axis. The median genetic risk (x) is indicated on the right Y-axis.
The following incidence and prevalence metrics (o) are indicated on the left Y-axis: (panel a) the incidence of ESRD due to IgAN per million population (correlation with latitude: r = 0.54, p = 0.05); (panel b) the prevalence of ESRD due to IgAN per million population (correlation with latitude: r = 0.47, p = 0.10); (panel c) the percent of IgAN patients among all ESRD cases (correlation with latitude: r = 0.67, p = 0.01); and (panel d) among ESRD cases due to primary glomerular disease (correlation with latitude: r = 0.71, p = 0.006). All p-values are derived based on a two-sided hypothesis test. The co-variation in genetic risk score and IgAN-ESRD occurrence among world populations may also be in part influenced by differences in environment, or by other factors such as local medical guidelines for screening and treatment. To better distinguish these possibilities, we examined native populations that live under a uniform environment yet show variation in IgAN risk. In the densely sampled North Italian populations, the Alpine villagers of the Valtrompia region have a 3.5-fold higher prevalence of ESRD attributable to IgAN and primary glomerulonephritis when compared to the national average [16]. Consistent with this prevalence data, the median standardized risk score in this population was comparable to some of the Northern European countries and ranked as number one among the 17 Italian populations sampled in our study (Figure 5, Table S8). 10.1371/journal.pgen.1002765.g005 Figure 5 High-resolution geospatial risk analysis for Italy. A well defined region of higher genetic risk was uncovered in Northern Italy that centers on Valtrompia, Brescia, and Cremona (median standardized risk scores 0.31, 0.24 and 0.24, respectively). The healthy individuals from Valtrompia had the highest risk scores when compared to 16 other Italian populations sampled. 
Conversely, we compared the genetic risk score and IgAN-ESRD prevalence in populations in the United States, where diverse ethnicities live under different environments and health care systems compared to the ancestral populations. The analysis of the USRDS dataset confirmed the striking ethnic differences in IgAN-ESRD prevalence (Table S11): the percentage of ESRD attributable to IgAN was 5-fold greater for Caucasian and 15-fold greater for Asian Americans compared to African-Americans. This increased IgAN-ESRD occurrence in Asian- compared to African-Americans far exceeds the 50% increase in risk predicted by the genetic risk score (one standard deviation difference), suggesting the presence of additional unaccounted genetic and environmental factors (Figure 6).

Figure 6 (doi:10.1371/journal.pgen.1002765.g006). Genetic risk and IgAN–attributable ESRD among major US ethnicities. The relationship between IgAN risk scores (red line) and IgAN incidence and prevalence (bars) among US ethnicities is shown. The following metrics of IgAN occurrence are depicted: (panel a) the incidence of ESRD due to IgAN per million population by ethnicity, (panel b) the prevalence of ESRD due to IgAN per million population by ethnicity, (panel c) percent of IgAN among the total ESRD population by ethnicity; and (panel d) percent of IgAN among ESRD due to glomerular disease by ethnicity.

Discussion

In this study, we examined the largest IgAN case-control cohorts reported to date. We first verified the five top signals identified in a recent GWAS for IgAN in independent cohorts and demonstrated robust replication of four loci, and heterogeneity at one locus. Using the combined dataset of 10,755 individuals, we also identified novel risk alleles for IgAN in the DQB1/DRB1 locus and detected a significant interaction between the CFHR3/R1 and the HORMAD2 loci. We also defined a more powerful genetic risk score that explained 4.7% of disease variance across all cohorts.
Finally, in examination of 85 world populations, the genetic risk score paralleled the prevalence of IgAN, confirming the known East-West cline, and also led to the detection of an association of IgAN-ESRD prevalence with latitude in Europe.

While ten of twelve tested SNPs (four susceptibility loci) were robustly replicated with direction-consistent ORs across all cohorts, the TAP2/PSMB9 locus demonstrated a moderately high level of heterogeneity. This locus remained genome-wide significant in the combined analyses under both fixed and random effects models. Family-based studies [17], [18], sperm typing experiments [19] and HapMap data have identified a recombination hotspot directly centered over the TAP2 gene (22 cM/Mb, 5.5-kb centromeric from the 2 SNPs selected for replication). We can therefore hypothesize that high heterogeneity at this locus is due to the unusually high rates of recombination in this region, which perturbs LD patterns between tag-SNPs and causal variants; this situation has been shown to cause a “flip-flop” phenomenon in association results [20]. Therefore, higher density of SNP coverage on either side of the recombination hotspot will be needed to guide future replication and fine mapping efforts.

In addition to the independent replication of GWAS data, we identified two new signals in the DQB1/DRB1 region that exhibit independent genome-wide significant effects in conditional analyses, providing support for multiple causal variants at this locus. These findings are consistent with previous studies of IgAN [15], [21] and other autoimmune diseases [22], [23], [24], [25], highlighting the complexity of associations in the MHC region. In our study, the strongest association signal originates in a protective haplotype tagged by rs9275596-C that carries HLA-DRB1*1501 and DQB1*0602, also associated with protection against type I diabetes [24].
The causal variants underlying the other haplotypes remain obscure and their discovery will likely require comprehensive re-sequencing to define classical alleles. Genetic interactions have been seldom described in association studies [26]. We detected a multiplicative interaction between the CFHR3/R1 and the HORMAD2 loci, which was most evident in the European cohorts, likely because the frequencies of both protective variants are considerably higher in this population. While this interaction was robust to multiple-testing correction for 7 SNPs, it will require confirmation in additional independent cohorts or via functional studies that examine whether these two loci are involved in a common biological pathway. Because the rs6677604-A allele tags a deletion in the CFHR3/CFHR1 genes, this finding suggests that the absence of these proteins abrogates the benefit imparted by HORMAD2 protective alleles. It is thus noteworthy that the HORMAD2 locus encodes several cytokines (LIF, OSM) that can interact with complement factors [27]. A seven-SNP genetic risk score explained nearly 5% of IgAN variance and demonstrated co-variation with IgAN prevalence across multiple settings. The major limitations of geospatial modeling include variable sampling density and inadequate coverage of certain geographic regions. Using the most comprehensive resources presently available for geo-genetic analyses, we found that the genetic risk score strongly paralleled the well-known East-West gradient in IgAN prevalence [3], [28], [29], [30], [31], [32]. For each of these seven risk loci, the frequency of the risk alleles was highest in East Asians, lowest in African-Americans and intermediate in European populations. Accordingly, we detected co-variation of genetic risk with IgAN-ESRD incidence and prevalence among Asian-, White- and African-Americans, which share genetic background but not environment with their ancestral populations. Representative genetic data for U.S. 
Native Americans was not available from HGDP nor HapMap projects, precluding a direct comparison of their risk score with prevalence. However, the USRDS data and other reports indicate a high prevalence of IgAN-ESRD in US Native Americans [33], [34], [35], [36], [37], consistent with their ancestral origin from an Asian subpopulation that migrated across the Bering land bridge over 15,000 years ago [38]. In the more homogeneous population of Northern Italy, the median risk score in the Valtrompia valley was the highest among Northern Italian populations and comparable with the Northern European scores, consistent with Valtrompia's 3.5-fold higher prevalence of ESRD, which is largely attributable to IgAN [16]. Taken together, these data strongly suggested that variation in genetic risk partly explains the variation in geo-epidemiology of disease. Because the genetic score captured general trends in IgAN epidemiology, we also tested whether the Northward gradient in genetic risk in Europe is mirrored by higher prevalence of kidney failure from IgAN. The ERA-EDTA data, which are the most unbiased source of information available, demonstrate that Nordic countries have over 2-fold higher incidence and prevalence of IgAN-ESRD compared to the Southern European countries. Although higher risk of IgAN in Northern Europe has not been previously appreciated, similar latitudinal risk gradients in prevalence and incidence have been well established for several other immune-mediated diseases, including type 1 diabetes [39], [40], multiple sclerosis [41], [42], and inflammatory bowel disease [43]. Interestingly, these disorders share risk alleles with IgAN, suggesting that variation in common genetic risk factors may mediate variation in prevalence of autoimmune disorders. 
Since our analysis was limited to prevalent IgAN-ESRD in countries with available epidemiological data, and only a portion of IgAN cases progress to ESRD, studies that better estimate the population prevalence of all IgAN are needed to confirm these findings and to better delineate epidemiological connections to other immune-mediated disorders. The genetic and environmental factors leading to the observed geospatial pattern of genetic risk and disease prevalence are not clear. The pre-modern history of IgAN is not known because this disease was only first described in 1968 [44], shortly after the discovery and application of immunofluorescence in the analysis of kidney tissue. It is well known that mucosal infections can exacerbate disease, but specific environmental factors influencing the development of IgAN are not known. Based on a recently proposed pathogenesis model, the IgAN risk loci participate in sequential processes leading to the initiation and exacerbation of IgAN [45]. This may further explain the correlation of the genetic risk score with disease epidemiology. Interestingly, many of the IgAN loci are known to exhibit opposing effects on other autoimmune conditions [14]; for example, the HLA-DQB1 and HORMAD2 risk alleles are protective for systemic lupus erythematosus and inflammatory bowel disease, respectively. Thus, balancing selection, in conjunction with local environmental factors, may be responsible for the maintenance of risk alleles in different populations. The current IgAN risk score captures a greater proportion of the disease variance than other GWAS of kidney function, such as a recent study of 60,000 individuals that reported 13 loci explaining only 1.4% of the variance for estimated glomerular filtration rate [46]. Nonetheless, the fraction of the IgAN variation explained remains modest. 
For example, the one standard deviation risk-score difference between Asian- and African-Americans predicts a 50% increase in risk, yet there is an over 10-fold difference in IgAN-ESRD occurrence between these two groups. These data suggest that additional genetic and environmental factors influence risk. Based on the effect sizes and allelic frequencies of the discovered SNPs, we estimate that doubling the GWAS sample size is likely to find up to 7 additional loci, while tripling the sample size would identify up to 11 additional loci at genome-wide significant p-values 1×10−2). Individuals with more than 2 missing genotypes out of the 12 loci were also excluded from the analysis. The participants of the smaller GN-Progress study (207 cases and 159 controls) were genotyped using the Illumina HumanCNV370-duo chip at the Centre National de Génotypage (CEA, Evry, France). The analysis of intensity clusters and genotype calls were performed using the Illumina Genome Studio software. Of 366 genotyped individuals, two cases and 1.8% of SNPs were excluded based on low call rates ( 0.05, average I² = 0). Therefore, these two cohorts were combined into a single cohort of 493 cases and 402 controls. As with the French cohorts, there was no significant heterogeneity at any of the loci for the two smaller German cohorts (STOP-IgAN and Hamburg-Eppendorf), and these were also combined into a single cohort of 249 cases and 372 controls. Analysis of the Northern and Southern Italian cohorts suggested some heterogeneity at 3 out of 12 SNPs (I² = 40–50%). Although these observations were not statistically significant (Q-test P>0.05), we used a conservative stratified approach for all downstream analyses for these two cohorts. The final summary of all study cohorts before and after quality control is provided in Table S1. 

Power Calculation

We performed a power calculation for the final replication cohort size of 4,789 individuals (2,228 cases/2,561 controls) as a function of disease allele frequency and genotype relative risk (Table S2). The power was calculated in reference to a protective allele, with the range of allelic frequencies and effects comparable to the ones observed in the original GWAS. Assumptions included disease prevalence of 1%, log-additive model, no heterogeneity, and alpha = 0.01 (Bonferroni-adjusted considering five independent loci tested). This analysis confirmed that our study had ample power (nearly 100% for most loci) to replicate the associations observed in the initial GWAS. The power calculations were performed using QUANTO v.1.2 software [48].

Association Analyses

The primary association analyses were performed using PLINK version 1.07 [49]. As in the GWAS, we selected a standard 1-df Cochran-Armitage trend test as the primary association test. We also estimated the per-allele odds ratios and 95% confidence intervals for all tested SNPs within each individual cohort. The results across multiple cohorts were combined using an inverse variance-weighted method under a fixed-effects model (PLINK), as well as using a random-effects model as proposed by Han and Eskin (METASOFT) [50]. We also tested for heterogeneity across cohorts by performing a formal Cochran's Q heterogeneity test as well as by estimating the heterogeneity index (I²) [51].

Conditional Analyses

The conditional association tests of the HLA loci were performed after controlling for the genotypes of the conditioning SNPs within each cohort using logistic regression (PLINK). The adjusted (conditioned) effect estimates were then combined across cohorts using a fixed-effects meta-analysis, given that there was no significant heterogeneity across these loci. 
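The inverse variance-weighted fixed-effects combination and the heterogeneity statistics mentioned above can be illustrated with a minimal sketch. This is not the PLINK or METASOFT implementation; the function name `fixed_effects_meta` and the per-cohort values are hypothetical, chosen only to show the arithmetic behind the pooled estimate, Cochran's Q, and I².

```python
import math

def fixed_effects_meta(log_ors, std_errs):
    """Pool per-cohort log odds ratios using inverse-variance weights.

    Returns the pooled log OR, its standard error, a two-sided P-value,
    Cochran's Q, and the I^2 heterogeneity index (in percent).
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    # Two-sided P-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"log_or": pooled, "se": pooled_se, "p": p_value, "q": q, "i2": i2}

# Hypothetical per-cohort estimates (illustrative only, not study data).
cohort_log_ors = [math.log(0.75), math.log(0.80), math.log(0.70)]
cohort_ses = [0.08, 0.10, 0.12]
result = fixed_effects_meta(cohort_log_ors, cohort_ses)
print(f"pooled OR = {math.exp(result['log_or']):.3f}, "
      f"Q = {result['q']:.3f}, I2 = {result['i2']:.1f}%")
```

Under this scheme, cohorts with smaller standard errors contribute more to the pooled estimate, and an I² near zero indicates that the between-cohort spread is no larger than expected by chance.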
To validate this approach, we also combined the results by adding cohort information as an additional covariate in the stratified analysis within the logistic regression framework. As expected, the results of both approaches were similar.

Haplotype-Based Association Tests

These analyses were carried out in PLINK v1.07 [49]. Haplotypes were first phased using the EM algorithm across the HLA-DQB1, HLA-DQA1, and HLA-DRB1 region. The haplotype frequencies were estimated in the cases and controls separately, as well as jointly in the entire cohort. Only common haplotypes with overall frequency >1% were included in the association tests. The global haplotype association test was performed using a χ² test with n-1 degrees of freedom for n common haplotype groups. The ORs and the corresponding 95% confidence intervals were estimated in reference to the most common haplotype (GCAT, frequency ∼35%).

First-Order Interaction Analyses

To explore the possibility of interactions between the 7 independent risk variants, we screened all possible pairwise interaction terms for association with disease within the framework of logistic regression models (R version 2.10). As a screening test, we used a 1-df LRT to compare two nested models: one with main effects only and one with main effects and a multiplicative (logit-additive) interaction term. We included cohort membership as a fixed covariate in both of these models. For this analysis we selected a Bonferroni-adjusted significance threshold of 2.4×10⁻³, a conservative threshold that accounts for all 21 pairwise interaction terms tested. Significant interactions from this analysis were also tested using a 4-df genotypic interaction test. In this test, we compared a model with allelic effects, dominant effects, and their interaction terms with a reduced model with no interaction terms. 
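The arithmetic behind the screening step can be sketched briefly: 7 variants give 21 unordered pairs, the Bonferroni threshold is 0.05/21, and a 1-df likelihood ratio statistic is converted to a P-value from the χ² distribution. The SNP labels and log-likelihood values below are placeholders, and the helper `lrt_pvalue_1df` is a hypothetical name; fitting the logistic models themselves is outside this sketch.

```python
import math
from itertools import combinations

# Placeholder labels for the 7 independent variants (not the real rs IDs).
snps = [f"SNP{i}" for i in range(1, 8)]
pairs = list(combinations(snps, 2))   # all unordered pairs of variants
alpha = 0.05 / len(pairs)             # Bonferroni-adjusted threshold

def lrt_pvalue_1df(loglik_reduced, loglik_full):
    """P-value of a 1-df likelihood ratio test between two nested models.

    Uses the identity P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical maximized log-likelihoods for one pair of variants.
p = lrt_pvalue_1df(loglik_reduced=-3120.4, loglik_full=-3114.9)
print(len(pairs), f"{alpha:.2e}", p < alpha)
```

A pair would be carried forward to the 4-df genotypic test only if its screening P-value falls below `alpha`.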
We followed the coding proposed by Cordell and Clayton: for each SNP i we modeled its allelic effect x_ia by coding the genotypes AA, AB, and BB as x_ia = −1, 0, 1; we modeled dominance effects as x_id = −0.5, 0.5, −0.5 for the genotypes AA, AB, and BB, respectively [52].

Distributions of Protective Alleles and Risk Score Analyses

Each study participant was scored for the number of risk alleles and the distributions of protective alleles were compared between cohorts of different ethnicity. Only individuals with complete genotype information at the 7 scored loci (14 alleles) were included in this analysis. The distributions were analyzed separately for cases and controls. A χ² goodness-of-fit test was used to derive p-values for comparison of distributions. Because of a relatively small number of individuals at the tails of the distributions, for the purpose of statistical testing the tails of the distributions were binned into single-bin categories to achieve expected cell counts >5. To confirm the results of conditional analyses and refine the genetic risk score proposed in the original GWAS, we subjected the genotype data from the entire cohort to a stepwise regression algorithm that selects significant covariates for the best predictive regression model based on Bayesian Information Criterion (the step function, R version 2.10). At model entry, we included all 12 genotyped SNPs, all 21 tested interactions, as well as cohort membership as a fixed covariate. Consistent with the results of our conditional analysis, the stepwise algorithm retained only the 7 SNPs exhibiting an independent effect along with the rs6677604*rs2412971 interaction term. All other terms were automatically dropped from the regression model. The risk score was calculated as a weighted sum of the number of protective alleles at each locus multiplied by the log of the OR for each of the individual loci from the final fully adjusted model. 
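The weighted-sum construction of the risk score can be sketched as follows. The odds ratios and genotype counts are invented for illustration and are not the study's estimates; the sketch also omits the retained interaction term and assumes a population (rather than sample) standard deviation for the z-score step. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev

def raw_risk_score(allele_counts, odds_ratios):
    """Weighted sum of protective-allele counts times the log OR per locus.

    Returns None when any genotype is missing, mirroring the exclusion of
    individuals with incomplete genotype data.
    """
    if any(count is None for count in allele_counts):
        return None
    return sum(c * math.log(o) for c, o in zip(allele_counts, odds_ratios))

def standardize(scores):
    """Express each score as its distance from the mean in SD units."""
    mu = mean(scores)
    sd = pstdev(scores)  # assumption: population SD
    return [(s - mu) / sd for s in scores]

# Illustrative per-allele ORs for 7 loci (made-up values, not study results).
example_ors = [0.74, 0.78, 0.81, 0.85, 0.80, 0.88, 0.83]
# Protective-allele counts (0, 1, or 2) for three hypothetical individuals.
genotypes = [
    [0, 1, 0, 2, 1, 0, 1],
    [2, 2, 1, 1, 2, 1, 2],
    [1, 0, 0, 0, 1, 1, 0],
]
scores = [raw_risk_score(g, example_ors) for g in genotypes]
print([round(z, 3) for z in standardize(scores)])
```

Because each protective allele has an OR below 1, its log OR is negative, so individuals carrying more protective alleles receive lower scores.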
Only individuals with non-missing genotypes for all 14 alleles were included in this analysis. The risk score was standardized across all populations using a z-score transformation; thus, the standardized score represented the distance between the raw score and the population mean in units of standard deviation. The percentage of the total variance in disease state explained by the risk score was estimated by Nagelkerke's pseudo R² from the logistic regression model with the risk score as a quantitative predictor and disease state as an outcome. The C-statistic was estimated as the area under the receiver operating characteristic curve provided by the above logistic model. These analyses were carried out with SPSS Statistics version 19.0.

Geospatial Analyses

For this purpose, we used publicly available genotype data of the Human Genome Diversity Panel (HGDP; 1,050 individuals representative of 52 worldwide populations), HapMap III (1,184 individuals representative of 11 populations), along with healthy controls genotyped as part of this study (4,547 individuals representative of 25 recruitment sites). The HGDP individuals have been previously genotyped for 660,918 markers using Illumina 650Y arrays (Stanford University). First, SNPs with genotyping rate 1%) are tested for association.

Table S6. All possible 1st order multiplicative interactions between the 7 SNPs with independent effects on disease risk. Statistical significance is assessed using a Bonferroni-corrected threshold, alpha = 0.05/21 = 2.4×10⁻³. (PDF)

Table S7. The comparison of the original and the newly refined genetic risk score. (PDF)

Table S8. African, Middle Eastern, and European populations included in the geospatial risk analysis. The populations were grouped by their continental origin and sorted based on the median genetic risk score. (PDF)

Table S9. Asian, Oceanian, and American populations included in the geospatial risk analysis. The populations were grouped by their continental origin and sorted based on the median genetic risk score. (PDF)

Table S10. Prevalence and Incidence of ESRD due to IgAN in Europe. Primary data obtained from the ERA-EDTA Registry. (PDF)

Table S11. Prevalence and Incidence of ESRD due to IgAN in the US. Primary data obtained from the USRDS Annual Report, 2011. (PDF)

                Author and article information

                Journal
                Journal of the American Society of Nephrology
                JASN
                American Society of Nephrology (ASN)
                1046-6673
                1533-3450
                June 30 2017
                July 2017
                July 2017
                February 16 2017
                : 28
                : 7
                : 2158-2166
                Article
                10.1681/ASN.2016091043
                5491291
                28209808
                39df2c9b-33ee-4e13-af83-c9ac467b1d5e
                © 2017