Introduction

Chronic kidney disease (CKD) affects nearly 10% of the global population [1], [2], and its prevalence continues to increase [3]. Reduced estimated glomerular filtration rate (eGFR), the primary measure used to define CKD (eGFR 65 years of age).

10.1371/journal.pgen.1002584.t001
Table 1. Novel loci associated with eGFRcrea.

Analysis subgroup | SNP ID | Chr | Position (bp)‡ | Genes nearby‡ | Ref./Non-Ref. alleles (RAF) | Discovery: Effect(SE)§ | Discovery: P value§ | Replication: Effect(SE) | Replication: 1-sided P value | Replication: Q value | Combined†: Effect(SE) | Combined†: P value | Combined†: I2
Overall | rs3925584 | 11 | 30,716,911 | MPPED2 | T/C(0.54) | −0.0077(0.0013) | 1.0×10−09 | −0.0073(0.0013) | 4.0×10−9 | 1.1×10−08 | −0.0075(0.0009) | 8.4×10−18 | 21%
Overall | rs6431731 | 2 | 15,780,453 | DDX1 | T/C(0.94) | −0.0181(0.0033) | 4.6×10−08 | −0.0065(0.0034) | 0.0277 | 0.0195 | −0.0127(0.0023) | 4.3×10−08 | 11%
No Diabetes | rs2453580 | 17 | 19,378,913 | SLC47A1 | T/C(0.59) | 0.0076(0.0014) | 4.6×10−08 | 0.0038(0.0014) | 0.0037 | 0.0039 | 0.0059(0.0010) | 2.1×10−09 | 21%
Age≤65 yrs* | rs12124078 | 1 | 15,742,486 | DNAJC16, CASP9, AGMAT | A/G(0.70) | 0.0096(0.0015) | 9.8×10−10 | 0.0098(0.0017) | 5.0×10−9 | 1.1×10−08 | 0.0097(0.0011) | 1.5×10−17 | 20%
Age≤65 yrs | rs11078903 | 17 | 34,885,450 | CDK12, MED1, FBXL20 | A/G(0.76) | −0.0103(0.0017) | 2.4×10−09 | −0.0083(0.0023) | 1.4×10−4 | 2.0×10−04 | −0.0096(0.0013) | 9.0×10−13 | 0%
Direction Test (Overall)** | rs2928148 | 15 | 39,188,842 | INO80, EXD1, CHAC1 | A/G(0.52) | 0.0064(0.0012) | 1.2×10−07 | 0.0033(0.0015) | 0.0145 | 0.0122 | 0.0051(0.0009) | 4.0×10−08 | 0%

SNPs are listed in the stratum where the smallest P value in the discovery analysis was observed. Sample size/number of studies in the discovery phase: 74,354/26 (overall, direction test), 66,931/24 (No Diabetes), 46,435/23 (age ≤65 years); replication phase: 56,246/19 (overall, direction test), 41,218/17 (No Diabetes), 28,631/16 (age ≤65 years); combined analysis: 130,600/45 (overall, direction test), 108,149/41 (No Diabetes), 75,066/39 (age ≤65 years).
Chr.: chromosome; bp: base-pairs; Ref./Non-Ref. All.: reference/non-reference alleles; RAF: reference allele frequency; SE: standard error.
‡ Genes nearby were based on RefSeq genes (build 36). The gene closest to the SNP is listed first and is in boldface if the SNP is located within the gene.
§ Effects on log(eGFRcrea); post GWAS meta-analysis genomic control correction applied to P values and SEs.
* While being uncovered in the younger samples, this locus showed consistent results in the non-diabetic group (combined-analysis P value 5.7×10−16) and in the overall population (P value 9.5×10−22); see Tables S16 and S10 for additional details.
** The direction test was performed in the overall dataset; the genomic control corrected P value from the direction test for the SNP rs2928148 was 4.0×10−7. In the combined analysis, the largest effect size (0.0054 on log eGFR in ml/min/1.73 m2) and the smallest P value (3.7×10−8) were observed in the non-diabetic group.
† All results were confirmed by random-effect meta-analysis.

We further examined our findings in 8,110 African ancestry participants from the CARe consortium [12] (Table 2). Not surprisingly, given linkage disequilibrium (LD) differences between Europeans and African Americans, none of the 6 lead SNPs uncovered in CKDGen achieved significance in the African American samples. Next, we interrogated the 250 kb flanking regions from the lead SNP at each locus, and showed that 4 of the 6 regions (MPPED2, DDX1, SLC47A1, and CDK12) harbored SNPs that achieved statistical significance after correcting for multiple comparisons based on the genetic structure of each region (see Methods for details). Figure 1 presents the regional association plots for MPPED2, and Figure S7 presents the plots of the remaining loci in the African American sample. Imputation scores for the lead SNPs can be found in Table S12.
We observed that rs12278026, upstream of MPPED2, was associated with eGFRcrea in African Americans (P value = 5×10−5, threshold for statistical significance: P value = 0.001). While rs12278026 is monomorphic in the CEU population in HapMap, rs3925584 and rs12278026 have a D′ of 1 (r2 = 0.005) in the YRI population, suggesting that these SNPs may have arisen from the same ancestral haplotype.

10.1371/journal.pgen.1002584.g001
Figure 1. Genetic association and LD distribution of the MPPED2 gene locus in European and African ancestry populations. Regional association plots in the CKDGen European ancestry discovery analysis (N = 74,354) (A) and in the CARe African ancestry discovery analysis (N = 8,110) (B). LD structure: comparison between the HapMap release II – CEU and YRI samples in the region included within +/−100 kb from the target SNP rs3925584 identified in the CKDGen GWAS. The green circle highlights a stream of high LD connecting the two blocks, indicating the presence of common haplotypes (C).

10.1371/journal.pgen.1002584.t002
Table 2. Interrogation of the six novel loci uncovered in the European ancestry (EA) individuals (CKDGen consortium) in individuals of African ancestry (AA) from the CARe consortium for the trait eGFRcrea. The first five columns give results for the lead SNPs in the CARe AA individuals; the remaining columns describe the best SNP in each region in the CARe AA individuals.

Lead SNP ID* | Nearby genes§ | Lead SNP: Ref./Non-Ref. alleles (RAF) | Lead SNP: Effect(SE) | Lead SNP: P value | Best SNP ID | Position (build 36) | LD (R2) with lead SNP | Best SNP: RAF (Ref./Non-Ref. alleles) | Best SNP: Effect(SE) | Best SNP: P value | S** | Bonferroni P value threshold (0.05/S)
rs3925584 | MPPED2 | T/C (0.88) | −0.0005(0.0066) | 0.9349 | rs12278026 | 30,744,460 | 0.005 | 0.89 (A/G) | 0.0342(0.0084) | 4.6×10−5 | 46 | 0.0011
rs6431731 | DDX1 | T/C (0.99) | −0.0181(0.0213) | 0.3948 | rs4669002 | 15,874,859 | NA† | 0.56 (T/C) | −0.0196(0.0047) | 2.6×10−5 | 78 | 6.4×10−4
rs12124078 | DNAJC16, CASP9, AGMAT | A/G (0.69) | −0.0024(0.0045) | 0.5956 | rs1472554 | 15,987,920 | 0.004 | 0.50 (C/G) | −0.0120(0.0041) | 0.0035 | 44 | 0.0011
rs2453580 | SLC47A1 | T/C (0.59) | 0.0056(0.0049) | 0.2524 | rs1800869 | 19,505,226 | 0.011 | 0.93 (C/G) | −0.0294(0.0082) | 3.6×10−4 | 33 | 0.0015
rs11078903‡ | CDK12, MED1, FBXL20 | A/G (NA‡) | NA‡ | NA‡ | rs1874226 | 34,982,557 | 0.112 | 0.34 (T/C) | 0.0157(0.0045) | 4.2×10−4 | 15 | 0.0033
rs2928148 | INO80, EXD1, CHAC1 | A/G (0.22) | −0.0003(0.0053) | 0.9497 | rs8039934 | 39,284,719 | 0.105 | 0.50 (T/C) | −0.0086(0.0042) | 0.0412 | 22 | 0.0023

Ref./Non-Ref. All.: reference/non-reference alleles; RAF: reference allele frequency; SE: standard error.
* Characteristics of the six lead SNPs in the EA individuals from the CKDGen consortium can be found in Table 1.
§ The gene closest to the SNP is listed first and is in boldface if the SNP is located within the gene.
** S = number of independent, typed SNPs interrogated.
† No LD information available in the HapMap database between the target SNP and the best SNP in the DDX1 region.
‡ The SNP rs11078903 was not present in the CARe consortium database.

We also performed eQTL analyses of our 6 newly identified loci using known databases and a newly created renal eSNP database (see Methods) and found that rs12124078 was associated with cis expression of the nearby CASP9 gene in monocytes (P value for the monocyte eSNP of interest = 3.7×10−13). CASP9 encodes caspase-9, the third apoptotic activation factor, which is involved in the activation of cell apoptosis, necrosis and inflammation. In the kidney, caspase-9 may play an important role in the medulla response to hyperosmotic stress [13] and in cadmium-induced toxicity [14].
The other 5 SNPs were not associated with any investigated eQTL. Additional eQTL analyses of 81 kidney biopsies (Table S13) did not reveal further evidence of association with eQTLs (Table S14).

Of the 6 novel loci identified, 2 (MPPED2 and DDX1) were in regions containing only a single gene, and 1 (CASP9) had its expression associated with the locus lead SNP. Thus, to determine the potential involvement of these three genes during zebrafish kidney development, we independently assessed the expression of 4 well-characterized renal markers following morpholino knockdown: pax2a (global kidney) [15], nephrin (podocyte) [16], slc20a1a (proximal tubule) [17], and slc12a3 (distal tubule) [17]. While we observed no abnormalities in ddx1 morphants (Figure S8), mpped2 and casp9 knockdown resulted in expanded pax2a expression in the glomerular region in 90% and 75% of morphant embryos, respectively, compared to 0% in controls (P value 200 were set to 200 ml/min/1.73 m2. CKD was defined as eGFRcrea <60 ml/min/1.73 m2.

Covariate definitions

In discovery and replication cohorts, diabetes was defined as fasting glucose ≥126 mg/dl, pharmacologic treatment for diabetes, or self-report. Hypertension was defined as systolic blood pressure ≥140 mmHg, diastolic blood pressure ≥90 mmHg, or pharmacologic treatment for hypertension.

Discovery analyses

Genotyping was conducted as specified in Table S4. After applying quality-control filters to exclude low-quality SNPs or samples, each study imputed up to ∼2.5 million HapMap-II SNPs, based on the CEU reference samples. Imputed genotypes were coded as the estimated number of copies of a specified allele (allelic dosage). Additional, study-specific details can be found in Table S1.

Primary association analysis

A schematic view of our complete analysis workflow is presented in Figure S1.
Using data from 26 population-based studies of individuals of European ancestry, we performed GWA analyses of the following phenotypes: 1) loge(eGFRcrea), loge(eGFRcys), CKD, and CKD45 overall and 2) loge(eGFRcrea) and CKD stratified by diabetes status, hypertension status, age group (≤/>65 years), and sex. GWAS of loge(eGFRcrea) and loge(eGFRcys) were based on linear regression. GWAS of CKD and CKD45 were performed in studies with at least 25 cases (i.e. all 26 studies for CKD and 11 studies for CKD45) and were based on logistic regression. Additive genetic effects were assumed and models were adjusted for age and, where applicable, for sex, study site and principal components. Imputation uncertainty was accounted for by including allelic dosages in the model. Where necessary, relatedness was modeled with appropriate methods (see Table S1 for study-specific details).

Before inclusion in the meta-analysis, all GWA data files underwent careful quality control, performed using the GWAtoolbox package in R (www.eurac.edu/GWAtoolbox.html) [29]. Meta-analyses of study-specific SNP-association results assumed fixed effects and used inverse-variance weighting; that is, the pooled effect is estimated as $\hat{\beta}_{pooled} = \sum_{i=1}^{K} w_i \hat{\beta}_i / \sum_{i=1}^{K} w_i$, where $\hat{\beta}_i$ is the effect of the SNP on the outcome in the i-th study, K is the number of studies, and $w_i = 1/\mathrm{SE}(\hat{\beta}_i)^2$ is the weight given to the i-th study. The meta-analyses were performed using METAL [30], with genomic control correction applied across all imputed SNPs [31] if the inflation factor λ>1 at both the individual study level and after the meta-analysis. SNPs with minor allele frequency (MAF) 0.03 was considered the top SNP in the African ancestry sample. We defined statistical significance of the identified lead SNP in African ancestry individuals based on a region-specific Bonferroni correction. The number of independent SNPs was determined based on the variance inflation factor (VIF) with a recursive calculation within a sliding window of 50 SNPs and pairwise r2 of 0.2.
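Two of the calculations described above can be sketched in a few lines of Python: the fixed-effects, inverse-variance-weighted pooled estimate and the region-specific Bonferroni threshold (0.05/S). This is only an illustration under stated assumptions, not the study's pipeline, which used METAL and PLINK. The function names are hypothetical, and pooling the published discovery and replication summaries for rs3925584 (Table 1) is a simplification of the study-level meta-analysis. The P values and S counts are taken from Table 2 and keyed by lead SNP.

```python
import math

def pool_fixed_effects(effects, std_errors):
    """Inverse-variance-weighted fixed-effects estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]   # w_i = 1 / SE_i^2
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

def regional_threshold(n_independent_snps, alpha=0.05):
    """Region-specific Bonferroni threshold: alpha divided by S."""
    return alpha / n_independent_snps

# Illustration only: pool the discovery and replication summaries of rs3925584
# (Table 1). The study itself pooled study-level results, not these two rows.
pooled, pooled_se = pool_fixed_effects([-0.0077, -0.0073], [0.0013, 0.0013])
print(round(pooled, 4), round(pooled_se, 4))  # -0.0075 0.0009

# Best-SNP P value and number of independent SNPs (S) per region, from Table 2,
# keyed by the lead SNP that defines each region.
best_snps = {
    "rs3925584": (4.6e-5, 46),
    "rs6431731": (2.6e-5, 78),
    "rs12124078": (0.0035, 44),
    "rs2453580": (3.6e-4, 33),
    "rs11078903": (4.2e-4, 15),
    "rs2928148": (0.0412, 22),
}
significant = [
    lead for lead, (p_value, s) in best_snps.items()
    if p_value < regional_threshold(s)
]
print(significant)
```

With these inputs the pooled estimate agrees with the combined-analysis entry for rs3925584 in Table 1 to the reported precision, and four of the six regions pass their region-specific thresholds.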
These analyses were performed using PLINK.

Analyses of related phenotypes

For each replicating SNP, we obtained association results for urinary albumin-to-creatinine ratio and microalbuminuria from our previous genome-wide association analysis [20], and for blood pressure and myocardial infarction from genome-wide association analyses from the ICBP [21] and CARDIoGRAM [22] consortia, respectively.

eSNP analysis

Significant renal SNPs were searched against a database of expression SNPs (eSNP) including the following tissues: fresh lymphocytes [36], fresh leukocytes [37], leukocyte samples in individuals with Celiac disease [38], lymphoblastoid cell lines (LCL) derived from asthmatic children [39], HapMap LCL from 3 populations [40], a separate study on HapMap CEU LCL [41], peripheral blood monocytes [42], [43], adipose [44], [45] and blood samples [44], 2 studies on brain cortex [42], [46], 3 large studies of brain regions including prefrontal cortex, visual cortex and cerebellum (Emilsson, personal communication), liver [45], [47], osteoblasts [48], skin [49] and additional fibroblast, T cell and LCL samples [50]. The collected eSNP results met criteria for statistical significance for association with gene transcript levels as described in the original papers.

A second expression analysis of 81 biopsies from normal kidney cortex samples was performed as described previously [51], [52]. Genotyping was performed using the Affymetrix 6.0 Genome-wide chip and called with GTC Software (Affymetrix). For eQTL analyses, expression probes (Affymetrix U133set) were linked to SNP probes with >90% call-rate using RefSeq annotation (Affymetrix build a30). P values for eQTLs were calculated using linear multivariable regression in both cohorts and then combined using Fisher's combined probability test (see also [52]). Pairwise LD was calculated using SNAP [53] on the CEU HapMap release 22.
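As a rough illustration of the last step, Fisher's combined probability test merges k independent P values through X = −2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below is a minimal standard-library version, not the code used in the study; the function name and the example P values are hypothetical.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent P values with Fisher's method.

    X = -2 * sum(ln p_i) is chi-square with 2k degrees of freedom under the
    null. For even degrees of freedom the upper tail has a closed form:
    P = exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    term = 1.0       # (X/2)^j / j! for j = 0
    tail_sum = 1.0
    for j in range(1, k):
        term *= half_x / j
        tail_sum += term
    return math.exp(-half_x) * tail_sum

# Hypothetical per-cohort eQTL P values for one SNP-probe pair in two cohorts.
combined = fisher_combined_p([0.01, 0.04])
print(combined)
```

For two cohorts the formula reduces to p1·p2·(1 − ln(p1·p2)), which makes the result easy to check by hand.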
Zebrafish functional experiments

Zebrafish were maintained according to established IACUC protocols. Briefly, we injected zebrafish embryos with newly designed (mpped2, ddx1) or previously validated (casp9 [54]) morpholino antisense oligonucleotides (MO, GeneTools, Philomath OR) at the one-cell stage at various doses. We fixed embryos in 4% PFA at the appropriate stages for in situ hybridization (http://zfin.org/ZFIN/Methods/ThisseProtocol.html). Different anatomic regions of the kidney were visualized using a panel of 4 established markers: pax2a (global kidney marker) [15], nephrin (podocyte marker) [16], slc20a1a (proximal tubule marker) [17], and slc12a3 (distal tubule marker) [17]. Abnormalities in gene expression were independently scored by two investigators. We compared the number of abnormal morphant embryos to control embryos, injected with a standard control MO designed by GeneTools, using Fisher's exact test at the Bonferroni-corrected significance level of 0.0125 (0.05/4 markers).

We documented the development of gross edema at 4 and 6 days post-fertilization in live embryos.

We performed dextran clearance experiments following previously described protocols [55]. Briefly, 80 hours after MO injection, we anesthetized embryos in 4 mg/ml Tricaine in embryo water (1∶20 dilution), then positioned embryos on their back in a 1% agarose injection mold. We injected an equal volume of tetramethylrhodamine dextran (70,000 MW; Invitrogen) into the cardiac sinus venosus of each embryo. We then returned the embryos to fresh embryo water. Using fluorescence microscopy, we imaged the embryos at 2 hours post-injection (82 hpf) to demonstrate equal loading, then at 48 hours post-injection (128 hpf) to evaluate dextran clearance.

Embryos were injected with control, mpped2, or casp9 MOs at the one-cell stage. At 48 hpf, embryos were manually dechorionated, anesthetized in a 1∶20 dilution of 4 mg/ml Tricaine in embryo water, and oriented on a 1% agarose injection mold.
As previously described [56], embryos were injected with equal volumes of 10 mg/ml gentamicin (Sigma) in the cardiac sinus venosus, returned to fresh embryo water, and subsequently scored for edema (prevalence, time of onset) over the next 3 days.

Supporting Information

Figure S1. Flowchart of the project. (TIF)

Figure S2. Genome-wide −log10 P values plot from stage 1 discovery meta-analysis. Plots show the discovery analysis of eGFRcrea in the overall group, with known loci [8], [9] highlighted in orange and novel loci highlighted in blue (A), and in strata of the main CKD risk factors (B, C, D, and E), with complementary groups being contrasted with each other. The dotted line indicates the genome-wide significance threshold at P value = 5×10−8. The unmarked locus is RNASEH2C on chromosome 11, colored in gray despite genome-wide significance. The P value for the current stage 1 discovery for rs4014195 was 2.7×10−9. This locus previously did not replicate [9]; when we additionally considered our prior non-overlapping in silico and de novo replication data, the current stage 2 P value was 0.8832, yielding a combined stage 1+stage 2 P value of 2.6×10−7. Therefore, we did not submit this SNP for further replication. (PDF)

Figure S3. Quantile-quantile plots of observed versus expected −log10 P values from the discovery analysis of eGFRcrea overall (A) and by strata of the main CKD risk factors (B). The orange line and its 95% confidence interval (shaded area) represent the null hypothesis of no association. In panel (A), results are compared when considering all SNPs (black dots) and when removing SNPs from loci that were already reported in previous GWAS [8], [9] (orange dots). The meta-analysis inflation factor λ is reported along with the discovery sample size. Individual-study minimum, maximum and median λs are also reported for comparison. Genomic-control correction was applied twice: on individual study results, before the meta-analysis, and on the meta-analysis results. (PDF)

Figure S4. Regional association plots for the six new loci in the European ancestry discovery samples: (A) MPPED2; (B) DDX1; (C) SLC47A1; (D) CASP9; (E) CDK12; (F) INO80. −log10 P values are plotted versus genomic position (build 36). The lead SNP in each region is labeled. Other SNPs in each region are color-coded based on their LD to the lead SNP (LD based on the HapMap CEU, see color legend). Gene annotations are based on UCSC Genome Browser (RefSeq Genes, build 36) and arrows indicate direction of transcription. Graphs were generated using the stand-alone version of LocusZoom [57], version 1.1. (PDF)

Figure S5. Forest plots of the six novel loci in the discovery phase. (TIF)

Figure S6. Results from discovery meta-analysis of eGFRcrea for the six new loci: overall sample and all strata are considered. Reported is the effect size on log(eGFRcrea) and its 95% confidence interval. The stratum where the SNP was discovered is marked with a triangle for discovery based on meta-analysis P value or with a circle for discovery based on direction test. (TIF)

Figure S7. Regional association plots for the six new loci in the African ancestry CARe samples: (A) MPPED2; (B) DDX1; (C) SLC47A1; (D) CASP9; (E) CDK12; (F) INO80. −log10 P values are plotted versus genomic position (build 36). The lead SNP in each region is labeled and identified by a blue arrow and blue P value. The SNP with the smallest P value in the region is indicated by a red arrow. Other SNPs in each region are color-coded based on their LD to the lead SNP (based on the HapMap YRI, see color legend). Gene annotation is based on UCSC Genome Browser (RefSeq Genes, build 36) and arrows indicate direction of transcription. Graphs were generated using the stand-alone version of LocusZoom [57], version 1.1. (PDF)

Figure S8. Ddx1 knockdown does not affect kidney gene expression. (A–E) Uninjected control embryos show normal kidney development as demonstrated by in situ hybridization for the renal markers pax2a (A, B), nephrin (C), slc20a1a (D) and slc12a3 (E). (F–J) Ddx1 morpholino (MO)-injected embryos do not show significant changes in renal marker expression. (K) Number of observed abnormalities per number of embryos examined at 400 µM MO injection for renal gene expression analysis. (TIF)

Figure S9. Casp9 and mpped2 knockdown embryos are more susceptible to gentamicin-induced kidney injury. Compared to control embryos (A), casp9 and mpped2 knockdown embryos develop edema at 103 hpf (C, E), suggestive of a renal defect. When injected with gentamicin, a nephrotoxin that reproducibly induces edema in control embryos (B), mpped2 and casp9 knockdown embryos develop edema earlier, more frequently, and in a more severe fashion (D, F). Whereas control embryos primarily develop cardiac edema, mpped2 and casp9 knockdown embryos display cardiac (arrowhead), ocular (black arrow), and visceral (white arrow) edema, demonstrating that mpped2 and casp9 knockdown predisposes embryos to kidney injury. (G) Quantification of edema prevalence in control, mpped2, and casp9 knockdown embryos 2, 22, and 55 hours post-injection (hpi) of gentamicin. These numbers are presented graphically in Figure 2X. (TIF)

Figure S10. Comparison of the effect size on eGFRcrea and on eGFRcys for the lead SNPs of known and new loci. Results are based on the largest sample size available for each locus, i.e. the combined discovery and replication sample for the novel loci (N = 130,600) and the discovery sample only for the known loci (N = 74,354). The sign of the effect estimates has been changed to reflect the effects of the eGFRcrea-lowering alleles. Original beta coefficients and their standard errors for the two traits can be downloaded from File S1. (TIF)

Figure S11. Odds ratios (ORs) and 95% confidence intervals of CKD and CKD45 for the lead SNPs of all known and new loci, sorted by decreasing OR of CKD. (TIF)

File S1. Effect size on eGFRcrea and on eGFRcys for the lead SNPs of known and new loci. (XLSX)

Table S1. Study-specific methods and full acknowledgments: discovery studies. (DOC)

Table S2. Study-specific methods and full acknowledgments: replication studies and functional follow-up studies. (DOC)

Table S3. Characteristics of stage 1 discovery studies. (DOC)

Table S4. Study-specific genotyping information for stage 1 discovery studies. (DOC)

Table S5. Characteristics of stage 2 replication studies. (DOC)

Table S6. Study-specific genotyping information for stage 2 in silico replication studies. (DOC)

Table S7. Top four SNPs from the CKD45 analysis. (DOC)

Table S8. Loci identified by the test for differential effects between strata in the GWAS. Results are sorted by trait, group and chromosome. For each SNP, the P value of the test for difference between strata is reported. (DOC)

Table S9. Imputation quality of replicated SNPs in all discovery and replication studies: median MACH-Rsq and interquartile range (IQR) are reported. (DOC)

Table S10. Effects of novel and known loci on log(eGFRcrea) in the overall population. (DOC)

Table S11. Genes nearest to loci associated with renal traits. (DOC)

Table S12. Imputation quality (MACH-Rsq) for the best SNPs in the African ancestry samples of the CARe consortium (1.00 refers to genotyped data). (DOC)

Table S13. Baseline characteristics of the kidney biopsies for the eQTL analysis. (DOC)

Table S14. Analysis of the new loci for eQTL status in meta-analysis of two cohorts of kidney biopsies. (DOC)

Table S15. Association of novel and known loci with CKD and CKD45: odds ratios (OR), 95% confidence intervals (95% CI) and P values. (DOC)

Table S16. Association between novel and known loci and log(eGFRcrea) in individuals without and with diabetes and test for difference between strata. (DOC)

Table S17. Association between novel and known loci and log(eGFRcrea) in individuals without and with hypertension and test for difference between strata. (DOC)

Table S18. Association between novel and known loci and log(eGFRcrea) in individuals younger and older than 65 years and test for difference between strata. (DOC)

Table S19. Association between novel and known loci and log(eGFRcrea) in females and in males and test for difference between strata. (DOC)

Table S20. Effects of novel loci on the logarithm of urinary albumin-to-creatinine ratio (log(UACR)) in the overall sample and by diabetes and hypertension status. (DOC)

Table S21. Effects (log odds ratios) of novel loci on microalbuminuria (MA) in the overall sample and by diabetes and hypertension status. (DOC)

Table S22. Association of novel loci with diastolic and systolic blood pressure in the ICBP consortium. (DOC)

Table S23. Association of novel loci with myocardial infarction in the CARDIoGRAM consortium. (DOC)