Introduction Autoimmune disorders, including rheumatoid arthritis (RA) and celiac disease (CD), affect about 5% of the population and have a complex genetic background. Family-based epidemiology studies suggest that there is a shared genetic basis between the two autoimmune diseases . Recent genome-wide association studies (GWAS) have confirmed HLA and identified at least 26 other non-HLA genetic loci with common alleles associated to each disease (Table S1 and S2) , . The strongest genetic risk factor is the HLA locus , , where different alleles confer risk of the two diseases. Six other risk loci outside of the HLA locus are shared between CD and RA and include MMEL/TNFRSF14 , , REL , , , ICOS-CTLA4 , , , , IL2-IL21 , , , , , TNFAIP3 , , , , and TAGAP , , , (Chen et al, submitted) (Table 1 and Figure 1). These shared risk loci have emerged by simple cross-comparison across published studies, rather than a rigorous and systematic analysis of an integrated dataset. Because of the nature of these reports, it is unknown whether the other CD and RA risk alleles confer risk of both diseases. Moreover, it is unknown whether there are additional shared risk alleles that have not yet been discovered in any one disease. 10.1371/journal.pgen.1002004.g001 Figure 1 Established CD and RA SNPs and their association across diseases. (A) Known CD SNPs in RA. The figure represents OR and CI for the established CD SNPs (p 0.9) in both diseases; in other three loci – CTLA4, REL and TAGAP – the most associated SNPs in CD and RA are not in strong LD with each other (r2 50,000 case-control samples. As more samples and SNPs are genotyped between these diseases, additional risk alleles will be discovered. Third, we did not attempt to fine-map the 26 established risk loci for both autoimmune diseases to determine if a single allele is responsible for risk in both autoimmune diseases. And fourth, we made no attempt to search for low-frequency or rare variants that are shared between RA and CD. Implementation of newer sequencing technologies will be required to search for rare risk variants. In summary, our study adds four novel loci to established RA and CD risk loci (CD247, UBE2L3, DDX6, and UBASH3A). It also adds four loci previously established in one or the other disease to the list of shared CD-RA risk loci (SH2B3, 8q24.2, STAT4, and TRAF1-C5). With six previously established CD-RA risk loci, there are now 14 shared CD-RA risk loci, out of 50 established loci for either of the two autoimmune diseases. We emphasize that these are conservative estimates of shared risk loci between the two diseases, as our study may be underpowered to detect common alleles of modest effect size, and we have not considered genetic models in which different alleles within one locus contribute to risk of the two diseases. In addition to the HLA associations, these shared risk loci clearly point to the critical role of antigen presentation via MHC class II molecules to the T-cell receptor, and subsequent activation and differentiation of T-cells in shared disease pathogenesis. Materials and Methods Ethics statement Institutional review boards at each collection site approved the study, and all individuals gave their informed consent. Sample collection CD GWAS dataset CD case-control GWA study included 15,283 individuals (4533 cases, 10750 controls) from 5 populations: Finnish (FIN) (674 cases, 647 controls), Italian (IT) (541 cases, 497 controls), Dutch (NL, 876 cases, 803 controls), and two collections from UK population, UK1 (737 cases, 2596 controls) and UK3 (1922 cases, 1849 controls) (described in details in).The genotyping of all cohorts except UK1 cases was done on Illumina platforms including 550K SNPs (either Illumina Hap550, or Illumina 610 or 670 Quad, or Illumina 1.2 M). The genotyping of UK1 cases (n = 737) was done on Illumina 317K arrays. The subset of SNPs successfully genotyped on Illumina 550 and Quadr platforms, but not on Hap300 platform (n = 196860) was further imputed in the UK1 dataset, using Plink and HapMap Phase 2 European CEU founders as a reference panel . RA GWAS dataset The RA meta-analysis includes 5,539 autoantibody positive RA cases and 17,231 controls of European ancestry as described previously . This study comprises six GWAS case-control collections, genotyped on various platforms. The imputation was conducted on GWAS genotype data for each GWAS collection separately, using the IMPUTE software  and haplotype-phased HapMap Phase 2 European CEU founders as a reference panel. In total, 2.56 million SNPs were imputed. Identity by state (IBS) analysis was run on controls from both CD and RA GWAS datasets. The overlapping controls genotyped in both CD and RA datasets were excluded from the RA analysis. Replication cohorts The replication cohorts included 2,169 CD cases and 2,255 controls, and 2,845 antibody-positive RA cases and 4,944 controls. The CD replication cohorts included three case-control collections from Ireland, Italy and Poland; all collections were geographically matched and are described previously . The five RA replication collections included (1) CCP or RF positive Dutch cases from Groningen and Nijmegen, together with geographically matched controls (Replication cohort 1, R1); (2) CCP positive white individuals from North America (Replication cohort 2, R2; this collection is called i2b2); (3) North American RF positive cases and controls matched on gender, age, and grandparental country of origin from the Genomics Collaborative Initiative (GCI, Replication cohort 3, R3); (4) CCP or RF positive Dutch cases and controls from Leiden University Medical Center (LUMC; Replication cohort 4, R4); and (5) CCP positive cases drawn from North American clinics and controls from the New York Cancer Project (together this collection is called NARAC-II), matched on ancestry informative markers data (Replication cohort 5, R5). All cohorts except i2b2 were described in detail in , whereas i2b2 is described in . Summary information on these samples is presented in Table S11. Genotyping Replication analysis of 15 SNPs was performed on the Sequenom iPlex platform in three centers – (1) Broad institute (all CD cases and controls, and RA replication cohorts R1 and R2); (2) Celera Diagnostics (Alameda California, USA; RA replication cohort R3 and R4); and (3) National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS, RA replication cohort R5). (See Table S11 for details). If the SNPs could not be designed into the iPLEX pool, then a proxy SNP was included. Information on the iPLEX design, proxies and cohorts genotyped in different centers is presented in Tables S11 and S12. We excluded SNPs in each replication collection if they were missing >10% genotype data, 0.1 in the HapMap2 reference panel. Pairwise LD tables were generated from the HapMap2 release 24 phased haplotype data distributed with the IMPUTE software; r2 values were calculated for all SNPs within 1 Mb of each other. For a given analysis, the most strongly associated SNPs (after removing 1 Mb around known associated SNPs) in one disease were retained. We also filtered for SNPs with P 0.1) with any of the 1000 genome SNPs located within Illumina probe sequences. Supporting Information Figure S1 Manhattan plots for GWAS analysis. Manhattan plot for GWAS in CD (A), RA (B), CD-RA with directional meta-analysis (C) and CD-RA with opposite allelic effect (D). Figure E is a combined picture of the four analyses. Yellow – CD; blue – RA; green – CD-RA directional; purple – CD-RA opposite allele. In figure E, the 14 shared CD-RA loci are annotated. The HLA locus (chr6: 20–40 MB) is excluded from the analysis. In RA, three SNPs in PTPN22 locus are associated at p<10-24 and therefore are excluded from the plot for the purpose of scale. The PTPN22 SNPs excluded from RA plot are: rs2476601 P(RA) = 3.6×10−68, rs2358994 P(RA) = 8.2×10−31, and rs1230661 P(RA) = 6.1×10−25. (PDF) Click here for additional data file. Figure S2 Cis eQTL genotype – expression correlation analyses in associated SNPs. Individual level gene expression data (residual variance after Transcriptional Components removed) from 1469 PAXgene samples. Spearman Rank Correlation coefficients and P values are shown for HT-12 and Ref. 8 data and for meta-analysis results. Right Y axis, average expression rank, is a measure of how strongly the tested probe is expressed amongst all probes in the dataset. Unannotated probes are manually plotted and localized to the following transcripts: Probe 2810202 – PHF19, Probe 6980470 – TMPRSS3, Probe 1230242 – UBE2L3. (PDF) Click here for additional data file. Table S1 Established CD SNPs and their association to RA. Candidate genes in the blocks are mentioned. Top P-value in CD is indicated as in the reference paper . CD_P column indicates the p-value in CD-GWAS dataset (4,533 cases and 10,750 controls); RA_P column indicates the p-value in RA-GWAS dataset (5,539 cases and 17,231 controls). OR is given for the minor allele. (DOC) Click here for additional data file. Table S2 Established RA SNPs and their association to CD. Candidate genes in the blocks are mentioned. Top P-value in RA is indicated as in the reference papers , , , , , , , , , . CD_P column indicates the p-value in CD-GWAS dataset; RA_P column indicates the p-value in RA-GWAS dataset; OR is given for the minor allele. The CD results for RA SNPs that were not genotyped in CD GWAS (not present on Illumina Hap550 genotyping array), are either imputed or estimated from the best genotyped proxy SNP (indicated in column “genotyped/imputed”). When proxies were used, the r2 with RA SNP is indicated in column ‘r2 for proxy’. For imputed SNPs the imputation score is annotated in column ‘Imp. score’. * - the perfect proxy rs13017599 was genotyped in the reference paper. + - the association in TNFRSF14 does not reach P<5×10−8, however this locus was included as RA-established locus as it was implicated in several independent studies , , . (DOC) Click here for additional data file. Table S3 Distribution of CD associated SNPs (P<0.001) in RA dataset and RA associated SNPs (P<0.001) in CD datasets. 3a. Goodness of fit tests of no association, using Fisher's Rule for combining P-values. 3b Results of Kolmogorov-Smirnov and Wilcoxon rank sum tests for non-random distribution of associated SNPs across diseases. N – number of SNPs associated to CD and RA at p<0.001 after LD-pruning. Df – degrees of freedom. (DOC) Click here for additional data file. Table S4 SNPs associated to CD-RA with P<1×10-5 and CD and RA with P<0.01; directional analysis. The table includes all SNPs associated to CD-RA in directional GWAS meta-analysis with P(combined)<1×10-5 (column ‘CD-RA_P dir’) and P-value per diseases P<0.01 (columns ‘CD_P’ and ‘RA_P’). OR is given for the reference (minor) allele. (DOC) Click here for additional data file. Table S5 CD-RA meta-analysis results, directional analysis – one top SNP per loci. One most associated SNP per locus is indicated. OR is given for the reference (minor) allele. Cut-off for the P-values is the same as in Table S4. SNPs are sorted for the P-value in CDRA meta-analysis. Loci established in both diseases are indicated in bold. (DOC) Click here for additional data file. Table S6 SNPs associated to CD-RA with P<1×10-5 and CD and RA with P<0.01; analysis of opposite allelic effect. The table includes all SNPs associated to CD-RA in GWAS meta-analysis of opposite alleleic effect with P(combined) P<1×10-5 (column ‘CD-RA_P_opp’) and P-value per diseases P<0.01 (columns CD_P and RA_P). (DOC) Click here for additional data file. Table S7 CDRA meta-analysis results, opposite allelic effect – one top SNP per loci. One most associated SNP per locus is indicated. OR is given for the reference (minor) allele. Cut-off for the P-values is the same as in Table S6. SNPs are sorted for the P-value in CD-RA meta-analysis. Loci established in both diseases are indicated in bold. (DOC) Click here for additional data file. Table S8 GRAIL analysis of associated loci. The P-value for each candidate gene is based on the number of relationships to other associated genes listed in the third column. GRAIL is available at http://www.broadinstitute.org/mpg/grail/. (DOC) Click here for additional data file. Table S9 Shared CD-RA risk variants correlated with cis gene expression. ‘HT-12’ comprise 1240 individuals with blood gene expression assayed using Illumina Human HT-12v3 arrays, ‘Ref-8v2’ comprise 229 individuals with blood gene expression assayed using Illumina Human-Ref-8v2 arrays. ASpearman rank correlation of genotype and residual variance in transcript expression. Meta-analysis eQTL P value shown if both datasets had identical probes. See Figure S2 for detailed results and Materials and Methods for sample information and references. (DOC) Click here for additional data file. Table S10 Characteristics of 23 candidate genes in shared loci. (DOC) Click here for additional data file. Table S11 Information on replication cohorts. Case-control collections included to the replication step in CD (top panel) and RA (bottom panel). For each collection, we list the source of controls, geographic origin, autoantibody status of RA cases, numbers of cases and controls, genotyping platform and genotyping center, and the strategy used to correct for case-control population stratification. See Materials and Methods for additional details. NIAMS – National Institute of Arthritis and Musculoskeletal and Skin Diseases. (DOC) Click here for additional data file. Table S12 Information on proxies used in three iPLEX pools. R2 - r2 between GWAS and SNP pools used in replication step (HapMap CEU, as calculated in SNAP (http://www.broadinstitute.org/mpg/snap/)). A – iPLEX pool used in Broad institute (BI) was used to genotype all replication cohorts for celiac disease and replication cohorts R1 and R2 in rheumatoid arthritis, as indicated in Table S11. (DOC) Click here for additional data file.