Introduction

An estimated 560,000 cases of upper aerodigestive tract (UADT) cancers (encompassing cancers of the oral cavity, pharynx, larynx and esophagus) occur each year worldwide [1]. Exposures to alcohol and tobacco are the major UADT cancer risk factors in Europe and the Americas [1], with human papillomavirus infection also playing an important role [2]. Elevated familial relative risks are consistently reported for UADT cancers [3]-[7]. While this implies that genetics contributes to UADT cancer susceptibility, the identity of the specific genes involved remains unclear. Studies of common genetic variation and UADT cancer susceptibility have mostly employed a candidate gene approach, with a particular focus on the genes that metabolize alcohol [8]. The metabolism of alcohol releases the carcinogen acetaldehyde as an intermediate [9]. As genetic variation in alcohol metabolism genes appears to influence their rate of function [10], [11], variants that lead to a relative increase in exposure to acetaldehyde are expected to confer an increased risk of UADT cancers on carriers [12]. Consistent with this hypothesis, genetic variation in the alcohol dehydrogenase 1B (ADH1B) and aldehyde dehydrogenase 2 (ALDH2) genes has been associated with UADT cancer risk in Asian populations [8], [12], [13]. Three independent variants in ADH1B, ADH7 and ADH1C have also been associated with UADT cancer risk in European populations [14]. Common genetic variation in additional genetic pathways has also been considered, although, with some exceptions such as DNA repair [15], [16], the results have been inconsistent [3]. The candidate gene based studies have tested only a very small proportion of common human genetic variation in relation to UADT cancer risk.
To further investigate common genetic variation and susceptibility to UADT cancers, we performed a genome-wide association study within the International Head and Neck Cancer Epidemiology (INHANCE) consortium, comprising genome-wide analysis of 2,091 UADT cancer cases and 8,334 controls, and replication analysis of the nineteen top-ranked variants in an independent series consisting of 6,514 UADT cancer cases and 7,892 controls from thirteen additional studies.

Results

Genome-wide results

After exclusion of suboptimal DNA based on QC criteria, data from 2,091 cases, 3,513 study-specific controls and 4,821 generic controls were available for statistical analyses (Table S1), covering 294,620 genetic variants. The overall results did not show a large deviation from what was expected by chance (λ = 1.07) (Figure 1). One genetic variant, rs971074, was strongly associated with UADT cancers (p 0.8) was included. We additionally included the non-synonymous ADH1B variant, rs1229984, which has previously been associated with UADT cancers [14] but is not genotyped or tagged by a proxy variant on the HumanHap300 BeadChip. The association between the top-ranked genetic variants selected for replication and UADT cancer was not sensitive to adjustment for population structure using principal component analysis, or to exclusion of the generic controls (Table S2). Because TaqMan assays were available for them, rs1573496 was genotyped for replication as a proxy for rs971074 (r2 = 1.00) and rs698 as a proxy for rs1789924 (r2>0.97). A TaqMan assay for rs12827056 could not be designed and no highly correlated (r2>0.95) proxy genetic variant was available, hence further investigation of this variant was not possible.

Table 1 (10.1371/journal.pgen.1001333.t001). Results from the UADT cancer genome-wide and replication analysis.
| Marker | Chromosome region | Ref allele | Rare allele | Reason for replication attempt | Discovery phase (a): OR | 95% CI | Pts | Replication phase (b, f): OR | 95% CI | Pts | Combined (a): OR | 95% CI | Pts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs1229984 | 4q23 | C | T | ADH1B, candidate gene | 0.52 | 0.43–0.64 | 7×10−11 | 0.68 (f) | 0.60–0.78 | 7×10−9 | 0.64 | 0.59–0.71 | 1×10−20 |
| rs971074 (c) | 4q23 | G | C | p_all≤1×10−5 | 0.70 | 0.62–0.79 | 8×10−9 | 0.78 (f) | 0.72–0.86 | 5×10−8 | 0.75 | 0.70–0.80 | 9×10−17 |
| rs1494961 | 4q21 | T | C | non-synonymous and p≤1×10−4 | 1.15 | 1.07–1.24 | 1×10−4 | 1.11 | 1.06–1.17 | 2×10−5 | 1.12 | 1.08–1.17 | 1×10−8 |
| rs4767364 | 12q24 | G | A | p_all≤1×10−5 | 1.21 | 1.12–1.32 | 2×10−6 | 1.10 | 1.04–1.15 | 4×10−4 | 1.13 | 1.08–1.18 | 2×10−8 |
| rs1789924 (c) | 4q23 | T | C | p_all≤1×10−5 | 1.20 | 1.11–1.29 | 2×10−6 | 1.07 (f) | 1.01–1.14 | 0.02 | 1.12 | 1.07–1.17 | 3×10−7 |
| rs1431918 | 8q12 | G | A | p_all≤1×10−5 | 1.19 | 1.10–1.28 | 7×10−6 | 1.05 | 1.00–1.11 | 0.05 | 1.09 | 1.04–1.14 | 7×10−5 |
| rs7431530 | 3p24 | C | T | p_all≤1×10−5 | 0.81 | 0.74–0.88 | 2×10−6 | 0.95 | 0.90–1.00 | 0.06 | 0.91 | 0.87–0.95 | 5×10−5 |
| rs3810481 | 20q13 | G | A | non-synonymous and p≤1×10−4 | 1.22 | 1.11–1.34 | 6×10−5 | 1.07 | 0.99–1.15 | 0.09 | 1.12 | 1.06–1.19 | 2×10−4 |
| rs10801805 | 1p22 | G | A | p_all≤1×10−5 | 1.20 | 1.11–1.29 | 3×10−6 | 1.04 | 0.98–1.10 | 0.15 | 1.09 | 1.04–1.14 | 2×10−4 |
| rs1041973 | 2q12 | C | A | non-synonymous and p≤1×10−4 | 0.83 | 0.76–0.90 | 3×10−5 | 0.94 | 0.89–1.00 | 0.05 | 0.91 | 0.87–0.95 | 9×10−5 |
| rs4799863 | 18q12 | A | G | p_all≤1×10−5 | 0.84 | 0.78–0.91 | 5×10−6 | 0.96 | 0.92–1.01 | 0.12 | 0.92 | 0.89–0.96 | 1×10−4 |

rs2517452d 6p21 C T p_oral 0.97) rs1573496 and rs698.
d Analysis considered oral cancers only.
e Analysis considered heavy drinkers only.
f For 4q23 variants rs1229984, rs1573496, rs698, the replication phase excluded the SA Latin American study (Table 4) that had been published previously.
Pts: two-sided p-value.

Replication and combined results

Five genetic variants at three loci, 4q21, 4q23 and 12q24, were significantly associated with UADT cancer risk in the replication series (assuming Bonferroni correction for 19 comparisons or p≤0.003, or p = 0.05 for previously described variants) or in the combined analysis (p-value of ≤5×10−7) (Table 1) (Figure S2).
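The replication threshold quoted above can be checked with a short calculation. The following is a minimal Python sketch, not the authors' code: the function name `bonferroni_threshold` is hypothetical, and the p-values are the replication-phase values for four of the newly tested variants, copied from Table 1. Dividing 0.05 by 19 comparisons gives roughly 0.0026, which the text reports rounded as 0.003.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# 19 variants were taken forward to replication.
threshold = bonferroni_threshold(0.05, 19)
print(f"Bonferroni threshold: {threshold:.4f}")

# Replication-phase p-values copied from Table 1 (newly tested variants only).
replication_p = {
    "rs1494961": 2e-5,
    "rs4767364": 4e-4,
    "rs1431918": 0.05,
    "rs7431530": 0.06,
}
passed = {snp: p <= threshold for snp, p in replication_p.items()}
for snp, ok in passed.items():
    print(snp, "replicates" if ok else "does not replicate")
```

Variants that had been described previously (the three 4q23 variants) were instead judged against p = 0.05, so they are left out of this illustration.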
Using imputed genotypes across the 4q21, 4q23 and 12q24 regions based on Caucasian individuals from the HapMap consortium, we did not identify any variants more strongly associated with UADT cancer risk than the SNPs genotyped directly on the BeadChips (Figure 2).

Figure 2 (10.1371/journal.pgen.1001333.g002). Imputation and LD patterns. Imputation and LD patterns across (a) 4q23 (ADH loci), (b) 12q24 (ALDH2), and (c) 4q21 (HEL308). Upper panel: single-marker association results for imputed (green) and directly genotyped (blue) variants. Imputation was performed on 2,091 cases and 3,513 study-specific controls (generic controls excluded). After adjustment for the five variants that replicated, no variant had a p 0.95) with more than 20 common genetic variants. This region contains additional genes (Figure 2), notably a second DNA repair-related gene, FAM175A (also known as Abraxas and CCDC98), that interacts directly with the BRCT repeat region of BRCA1 [29]. That a comparable association was noted between this variant and lung cancer (p = 3×10−4) (Figure 4) suggests that the causal variant may be relevant for cancers influenced by tobacco consumption in general.

4q23

The top two ranked variants from the GWAS stage (rs1573496 and rs698, and correlated variants) are variants that we have previously associated with UADT cancer risk [14]. The association with UADT cancer of these variants and of a third variant, rs1229984, which is not included on the HumanHap300 BeadChip but was genotyped here based on our previous findings [14], was independently replicated in the additional UADT cases and controls presented here (p = 1×10−7, 1×10−8 and 0.01 for rs1573496, rs1229984 and rs698, respectively). The combined sample series presented here, totaling 8,744 UADT cancer cases and 11,982 controls, allowed further exploration of these genetic effects among UADT cancer subsites and strata defined by gender, drinking and smoking.
The effects of these three variants were generally present for each UADT subsite but were more pronounced in esophageal cancers and in males (Figure 3). Strong heterogeneity was found for rs1229984 when stratifying by alcohol consumption. Notably, an association was observed in “Ever drinkers-Never smokers”, but not in “Never drinkers-Ever smokers”, suggesting that the effect of the rs1229984 variant is mediated through alcohol drinking rather than tobacco smoking. In contrast, the lack of heterogeneity for rs1573496 when stratifying by alcohol use may imply differences in the mechanism of carcinogenesis among these ADH variants. Several studies have suggested that rs1229984 may influence alcohol consumption behaviour [30]-[33]. We have strongly replicated this association (p = 3×10−20). Similarly, minor allele carriers of rs1573496 and rs698 also consumed different amounts of alcohol compared with non-carriers (Table 2). Comparable to the observations made between 15q25 variants, propensity to smoke and lung cancer [34]-[36], adjustment for alcohol consumption did not fully explain the UADT cancer association with these variants (Table S4), suggesting, at least within the limits of this measurement of alcohol consumption, that these risks are unlikely to be explained by alcohol consumption behaviour patterns.

In conclusion, this study has identified two novel variants robustly associated with UADT cancers, and independently replicated three variants identified previously. All five variants are positioned near genes that appear relevant to the etiology of UADT cancers, although further work is needed to identify the causative allele and gene at these loci.
Materials and Methods

Discovery phase study samples

Genome-wide genotyping was performed in two European multi-centre UADT cancer case-control studies (Table 4): the International Agency for Research on Cancer (IARC) Central Europe study [14], [34], [37], conducted from 2000 to 2002 in 6 centres from 5 countries; and the ARCAGE (Alcohol-Related Cancers and Genetic susceptibility in Europe) multi-centre case-control study [14], [34], [38], conducted by IARC from 2002 to 2005 in 12 centres from 9 European countries. DNA of sufficient quality and quantity for genome-wide genotyping was available for 2,230 UADT cancer cases (squamous cell carcinomas) and 4,090 controls from these two studies. We additionally included 4,983 generic controls to further increase statistical power. These generic controls included 1,385 individuals from the 1958 birth cohort (Wellcome Trust Case Control Consortium [39]), as well as 1,823 French and 433 Norwegian controls genotyped by the Centre National de Génotypage (CNG, Evry, France). We also included in our control series a separate group of 1,342 kidney cancer cases from the same centres as the Central Europe study; inclusion or exclusion of these “controls” had no material effect on the results presented (Table S2). Both studies were approved by local ethics committees as well as the IARC IRB.

Table 4 (10.1371/journal.pgen.1001333.t004). The 15 UADT cancer studies participating in the genome-wide and replication analysis.
| Study name | Study setting | Coordinating centre | Genotyping centre | Principal investigators | UADT subsites (e) | Control source | Cases (a) | Controls (a) | Cases, post-GWAS QC | Controls, post-GWAS QC |
|---|---|---|---|---|---|---|---|---|---|---|
| GWAS | | | | | | | | | | |
| ARCAGE (b) | Europe - Multicentre | IARC | CNG | Boffetta/Brennan | UADT | Hospital-based | 1,422 | 1,503 | 1,368 | 1,313 |
| Central Europe (c) | Europe - Multicentre | IARC | CNG | Boffetta/Brennan | UADT | Hospital-based | 808 | 2,587 | 723 | 2,200 |
| Generic controls | | | | | | | | | | 4,821 |
| Replication | | | | | | | | | | |
| SA (d) | Latin America - Multicentre | IARC | IARC | Boffetta/Brennan | UADT | Hospital-based | 1,422 | 1,098 | | |
| ARCAGE - Bremen | Bremen - Germany | Bremen Uni. | IARC | Ahrens | UADT | Hospital-based | 164 | 190 | | |
| Rome | Roma - Italy | Uni. Rome | IARC | Boccia | HN | Hospital-based | 251 | 237 | | |
| Poland | Szczecin - Poland | Szczecin Uni | IARC | Lubinski | Larynx | Hospital-based | 409 | 1,039 | | |
| Seattle (Oral Gen study) | Washington - US | Fred Hutchinson Cancer Research Centre | FHCRC | Schwartz/Chen | HN | Population-based | 193 | 388 | | |
| University of North Carolina (CHANCE study) | North Carolina - US | University of North Carolina | University of North Carolina | Olshan | HN | Population-based | 940 | 1,087 | | |
| Penn State | Tampa - US | Penn State University | Penn State University | Muscat/Lazarus | | Hospital-based | 310 | 534 | | |
| | Philadelphia, New York City - US | | | Lazarus | HN | | | | | |
| UCLA | Los Angeles - US | University of California, LA | University of California, LA | Zhang | UADT | Population-based | 206 | 577 | | |
| MD Anderson | Houston - US | MD Anderson Cancer Centre | MD Anderson Cancer Centre | Wei/Sturgis | HN | Hospital-based | 431 | 431 | | |
| IARC - oral cancer (ORC) | Europe - Multicentre | IARC | IARC | Franceschi | Oral | Hospital-based | 611 | 643 | | |
| Boston (HNSCC) | Boston - US | Brown Uni. | Brown Uni. | Kelsey | HN | Population-based | 513 | 593 | | |
| University of Pittsburgh (SCCHN-SPORE) | Pittsburgh - US | University of Pittsburgh | IARC | Romkes | HN | Hospital-based | 610 | 771 | | |
| The Netherlands | Maastricht Hospital - Netherlands | | University St Radboud | Lacko/Peters | HN | Hospital-based | 454 | 304 | | |
| Total | | | | | | | 8,744 | 11,982 | | |

a Including only individuals of self-reported European ancestry.
b Includes countries: Czech Republic, Greece, Italy, Norway, UK, Spain, Croatia, Germany, France.
c Includes countries: Romania, Poland, Russia, Slovakia, Czech Republic.
d For the three variants at 4q23, results have been published previously; in the “replication” analysis for these variants, the SA study was excluded.
e UADT: oral, pharyngeal, laryngeal and esophageal cancers; HN: head and neck cancers (oral, pharyngeal and laryngeal cancers).

Genome-wide genotyping and quality control

The Central Europe study and the ARCAGE study were genotyped using the Illumina Sentrix HumanHap300 BeadChip at the Centre d'Etude du Polymorphisme Humain (CEPH) and the CNG as described previously [34], [40]. We conducted systematic quality control steps on the raw Illumina HumanHap300 genotyping data. Variants with a genotype call rate below 95%, and individuals with an overall genotype completion rate below 95%, were excluded. We made further exclusions where the genotype distribution clearly deviated from that expected under Hardy-Weinberg Equilibrium (HWE) among controls (p-value of less than 10−7), where there were discrepancies between genotype-based sex and reported sex, and for individuals with unlikely heterozygosity rates across genetic variants on the X chromosome (Table S1). Those genotyped were restricted to individuals of self-reported European ethnicity. To further increase the ethnic homogeneity of the series, we used the program STRUCTURE [41] to identify individuals of mixed ethnicity. Using a subseries of 12,898 genetic variants from the HumanHap300 BeadChip panel evenly distributed across the genome and in low linkage disequilibrium (LD) (r2 median) drinkers and heavy (>median) smokers. The potential for population stratification not accounted for by adjustment for country was also investigated by principal components analysis (PCA) undertaken with the EIGENSTRAT package [44] using 12,898 markers in low LD [42].
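As an illustration of the variant-level quality control filters described in this section (call rate below 95%, HWE p-value below 10−7 among controls), a minimal Python sketch follows. The text does not state which HWE test was used, so the 1-degree-of-freedom chi-square goodness-of-fit test here is an assumption, and the function names and genotype counts are hypothetical.

```python
import math


def hwe_chi_square_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    Assumption: a chi-square goodness-of-fit test; an exact test could be
    substituted without changing the filtering logic below.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        return 1.0  # monomorphic variant: the test is not informative
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_variant_qc(call_rate: float, hwe_p: float,
                      min_call_rate: float = 0.95,
                      hwe_cutoff: float = 1e-7) -> bool:
    """Keep a variant only if it meets both thresholds quoted in the text."""
    return call_rate >= min_call_rate and hwe_p >= hwe_cutoff


# Hypothetical control genotype counts (AA, AB, BB).
p_in_hwe = hwe_chi_square_p(360, 480, 160)   # exactly at HWE proportions
p_deviant = hwe_chi_square_p(500, 0, 500)    # no heterozygotes at all

print(passes_variant_qc(0.99, p_in_hwe))     # kept
print(passes_variant_qc(0.99, p_deviant))    # excluded: HWE deviation
print(passes_variant_qc(0.93, p_in_hwe))     # excluded: low call rate
```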
Adjustment for population stratification using PCA was performed by including significant eigenvectors that were associated with case-control status (p<0.05) as covariates in the logistic regression. Genotypes for genetic variants across 4q21, 4q23 and 12q24 that were not genotyped on the Illumina HumanHap300 BeadChip, but were genotyped by the HapMap consortium, were imputed using the program MACH with phased genotypes from the CEU HapMap genotyping as a scaffold. Unconditional logistic regression using posterior haplotype probabilities (haplotype dosages) from MACH was carried out using ProbABEL [45], including age, sex, and country of origin as covariates. Linkage disequilibrium (LD) statistics (D' and r2) were calculated using Haploview [46].

Replication study samples

The replication series consisted of 6,514 UADT cancer cases (squamous cell carcinomas) and 7,892 controls from 13 UADT cancer case-control studies (Table 4). With the exception of the Szczecin case-control study [16], all studies were part of the INHANCE consortium. As previously described [1], [3], [47], all INHANCE studies have extensive information on tumor site and histology, as well as lifestyle characteristics. The Szczecin, Seattle, UCLA and MD Anderson studies were only able to genotype a proportion of the variants (Table S5). Results for the three ADH variants, rs1229984, rs1573496 and rs698, have been published previously for the Latin American study (LA). For these variants, the Latin American study was excluded from the “replication” analysis. All studies were approved by local ethics committees as well as the IARC IRB.

Replication genotyping

Replication genotyping was performed using the TaqMan genotyping platform in 8 participating genotyping laboratories (Table 4). The robustness of the TaqMan assays (primers and probes are available upon request) was confirmed at IARC by re-genotyping the CEPH HapMap (CEU) trios and confirming concordance with HapMap genotypes.
Any discordance between HapMap and TaqMan-generated genotypes was resolved by direct DNA sequencing. All TaqMan assays were found to be performing robustly. IARC supplied TaqMan assays and a standardized TaqMan genotyping protocol to each of the 8 participating genotyping laboratories. A common series of 90 standard DNAs was genotyped at each laboratory to ensure the quality and comparability of the genotyping results across the different studies. Concordance between the consensus genotype and the results produced at the eight genotyping laboratories for the standardized DNAs was 99.75%, and no individual centre had an overall concordance of less than 99.5%. If an assay produced 2 or more discordant genotypes relative to the consensus, the study genotypes for that genetic variant were not included in the statistical analysis. Assays that had a per-centre success rate of <90%, or for which genotype distributions deviated from HWE (p<0.001), were also excluded (Table S5).

Replication statistical analysis

The association between the nineteen variants and UADT cancer risk was estimated by per-allele ORs and their 95% CIs derived from multivariate unconditional logistic regression, with age, sex, and study (and country of origin where appropriate) included in the regression model as covariates. Measures of alcohol consumption have previously been harmonized across INHANCE studies [48]. The association between ADH/ALDH2 variants and alcohol consumption was assessed in ever drinkers using multivariate linear regression with log-transformed milliliters of ethanol consumed per day as the outcome, adjusting for age, sex, study and pack-years (and case-control status where appropriate). Milliliters of ethanol consumed per day was not available for 3 studies (Szczecin, Philadelphia/New York and The Netherlands). Heterogeneity of ORs across the studies and across the stratification groups was assessed using Cochran's Q test.
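To make the heterogeneity assessment concrete, the sketch below computes Cochran's Q from a set of log odds ratios and standard errors using only the Python standard library. It is an illustration under stated assumptions, not the SAS analysis used in the study: the helper names are hypothetical, standard errors are back-calculated from the rounded 95% CIs in Table 1 (assuming Wald intervals), and the two inputs are the discovery and replication estimates for rs4767364 rather than the per-study or per-stratum estimates the authors tested.

```python
import math


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover the log odds ratio and its standard error from a 95% CI."""
    # On the log scale a 95% Wald interval spans +/- 1.96 standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    return math.log(odds_ratio), se


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-half)             # df = 2
    else:
        a, tail = 0.5, math.erfc(math.sqrt(half))  # df = 1
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while 2 * a < df:
        tail += math.exp(a * math.log(half) - half - math.lgamma(a + 1))
        a += 1.0
    return tail


def cochran_q(estimates):
    """Return (Q, df, p) for a list of (log_or, se) pairs."""
    if len(estimates) < 2:
        raise ValueError("need at least two estimates")
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(estimates) - 1
    return q, df, chi2_sf(q, df)


# rs4767364 in Table 1: discovery and replication ORs with their 95% CIs.
phases = [log_or_and_se(1.21, 1.12, 1.32), log_or_and_se(1.10, 1.04, 1.15)]
q_stat, q_df, q_p = cochran_q(phases)
print(f"Q = {q_stat:.2f} on {q_df} df, p = {q_p:.3f}")
```

Q is the inverse-variance weighted sum of squared deviations from the pooled estimate and is compared with a chi-square distribution on k − 1 degrees of freedom, where k is the number of studies or strata.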
All replication and combined analyses were conducted using SAS 9.1 software. P values were two-sided.

Investigation of the 4q21 variant rs1494961 and lung cancer risk

The series of lung cancer cases and controls used to investigate the 4q21 variant, rs1494961, and lung cancer risk included studies from central Europe (IARC), Toronto (McGill), HUNT2/Tromso, the CARET cohort, EPIC-lung, the Szczecin case-control study, the Liverpool Lung Project (LLP), Paris (France) and Estonia, as described previously [34], [40], [49]. All studies were approved by local ethics committees as well as the IARC IRB.

Genotyping protocol for 4q21 variant, rs1494961

Genotyping for rs1494961 was performed using Illumina BeadChips (central Europe (IARC), Toronto (McGill), HUNT2/Tromso, the CARET cohort, France and Estonia) or Applied Biosystems TaqMan assays (EPIC-lung, the Szczecin case-control study, Liverpool Lung Project (LLP)) at IARC. For the central European lung cancer study, the controls overlapped with those of the central European UADT cancer study for Bucharest (Romania), Lodz (Poland), Moscow (Russia), Banska Bystrica (Slovakia), and Olomouc and Prague (Czech Republic). We therefore performed analyses both including and excluding centres where controls overlapped.

Web resources

http://inhance.iarc.fr/ (December 2010)
http://www.hapmap.org (December 2010)
http://www.sph.umich.edu/csg/abecasis/mach/index.html (December 2010)

Supporting Information

Figure S1 Strategy for discovery and replication in the genome-wide association study. (0.17 MB DOC)
Figure S2 Analysis of selected variants by study and by UADT cancer site in the replication series. For replication estimates of rs1229984, rs1573496, rs698, the SA study was excluded. (0.26 MB DOC)
Figure S3 STRUCTURE Admixture plots. Individuals plotted against individuals of known Caucasian (CEU), African (YRI) and East Asian (JPT-CHB) origin.
Individuals with greater than 30% admixture (dashed line) were excluded. (0.30 MB DOC)
Table S1 Exclusion criteria of subjects for GWAS. (0.18 MB DOC)
Table S2 Sensitivity analysis on the top variants identified by the genome-wide analysis. (0.24 MB DOC)
Table S3 Selected demographic characteristics of cases and controls (GWAS and replication data combined). (0.20 MB DOC)
Table S4 Comparison between analysis adjusted and unadjusted on tobacco and alcohol consumption. (0.16 MB DOC)
Table S5 Minor allele frequency of each variant per study. (0.22 MB DOC)