Introduction
Type 2 diabetes (T2D) affects at least 6% of the world's population; the worldwide prevalence is expected to double by 2025 [1]. T2D is a complex disorder that is characterized by hyperglycemia, which results from impaired pancreatic β cell function, decreased insulin action at target tissues, and increased glucose output by the liver [2]. Both genetic and environmental factors contribute to the pathogenesis of T2D. The disease is considered to be a polygenic disorder in which each genetic variant confers a partial and additive effect. Only 5%–10% of T2D cases are due to single gene defects; these include maturity-onset diabetes of the young (MODY), insulin resistance syndromes, mitochondrial diabetes, and neonatal diabetes [3]–[5]. Inherited variations have been identified from studies of monogenic diabetes, and have provided insights into β cell physiology, insulin release, and the action of insulin on target cells [6].
Much effort has been devoted to finding common T2D genes through genome-wide linkage, candidate-gene, and genome-wide association studies (GWAS). Whole-genome linkage scans have identified chromosomal regions linked to T2D; however, with the exception of regions 1q [7]–[13] and 20q, which have been repeatedly mapped, linkage results vary from study to study [14]–[19]. Candidate-gene studies have provided strong evidence that common variants in the peroxisome proliferator-activated receptor-γ (PPARG) [20], potassium inwardly-rectifying channel J11 (KCNJ11) [21]–[23], transcription factor 2 isoform b (TCF2) [24],[25], and Wolfram syndrome 1 (WFS1) [26] genes are associated with T2D. These genes all have strong biological links to diabetes, and rare, severe mutations in them cause monogenic diabetes. GWAS have accelerated the identification of T2D susceptibility genes, expanding the list from three in 2006 to over 20 genes in 2009.
There are now at least 19 loci containing genes that increase risk of T2D, including PPARG [27], KCNJ11 [27], KCNQ1 [28],[29], CDKAL1 [27],[29]–[33], CDKN2A-2B [27],[32],[33], CDC123-CAMK1D [34], MTNR1B [35]–[37], TCF7L2 [31],[38],[39], TCF2 (HNF1B), HHEX-KIF11-IDE [27],[32],[33],[38], JAZF1 [34], IGF2BP2 [27],[29],[32], SLC30A8 [27],[32],[33],[38], THADA [34], ADAMTS9 [34], WFS1 [26], FTO [27],[31], NOTCH2 [34], and TSPAN8 [34]. Variants in these genes have been identified almost exclusively in populations of European descent, except for KCNQ1; individually, these variants confer a modest risk (odds ratio [OR] = 1.1–1.25) of developing T2D. KCNQ1 was identified as a T2D susceptibility gene in three GWA scans in Japanese individuals, highlighting the need to extend large-scale association efforts to different populations, such as Asian populations [28],[29],[40]. The associations of other previously reported loci (CDKAL1, CDKN2A-2B, IGF2BP2, TCF7L2, SLC30A8, HHEX, and KCNJ11) with T2D were also replicated in the Japanese population [29],[40],[41]. To date, a GWA scan for T2D has not been conducted in the Han Chinese population, although the associations of some known loci have been confirmed, including KCNQ1, CDKAL1, CDKN2A-2B, MTNR1B, TCF7L2, HNF1β, and KCNJ11 [42]–[47]. Therefore, we conducted a two-stage GWA scan for T2D in a Han Chinese population residing in Taiwan. There were a total of 2,798 cases and 2,367 normal controls (995 cases and 894 controls in stage 1, 1,803 cases and 1,473 controls in stage 2). Our objective was to identify new diabetes susceptibility loci that were associated with increased risk of T2D in a Han Chinese population.
Results
Association analysis
We conducted a two-stage GWAS to identify genetic variants for T2D in the Han Chinese residing in Taiwan. In the first stage, an exploratory genome-wide scan, we genotyped 995 T2D cases and 894 population controls using the Illumina Hap550duov3 chip (Figure 1 and Table S1).
For each sample genotyped in this study, the average call rate was 99.92±0.12%. After applying stringent quality control criteria, high-quality genotypes for 516,737 SNPs (92.24%) were obtained, with an average call rate of 99.92±0.24% (Table S2). The results of principal component analysis in stage 1 revealed no evidence for population stratification between T2D cases and controls (P = 0.111, Fst statistics between populations 0.03. We then genotyped the two novel SNPs and one nonsynonymous polymorphism; however, none of these SNPs showed an association with T2D (Table S6).
Discussion
Our GWAS for T2D in a Han Chinese population found two previously unreported susceptibility genes. All of the significant variants detected in our study showed modest effects, with an OR between 1.21 and 1.57. Two loci with less-significant associations in our primary scan (stage 1), PTPRD and KCNQ1, were selected for further replication; both showed compelling evidence of association in joint analysis. The susceptibility loci we identified in this study need to be further replicated in additional populations. Of the 18 loci previously reported to be associated with T2D (with the exception of KCNQ1), none of the P values for any of the SNPs within or near the genes reached 10⁻⁵ using allele, genotype, trend, dominant, or recessive models (Table S8; Figure S4). Three SNPs within CDKAL1, JAZF1, and HNF1B had the lowest P values, ranging from 5×10⁻⁴ to 10⁻⁵, among the 18 known loci (Table S8). No significant associations were found within these regions in our Han Chinese population. The strongest new signal was observed for rs17584499 in PTPRD.
The overall Fst among 11 HapMap groups for rs17584499 was estimated to be 0.068 [52], which indicated a significant difference in allele frequencies among the populations (P 20 years, were recruited from China Medical University Hospital (CMUH), Taichung, Taiwan; Chia-Yi Christian Hospital (CYCH), Chia-Yi, Taiwan; and National Taiwan University Hospital (NTU), Taipei, Taiwan. All of the T2D cases were diagnosed according to medical records and fasting plasma glucose levels using American Diabetes Association criteria. Subjects with type 1 diabetes, gestational diabetes, or maturity-onset diabetes of the young (MODY) were excluded from this study. For the two-stage GWAS, we genotyped 995 T2D cases and 894 controls in the first exploratory genome-wide scan (stage 1). In the replication stage (stage 2), we genotyped selected SNPs in additional samples from 1,803 T2D cases and 1,473 controls. The controls were randomly selected from the Taiwan Han Chinese Cell and Genome Bank [94]. The criteria for controls in the association study were (1) no past diagnostic history of T2D, (2) HbA1C ranging from 3.4 to 6, and (3) BMI<32. The two control groups were comparable with respect to BMI, gender, age at study, and level of HbA1C. All of the participating T2D cases and controls were of Han Chinese origin, which is the origin of 98% of the Taiwan population. Details of demographic data are shown in Table S10.
Genotyping
Genomic DNA was extracted from peripheral blood using the Puregene DNA isolation kit (Gentra Systems, Minneapolis, MN, USA). In stage 1, whole genome genotyping using the Illumina HumanHap550-Duo BeadChip was performed by deCODE Genetics (Reykjavík, Iceland). Genotype calling was performed using the standard procedure implemented in BeadStudio (Illumina, Inc., San Diego, CA, USA), with the default parameters suggested by the platform manufacturer. Quality control of genotype data was performed by examining several summary statistics.
First, the ratio of loci with heterozygous calls on the X chromosome was calculated to double-check the subject's gender. Total successful call rate and the minor allele frequency of cases and controls were also calculated for each SNP. SNPs were excluded if they: (1) were nonpolymorphic in both cases and controls, (2) had a total call rate <95% in the cases and controls combined, (3) had a minor allele frequency <5% and a total call rate <99% in the cases and controls combined, or (4) had significant distortion from Hardy–Weinberg equilibrium in the controls (P<10⁻⁷). Genotyping validation was performed using the Sequenom iPLEX assay (Sequenom MassARRAY system; Sequenom, San Diego, CA, USA). In the replication stage (stage 2), SNPs showing significant or suggestive associations with T2D and their neighboring SNPs within the same LD block were genotyped using the Sequenom iPLEX assay. The neighboring SNPs in the same LD block were selected from the HapMap Asian (CHB + JPT) group data for fine mapping the significant signal.
Statistical analysis
T2D association analysis was carried out to compare allele frequency and genotype distribution between cases and controls using five single-point methods for each SNP: genotype, allele, trend (Cochran–Armitage test), dominant, and recessive models. The most significant test statistic obtained from the five models was chosen. SNPs with P values less than α = 2×10⁻⁸, the cut-off after Bonferroni correction for multiple comparisons, were considered to be significantly associated with the trait. The joint analysis was conducted by combining the data from the stage 1 and 2 samples. We also applied Fisher's method to combine P values for joint analysis. The permutation test was carried out genome-wide for 10⁶ permutations, in which the phenotypes of subjects were randomly rearranged. For better estimation of empirical P values, the top SNPs were reexamined using 10⁸ permutations.
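The trend test and the label-shuffling permutation scheme used in this analysis can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, genotypes are assumed to be coded as minor-allele counts (0, 1, 2), and the trend statistic is computed through the identity χ² = N·r², where r is the Pearson correlation between genotype score and case status. The Bonferroni helper reflects one reading of the reported cut-off (0.05 divided by the number of SNPs times the five models); the text does not state how the cut-off was derived, so that part is an assumption.

```python
import math
import numpy as np


def trend_statistic(genotypes, is_case):
    """Cochran-Armitage trend chi-square (1 df) with additive scores 0/1/2.

    Uses chi2 = N * r^2, with r the Pearson correlation between the
    genotype score and the case/control indicator.
    """
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(is_case, dtype=float)
    gc = g - g.mean()
    yc = y - y.mean()
    denom = (gc @ gc) * (yc @ yc)
    if denom == 0.0:  # monomorphic SNP or only one phenotype class
        return 0.0
    return float(g.size * (gc @ yc) ** 2 / denom)


def asymptotic_p(chi2_1df):
    """Upper-tail P value of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(chi2_1df / 2.0))


def empirical_p(genotypes, is_case, n_perm=1000, seed=0):
    """Permutation P value: shuffle case/control labels, recompute the
    statistic, and count permutations at least as extreme as observed."""
    rng = np.random.default_rng(seed)
    observed = trend_statistic(genotypes, is_case)
    labels = np.asarray(is_case)
    extreme = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(labels)
        # small tolerance so exact ties are not lost to rounding
        if trend_statistic(genotypes, shuffled) >= observed - 1e-12:
            extreme += 1
    return extreme / n_perm


def bonferroni_threshold(alpha, n_snps, n_models):
    """ASSUMPTION: per-test cut-off = alpha / (SNPs x models tested)."""
    return alpha / (n_snps * n_models)


if __name__ == "__main__":
    toy_genotypes = [0, 0, 1, 1, 2, 2]   # illustrative data only
    toy_status = [0, 0, 0, 1, 1, 1]      # 1 = case, 0 = control
    stat = trend_statistic(toy_genotypes, toy_status)
    print(stat, asymptotic_p(stat))
    print(empirical_p(toy_genotypes, toy_status, n_perm=200))
    print(bonferroni_threshold(0.05, 516212, 5))
```

With 516,212 SNPs and five models, the helper returns roughly 1.9×10⁻⁸, which is consistent with the reported cut-off of 2×10⁻⁸. The permutation count in the sketch is kept small for illustration; the study used far more permutations.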
Each permutation proceeded as follows: (1) the case and control labels were shuffled and redistributed to subjects, and (2) the test statistic of the corresponding association test was calculated based on the shuffled labels. The empirical P value was defined as the number of permutations in which the test statistic was at least as extreme as the original, divided by the total number of permutations. Detection of possible population stratification that might influence association analysis was carried out using principal component analysis, multidimensional scaling analysis, and genomic control (Text S1). Quantile–quantile (Q–Q) plots were then used to examine P value distributions (Figure 3 and Figure S5).
10.1371/journal.pgen.1000847.g003
Figure 3 Q–Q plot for the trend test. Q–Q plots are shown for the trend test based on the 516,212 quality SNPs of the initial analysis of 995 cases and 894 controls. The red lines represent the upper and lower boundaries of the 95% confidence bands.
Supporting Information
Figure S1 Principal component analysis (PCA) plot. The PCA plot shows the first two principal components, estimated by EIGENSTRAT (Price et al. Nat Genet 38: 904–909), based on genotype data from 76,673 SNPs with equal spacing across the human genome. No population stratification between the 995 T2D cases (green x) and 894 controls (red +) was detected (P = 0.111, and Fst statistics between populations <0.001). (1.18 MB TIF)
Figure S2 Multidimensional scaling analysis (MDS) plot. The MDS plot shows the first two principal components, estimated by PLINK (Zheng et al. Am J Hum Genet 81:559–575), based on genotype data from 516,212 SNPs. No population stratification between the 995 T2D cases (red) and 894 controls (blue) was detected (IBS group-difference empirical P = 0.192598 for T1: case/control less similar). (0.74 MB TIF)
Figure S3 LD block between rs231361 and rs223787.
(0.17 MB TIF)
Figure S4 Comparisons to susceptibility regions reported by previous GWAS. For each of the (A) NOTCH2, (B) THADA, (C) PPARG, (D) IGF2BP2, (E) ADAMTS9, (F) WFS1, (G) CDKAL1, (H) JAZF1, (I) SLC30A8, (J) CDKN2A-2B, (K) HHEX, (L) CDC123/CAMK1D, (M) TCF7L2, (N) KCNJ11, (O) MTNR1B, (P) TSPAN8/LGR5, (Q) FTO, and (R) TCF2 (HNF1B) regions, the −log10 P values from the primary scan are plotted as a function of genomic position (NCBI Build 36). The reported SNPs in previous GWAS are denoted by blue diamonds. Estimated recombination rates (right y-axis) based on the Chinese HapMap population are plotted to reflect the local LD structure around the significant SNPs. Gene annotations and numbers of transcripts were taken from NCBI. (4.17 MB TIF)
Figure S5 Quantile–quantile (Q–Q) plots. Q–Q plots are shown for the four association tests, (A) allelic, (B) genotype, (C) dominant, and (D) recessive, based on the 516,212 quality SNPs of the initial analysis of 995 cases and 894 controls. The upper and lower boundaries of the 95% confidence bands are represented by the red lines. (3.10 MB TIF)
Table S1 Quality control of the subject participants in stage 1. (0.03 MB DOC)
Table S2 Quality control of the genotyping results. (0.03 MB DOC)
Table S3 Association results in stage 1. (0.05 MB DOC)
Table S4 Concordance rates for the 10 SNPs with significant associations in stage 1. (0.05 MB DOC)
Table S5 Power calculation using CaTS. (0.05 MB DOC)
Table S6 Association of additional SNPs within KCNQ1 in all T2D cases and controls in the joint analysis. (0.06 MB DOC)
Table S7 Conditional analysis on rs2237895. (0.03 MB DOC)
Table S8 Previously reported loci and SNPs associated with T2D. (0.13 MB DOC)
Table S9 Genotype frequency and allele frequency of rs17584499 (founders only) from HapMap3. (0.05 MB DOC)
Table S10 Clinical characteristics of the subjects. (0.04 MB DOC)
Text S1 Supplementary methods. (0.03 MB DOC)