Introduction

Schizophrenia is a common neuropsychiatric disorder that is characterized by positive symptoms such as delusions, paranoia and hallucinations, negative symptoms including apathy, anhedonia, and social withdrawal, and extensive cognitive impairments that may have the greatest impact on overall function. While current antipsychotic drug treatments control positive symptoms in most patients, negative symptoms and cognitive impairments are much less improved by these agents. A possible way to improve the treatment of schizophrenia is to identify genetic risk factors that might elucidate the underlying pathophysiological bases as well as help to subclassify patients at a molecular level in a manner helpful to therapy.

The etiology of schizophrenia as presently defined is not well understood. While there are clear environmental contributors to disease, genetic predisposition is the major determinant of who develops schizophrenia, with heritability estimates as high as 80%, placing schizophrenia amongst the most heritable of the common diseases. Schizophrenia genetic research has traditionally focused on identifying linkage regions or on candidate genes and polymorphisms, such as the val158met polymorphism in the dopamine metabolizing gene COMT, or other types of variants such as VNTRs (MAOA, DAT1, SLC6A4). Such studies have implicated dozens of genes and variants, but none is generally accepted as definitively associated with schizophrenia.

It is now possible to represent the majority of common genetic variation by genotyping a selected set of tagging SNPs. Such hypothesis-free genome-wide association studies allow the discovery of new genes and pathways affecting complex traits such as schizophrenia with much greater power to detect small effects than linkage studies. To date, there have been five genome-wide association studies (GWAS) of schizophrenia.
The first study used a small sample of 178 cases and 144 controls self-identifying as Caucasian and recruited in the US, and reported the association of a SNP in the pseudoautosomal region of the Y chromosome at p = 3.7×10^-7. The second used pooled DNA samples from 600 cases and 2,771 controls, all Ashkenazi Jews, and found no genome-wide significant association, although the authors reported a strong effect of a RELN SNP in females only. The third used pooled DNA from 574 schizophrenia trios and 605 unaffected controls, all recruited in Bulgaria, and again found no genome-wide significant association. The next study, of 738 cases and 733 controls (each about 30% African-American, 56% European American and 14% Other), found no evidence for the involvement of common SNPs in schizophrenia. The most recent study included 479 cases compared to 2,937 WTCCC controls and replicated the top SNPs in two further datasets comprising, respectively, 1,664 cases and 3,541 controls and 6,666 cases and 7,897 controls. Three of the loci remained associated after all analyses, one in ZNF804A and two in intergenic regions.

While the genotyping arrays used in genome-wide association studies have very limited capacity to detect the effects of rare single site variants, large copy number variants can be readily identified using these arrays, even if they occur in only one or a few subjects. Recently, considerable attention has turned towards identifying rare copy number variants that show elevated frequencies in various human diseases using these platforms. In schizophrenia, four genome-wide screens for large CNVs have recently appeared. Two papers showed that large (>100 kb), rare deletions and duplications that disrupted genes were significantly more common in schizophrenia cases than controls, and that the disrupted genes in patients were disproportionately from neurodevelopmental pathways.
Another showed that de novo CNVs were eight times more frequent in sporadic cases of schizophrenia than they were in familial cases or unaffected controls. While none of these papers succeeded in identifying particular CNVs as definitive schizophrenia risk factors, the greater load of CNVs reported in cases implicates this type of genetic variant in schizophrenia. Consistent with this, Stefansson et al. recently screened for de novo CNVs and focused on three recurrent CNVs in 4,718 patients and 41,201 controls (including, for replication purposes only, all samples investigated in this study), located at 1q21.1, 15q11.2 and 15q13.3, with odds ratios of 14.8, 2.7 and 11.5, respectively. Two of these same loci were also reported as risk factors by the International Schizophrenia Consortium (also including the Aberdeen samples used here).

These papers collectively suggest that the common disease-common variant hypothesis may be less relevant to schizophrenia than rare variants with highly penetrant effects. However, it should be noted that, to date, no WGA SNP study has been well powered to detect effects of common SNPs, since each has been performed using pooled DNA, ethnically heterogeneous samples or small sample sizes, so it is not yet possible even to rule out reasonably large effects of common SNPs in schizophrenia. Additionally, despite these strong suggestions of a role for rare highly penetrant CNVs, there has been no test of whether any common CNVs also contribute to the risk of schizophrenia. Here we investigated the effects of common SNPs, and both common and rare CNVs, on schizophrenia risk using genome-wide SNP data from the Illumina HumanHap genotyping BeadChips.

Results

Single Nucleotide Polymorphisms

We tested for SNP associations with schizophrenia with a logistic regression model using the PLINK software and including sex and curated EIGENSTRAT axes as covariates. An additive genetic model was tested.
A series of quality control procedures were undertaken before this analysis (for details see Text S1). The results were then annotated using the WGAViewer software. No single polymorphism showed a genome-wide significant association in the discovery cohort. The top 100 associated SNPs are shown in Table 1. The most strongly associated SNPs at this stage were in the ADAMTSL3 gene (lowest p = 1.34×10^-6, Table 1).

Table 1 (doi:10.1371/journal.pgen.1000373.t001). The top 100 SNPs associated with schizophrenia.

| Rank | SNP | Discovery P | Replication P | Combined P | OR | MAF Case | MAF Control | Chr | Position | SNP Type | Major/Minor alleles | Closest gene | Distance to gene |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | rs2135551 | 1.34E-06 | 0.03 | 1.35E-07 | 0.68 | 0.23 | 0.30 | 15 | 82498855 | 3PRIME UTR | A/G | ADAMTSL3 | 0 |
| 2 | rs950169 | 2.28E-06 | 0.04 | 3.14E-07 | 0.68 | 0.23 | 0.30 | 15 | 82497465 | NON SYNONYMOUS | C/T | ADAMTSL3 | 0 |
| 3 | rs12910334 | 2.93E-06 | 0.21 | 2.87E-06 | 0.69 | 0.24 | 0.30 | 15 | 82749322 | 3PRIME UTR | G/A | Q96PW8_HUMAN | 0 |
| 4 | rs1911155 | 4.18E-06 | N/A | 4.18E-06 | 0.69 | 0.23 | 0.30 | 15 | 82578639 | INTERGENIC | A/G | ADAMTSL3 | 79044 |
| 5 | rs4814019 | 1.41E-05 | 0.77 | 9.39E-05 | 1.38 | 0.36 | 0.29 | 20 | 11456980 | INTERGENIC | A/C | C20orf61 | 281900 |
| 6 | rs4894485 | 1.90E-05 | 0.19 | 1.29E-05 | 1.37 | 0.43 | 0.35 | 3 | 176560380 | INTRONIC | C/T | NAALADL2 | 0 |
| 7 | rs1435194 | 1.92E-05 | 0.81 | 0.0001 | 1.44 | 0.25 | 0.19 | 5 | 105481256 | INTERGENIC | C/A | N/A | N/A |
| 8 | rs2007744 | 1.96E-05 | 0.91 | 0.0002 | 1.52 | 0.18 | 0.13 | 9 | 78044132 | INTRONIC | G/T | PCSK5 | 0 |
| 9 | rs2717907 | 2.05E-05 | 0.70 | 0.0001 | 0.67 | 0.15 | 0.21 | 7 | 25269991 | INTERGENIC | T/C | NPVF | −35361 |
| 10 | rs715969 | 2.09E-05 | 0.52 | 6.28E-05 | 0.59 | 0.08 | 0.12 | 8 | 132543045 | INTERGENIC | A/C | ADCY8 | −419191 |
| 11 | rs1146313 | 2.16E-05 | 0.88 | 0.0002 | 0.74 | 0.43 | 0.50 | 1 | 118827613 | INTERGENIC | C/T | SPAG17 | −298256 |
| 12 | rs983037 | 2.19E-05 | 0.13 | 9.59E-06 | 0.73 | 0.32 | 0.39 | 2 | 12997671 | INTERGENIC | A/C | TRIB2 | 197360 |
| 13 | rs7175728 | 2.79E-05 | 0.55 | 8.55E-05 | 0.74 | 0.36 | 0.43 | 15 | 53681850 | INTERGENIC | C/T | PYGO1 | −13507 |
| 14 | rs1229119 | 2.91E-05 | N/A | 2.91E-05 | 0.74 | 0.35 | 0.43 | 1 | 118825163 | INTERGENIC | C/T | SPAG17 | −295806 |
| 15 | rs7943936 | 3.10E-05 | 0.29 | 3.61E-05 | 1.34 | 0.53 | 0.46 | 11 | 130214277 | INTERGENIC | C/T | SNX19 | 36708 |
| 16 | rs1463259 | 3.17E-05 | N/A | 3.17E-05 | 0.70 | 0.22 | 0.28 | 8 | 82900300 | INTRONIC | C/T | SNX16 | 0 |
| 17 | rs725710 | 3.30E-05 | 0.41 | 6.18E-05 | 0.65 | 0.11 | 0.16 | 16 | 73431014 | INTERGENIC | C/A | WDR59 | 33961 |
| 18 | rs7379110 | 3.41E-05 | 0.23 | 2.80E-05 | 0.74 | 0.33 | 0.39 | 5 | 34089378 | INTRONIC | C/T | C1QTNF3 | 0 |
| 19 | rs7943757 | 3.64E-05 | 0.55 | 0.0001 | 0.74 | 0.43 | 0.50 | 11 | 130221367 | INTERGENIC | C/T | SNX19 | 29618 |
| 20 | rs2239385 | 3.96E-05 | 1.00 | 0.0004 | 1.39 | 0.29 | 0.23 | 21 | 39949617 | INTRONIC | A/G | B3GALT5 | 0 |
| 21 | rs505703 | 4.18E-05 | 0.50 | 0.0001 | 1.49 | 0.19 | 0.13 | 4 | 13530080 | INTERGENIC | C/T | FAM44A | −291654 |
| 22 | rs230669 | 4.28E-05 | 0.93 | 0.0003 | 0.74 | 0.34 | 0.42 | 11 | 77553034 | INTERGENIC | A/G | KCTD21 | 6917 |
| 23 | rs1943624 | 4.47E-05 | 0.48 | 0.0001 | 0.75 | 0.46 | 0.53 | 11 | -9 | N/A | C/T | N/A | N/A |
| 24 | rs4772445 | 4.88E-05 | N/A | 4.88E-05 | 1.52 | 0.17 | 0.12 | 13 | 101601632 | INTRONIC | G/A | FGF14 | 0 |
| 25 | rs4714675 | 5.52E-05 | 1.00 | 0.0005 | 0.63 | 0.09 | 0.13 | 6 | 43395871 | REGULATORY REGION | C/T | CRIP3 | −11358 |
| 26 | rs3748376 | 5.64E-05 | N/A | 5.64E-05 | 0.72 | 0.23 | 0.29 | 15 | 83129356 | INTRONIC | C/T | ZNF592 | 0 |
| 27 | rs6696438 | 5.86E-05 | 0.42 | 0.0001 | 0.64 | 0.09 | 0.13 | 1 | 194351115 | INTERGENIC | C/T | KCNT2 | 110421 |
| 28 | rs2000191 | 6.68E-05 | N/A | 6.68E-05 | 1.93 | 0.07 | 0.04 | 6 | 22612727 | INTERGENIC | A/C | HDGFL1 | −64930 |
| 29 | rs7119425 | 6.71E-05 | 0.19 | 4.03E-05 | 0.75 | 0.40 | 0.46 | 11 | 130238395 | INTERGENIC | C/T | SNX19 | 12590 |
| 30 | rs7676721 | 6.76E-05 | 0.38 | 0.0001 | 1.39 | 0.28 | 0.22 | 4 | 13525449 | INTERGENIC | G/T | FAM44A | −287023 |
| 31 | rs3786603 | 7.05E-05 | 0.53 | 0.0002 | 1.87 | 0.07 | 0.04 | 19 | 16596788 | INTRONIC | C/T | MED26 | 0 |
| 32 | rs1586030 | 7.07E-05 | N/A | 7.07E-05 | 0.75 | 0.41 | 0.47 | 8 | 3496385 | INTERGENIC | C/T | CSMD1 | −237389 |
| 33 | rs4745431 | 7.26E-05 | 0.02 | 3.83E-06 | 0.75 | 0.40 | 0.45 | 9 | 77461915 | INTERGENIC | C/T | PCSK5 | −233491 |
| 34 | rs9375543 | 7.75E-05 | 0.52 | 0.0002 | 0.75 | 0.37 | 0.44 | 6 | 128352833 | INTRONIC | T/C | PTPRK | 0 |
| 35 | rs2631879 | 8.26E-05 | 0.16 | 3.81E-05 | 1.55 | 0.15 | 0.10 | 8 | 21174982 | INTERGENIC | C/T | GFRA2 | 418830 |
| 36 | rs12365680 | 8.65E-05 | 0.88 | 0.0005 | 0.76 | 0.38 | 0.44 | 11 | 130254269 | INTRONIC | G/A | SNX19 | 0 |
| 37 | rs6737733 | 8.91E-05 | 0.94 | 0.0006 | 0.74 | 0.28 | 0.34 | 2 | 118291938 | INTRONIC | C/T | DDX18 | 0 |
| 38 | rs2216670 | 0.0001 | 0.72 | 0.0004 | 1.81 | 0.07 | 0.04 | 19 | 16555094 | INTRONIC | T/G | MED26 | 0 |
| 39 | rs154981 | 0.0001 | N/A | 1.00E-04 | 1.32 | 0.53 | 0.46 | 6 | 32988971 | INTERGENIC | T/C | NM 002118 | 21413 |
| 40 | rs566353 | 0.0001 | N/A | 1.00E-04 | 1.32 | 0.48 | 0.43 | 11 | 60472288 | INTRONIC | C/T | SLC15A3 | 0 |
| 41 | rs3021461 | 0.0001 | 0.28 | 9.40E-05 | 0.71 | 0.19 | 0.24 | 3 | 129578342 | INTRONIC | A/G | EEFSEC | 0 |
| 42 | rs2844511 | 0.0001 | N/A | 1.00E-04 | 0.76 | 0.42 | 0.50 | 6 | 31497763 | INTERGENIC | C/T | Q5SS58_HUMAN | 5768 |
| 43 | rs9402011 | 0.0001 | N/A | 1.00E-04 | 0.73 | 0.25 | 0.31 | 6 | 128347581 | INTRONIC | T/C | PTPRK | 0 |
| 44 | rs4280783 | 0.0001 | N/A | 1.00E-04 | 0.76 | 0.38 | 0.44 | 4 | 67771960 | INTERGENIC | C/T | CENPC1 | 248624 |
| 45 | rs2304066 | 0.0001 | N/A | 1.00E-04 | 1.39 | 0.24 | 0.19 | 5 | 145760885 | INTERGENIC | T/C | TCERG1 | −46181 |
| 46 | rs1379552 | 0.0001 | N/A | 1.00E-04 | 0.72 | 0.20 | 0.25 | 5 | 111007057 | INTERGENIC | T/C | C5orf13 | 85351 |
| 47 | rs12680924 | 0.0001 | 0.70 | 0.0004 | 1.31 | 0.48 | 0.43 | 8 | 15098795 | INTRONIC | G/A | SGCZ | 0 |
| 48 | rs1328657 | 0.0001 | N/A | 1.00E-04 | 1.33 | 0.32 | 0.27 | 13 | 96981887 | INTERGENIC | C/T | RAP2A | 63644 |
| 49 | rs718796 | 0.0001 | N/A | 1.00E-04 | 1.75 | 0.08 | 0.05 | 7 | 16585507 | INTRONIC | A/G | ANKMY2 | 20425 |
| 50 | rs17257972 | 0.0001 | 0.67 | 0.0003 | 1.31 | 0.52 | 0.46 | 1 | 118908135 | INTERGENIC | G/A | TBX15 | 319054 |
| 51 | rs4669887 | 0.0001 | 0.48 | 0.0002 | 1.38 | 0.26 | 0.20 | 2 | 12893630 | INTERGENIC | G/T | TRIB2 | 93319 |
| 52 | rs6868716 | 0.0001 | N/A | 1.00E-04 | 1.54 | 0.12 | 0.09 | 5 | 145769452 | INTERGENIC | C/T | TCERG1 | −37614 |
| 53 | rs7762279 | 0.0001 | N/A | 1.00E-04 | 0.63 | 0.08 | 0.12 | 6 | 32863268 | INTERGENIC | T/C | A2ADX3_HUMAN | −23981 |
| 54 | rs596958 | 0.0001 | 0.40 | 0.0002 | 1.81 | 0.07 | 0.04 | 21 | 39945361 | INTRONIC | G/A | B3GALT5 | 0 |
| 55 | rs764855 | 0.0001 | N/A | 1.00E-04 | 1.31 | 0.46 | 0.40 | 11 | 56366988 | INTERGENIC | G/T | Q3C1V7_HUMAN | −23228 |
| 56 | rs7815272 | 0.0001 | N/A | 1.00E-04 | 0.73 | 0.23 | 0.29 | 8 | 82854364 | INTERGENIC | G/T | SNX16 | 20013 |
| 57 | rs7775397 | 0.0002 | N/A | 2.00E-04 | 0.64 | 0.09 | 0.13 | 6 | 32369230 | NON SYNONYMOUS | T/G | C6orf10 | 0 |
| 58 | rs2523554 | 0.0002 | N/A | 2.00E-04 | 0.76 | 0.36 | 0.42 | 6 | 31439808 | INTERGENIC | T/C | HLA-B | −6894 |
| 59 | rs1766803 | 0.0002 | N/A | 2.00E-04 | 0.76 | 0.35 | 0.41 | 1 | 119185341 | INTERGENIC | T/C | TBX15 | 41848 |
| 60 | rs3934902 | 0.0002 | 0.12 | 6.26E-05 | 0.72 | 0.17 | 0.23 | 9 | 85367221 | INTERGENIC | G/A | FRMD3 | −23940 |
| 61 | rs1217461 | 0.0002 | N/A | 2.00E-04 | 0.73 | 0.20 | 0.24 | 5 | 104241987 | INTERGENIC | T/G | N/A | N/A |
| 62 | rs1289726 | 0.0002 | N/A | 2.00E-04 | 0.63 | 0.07 | 0.11 | 1 | 162679471 | INTERGENIC | G/T | PBX1 | −116213 |
| 63 | rs4329498 | 0.0002 | N/A | 2.00E-04 | 1.30 | 0.52 | 0.46 | 1 | 118935132 | INTERGENIC | T/C | TBX15 | 292057 |
| 64 | rs9427727 | 0.0002 | 0.18 | 9.77E-05 | 1.32 | 0.40 | 0.34 | 1 | 200305568 | INTERGENIC | C/T | GPR37L1 | −53084 |
| 65 | rs1054869 | 0.0002 | 0.12 | 6.34E-05 | 0.77 | 0.43 | 0.49 | 11 | 130247840 | DOWNSTREAM | G/A | SNX19 | 3145 |
| 66 | rs7758512 | 0.0002 | N/A | 2.00E-04 | 1.54 | 0.13 | 0.09 | 6 | 30078568 | INTRONIC | T/G | Q6ZU40_HUMAN | 0 |
| 67 | rs6924102 | 0.0002 | N/A | 2.00E-04 | 0.76 | 0.40 | 0.47 | 6 | 32919361 | INTRONIC | A/G | PSMB8 | 0 |
| 68 | rs4236064 | 0.0002 | N/A | 2.00E-04 | 1.41 | 0.21 | 0.16 | 6 | 39522005 | INTRONIC | G/T | KIF6 | 0 |
| 69 | rs4487082 | 0.0002 | 0.01 | 5.70E-06 | 0.59 | 0.06 | 0.09 | 2 | 229432205 | INTERGENIC | A/G | PID1 | 164729 |
| 70 | rs2119137 | 0.0002 | N/A | 2.00E-04 | 1.31 | 0.40 | 0.34 | 2 | 174725960 | INTRONIC | A/G | OLA1 | 0 |
| 71 | rs39829 | 0.0002 | N/A | 2.00E-04 | 0.76 | 0.39 | 0.44 | 5 | 13772997 | INTRONIC | A/C | DNAH5 | 0 |
| 72 | rs11635597 | 0.0002 | N/A | 2.00E-04 | 0.74 | 0.24 | 0.29 | 15 | 82966703 | 3PRIME UTR | C/T | ZSCAN2 | 0 |
| 73 | rs16878312 | 0.0002 | N/A | 2.00E-04 | 0.75 | 0.28 | 0.34 | 5 | 71741140 | INTERGENIC | C/A | ZNF366 | 33850 |
| 74 | rs7603333 | 0.0002 | N/A | 2.00E-04 | 1.30 | 0.49 | 0.43 | 2 | 79211103 | INTERGENIC | G/A | Q53S57_HUMAN | 6855 |
| 75 | rs2631878 | 0.0002 | 0.39 | 0.0003 | 1.52 | 0.14 | 0.10 | 8 | 21173339 | INTERGENIC | G/A | GFRA2 | 420473 |
| 76 | rs4745430 | 0.0002 | 0.01 | 8.30E-06 | 0.76 | 0.33 | 0.39 | 9 | 77461845 | INTERGENIC | T/C | PCSK5 | −233561 |
| 77 | rs4901053 | 0.0002 | N/A | 2.00E-04 | 0.77 | 0.46 | 0.52 | 14 | 50287324 | INTRONIC | T/G | NIN | 0 |
| 78 | rs5756219 | 0.0002 | N/A | 2.00E-04 | 1.30 | 0.49 | 0.41 | 22 | 35221804 | INTRONIC | A/C | EIF3D | 0 |
| 79 | rs220420 | 0.0002 | N/A | 2.00E-04 | 0.76 | 0.33 | 0.38 | 6 | 86918598 | INTERGENIC | G/T | C6orf161 | −416093 |
| 80 | rs1294028 | 0.0002 | 0.69 | 0.0006 | 1.31 | 0.42 | 0.36 | 1 | 9287221 | INTRONIC | G/A | SPSB1 | 0 |
| 81 | rs7738388 | 0.0002 | N/A | 2.00E-04 | 1.34 | 0.32 | 0.27 | 6 | 141494838 | INTERGENIC | T/C | N/A | N/A |
| 82 | rs12966353 | 0.0002 | 0.56 | 0.0005 | 1.31 | 0.39 | 0.33 | 18 | 24297001 | INTERGENIC | C/A | CDH2 | −285812 |
| 83 | rs2236711 | 0.0002 | N/A | 2.00E-04 | 0.77 | 0.43 | 0.49 | 11 | 130244494 | INTERGENIC | G/A | SNX19 | 6491 |
| 84 | rs927743 | 0.0002 | 0.22 | 0.0001 | 1.31 | 0.46 | 0.39 | 1 | 59141535 | INTERGENIC | G/T | JUN | −118948 |
| 85 | rs2073848 | 0.0002 | N/A | 2.00E-04 | 1.43 | 0.19 | 0.14 | 11 | 113557114 | INTRONIC | T/C | ZBTB16 | 0 |
| 86 | rs6457374 | 0.0002 | N/A | 2.00E-04 | 0.74 | 0.24 | 0.30 | 6 | 31380240 | INTERGENIC | T/C | 1C07_HUMAN | −32354 |
| 87 | rs2372897 | 0.0002 | N/A | 2.00E-04 | 0.76 | 0.36 | 0.43 | 11 | 77420474 | INTERGENIC | A/G | KCTD14 | −8486 |
| 88 | rs2071538 | 0.0002 | N/A | 2.00E-04 | 1.37 | 0.26 | 0.20 | 6 | 32926656 | INTRONIC | G/A | TAP1 | 0 |
| 89 | rs1482294 | 0.0002 | N/A | 2.00E-04 | 1.30 | 0.46 | 0.39 | 12 | 128276744 | INTRONIC | G/A | TMEM132D | 0 |
| 90 | rs3099844 | 0.0002 | N/A | 2.00E-04 | 0.67 | 0.10 | 0.14 | 6 | 31556955 | INTERGENIC | G/T | HCP5 | 15495 |
| 91 | rs1794282 | 0.0002 | N/A | 2.00E-04 | 0.65 | 0.08 | 0.13 | 6 | 32774504 | INTERGENIC | C/T | HB25_HUMAN | −20208 |
| 92 | rs8023192 | 0.0002 | N/A | 2.00E-04 | 1.30 | 0.49 | 0.44 | 14 | 20265311 | INTERGENIC | C/A | FAM12A | −18628 |
| 93 | rs1451487 | 0.0002 | N/A | 2.00E-04 | 0.71 | 0.15 | 0.19 | 2 | 199703113 | INTERGENIC | T/C | SATB2 | 139355 |
| 94 | rs10516269 | 0.0002 | N/A | 2.00E-04 | 1.41 | 0.20 | 0.14 | 4 | 13546358 | INTERGENIC | T/G | FAM44A | −307932 |
| 95 | rs30168 | 0.0003 | N/A | 3.00E-04 | 0.77 | 0.38 | 0.44 | 5 | 13772089 | NON SYNONYMOUS | G/A | DNAH5 | 0 |
| 96 | rs4800613 | 0.0003 | N/A | 3.00E-04 | 0.76 | 0.29 | 0.34 | 18 | 20861721 | INTERGENIC | A/G | ZNF521 | 34168 |
| 97 | rs3903663 | 0.0003 | N/A | 3.00E-04 | 0.74 | 0.25 | 0.30 | 6 | 128354781 | INTRONIC | A/G | PTPRK | 0 |
| 98 | rs4658504 | 0.0003 | 0.70 | 0.0009 | 0.77 | 0.34 | 0.40 | 1 | 241107670 | INTERGENIC | C/T | CEP170 | 246683 |
| 99 | rs3131379 | 0.0003 | N/A | 3.00E-04 | 0.66 | 0.09 | 0.13 | 6 | 31829012 | INTRONIC | G/A | MSH5 | 0 |
| 100 | rs1377347 | 0.0003 | N/A | 3.00E-04 | 1.33 | 0.33 | 0.27 | 4 | 41067359 | INTRONIC | A/C | LIMCH1 | 0 |

Displayed are: (1) rs# for the top 100 SNPs; (2) combined p value in Munich and Aberdeen discovery cohorts; (3) p value in the second independent cohort; (4) combined p value in discovery and replication cohorts using Stouffer's method; (5) odds ratio; (6) minor allele frequency in patients; (7) minor allele frequency in controls; (8) chromosome; (9) chromosomal position; (10) a description of the relative position of the SNP in the closest gene; (11) major and minor alleles; (12) the symbol of the closest gene; and (13) distance to the closest gene.
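The "Combined P" column is described as coming from Stouffer's method. As a rough illustration of how such a combination works, the sketch below converts two-sided p values into z scores signed by the direction of effect and combines them with weights. The weighting by the square root of sample size, the function names, and the replication odds ratios are assumptions made for illustration; the text does not state which weights were used, so the output is not expected to reproduce the published column.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def signed_z(p_two_sided, odds_ratio):
    """Turn a two-sided p value into a z score signed by effect direction."""
    magnitude = -_STD_NORMAL.inv_cdf(p_two_sided / 2.0)  # always >= 0
    return magnitude if odds_ratio >= 1.0 else -magnitude


def stouffer_combined_p(p_values, odds_ratios, sample_sizes):
    """Weighted Stouffer combination (assumed weights: sqrt of sample size)."""
    weights = [sqrt(n) for n in sample_sizes]
    z_scores = [signed_z(p, o) for p, o in zip(p_values, odds_ratios)]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    z_combined = numerator / sqrt(sum(w * w for w in weights))
    return 2.0 * _STD_NORMAL.cdf(-abs(z_combined))


# Discovery and replication p values for the top SNP in Table 1, with the
# cohort sizes quoted in the power calculations (1734 and 1011).
# The replication odds ratios (0.70, 1.40) are hypothetical placeholders.
p_same_direction = stouffer_combined_p([1.34e-6, 0.03], [0.68, 0.70], [1734, 1011])
p_opposite_direction = stouffer_combined_p([1.34e-6, 0.03], [0.68, 1.40], [1734, 1011])
```

Signing the z scores matters: a replication result in the opposite direction weakens the combined evidence instead of strengthening it, which an unsigned combination would miss.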
Following these analyses, we genotyped the top 100 polymorphisms in a further independent Munich cohort of 298 schizophrenic patients and 713 healthy controls, all self-identifying as of German or central European ancestry. Using the Sequenom iPLEX system, we successfully genotyped 98 of the 100 SNPs and found that 8 of these 98 variants showed an association that was significant at the 0.05 level in the independent cohort (rs2135551, rs950169, rs1911155, rs4745431, rs4745430, rs4487082, rs3748376 and rs11635597). These included the most strongly associated three SNPs in the list: rs2135551, rs950169 and rs1911155 in ADAMTSL3 (in linkage disequilibrium with one another) (Table 1). Since 3 of these 8 SNPs are in strong LD, this is approximately the number of significant associations we would expect by chance at p 2 Mb in healthy samples tested in an extensive battery of cognitive tests, we searched for such events in a set of 1,547 ethnically-mixed cognitively normal healthy controls. We were unable to find any sample with a deletion >2 Mb - the largest was 1.5 Mb. This indicates that deletions greater than 2 Mb are very rare ( 500 kb (Heinzen et al., in preparation), suggesting that it is a risk factor for other neuropsychiatric conditions as well as schizophrenia. Collectively, these data strongly suggest that deletions in this regions contribute to schizophrenia and epilepsy, providing another example of a CNV influencing different neuropsychiatric conditions , and supporting the observation that deletions greater than 2 Mb are likely to be disease-associated. 10.1371/journal.pgen.1000373.g001 Figure 1 Novel >2 Mb deletions found in schizophrenia cases. Duplications and smaller deletions in region not shown. The chromosome 8 region is deleted in a single Munich patient. The chromosome 8 region is deleted in a patient from Aberdeen and has overlapping, smaller deletions in a patient from Munich and an African American patient. 
(Adapted from UCSC browser: http://genome.ucsc.edu).

The second newly-reported very large deletion was in the Munich cohort; it spans 3.25 Mb on 8p22 and includes a number of promising candidate genes (Table 4). Inspection of the region in other samples did not provide further support.

We also inspected each of the large duplications, although these were not unique to cases in our analyses. This suggests that large duplications can be compatible with normal cognitive function, making it more difficult to attribute causality to any of the large duplications in the cases. However, some of the duplications were of interest nevertheless. In the US cohort, we have one patient who has a 9.4 Mb duplication (reported as 9.04 by PennCNV, Table 4) on chromosome 15q11.2-13.3, which extends across the Prader-Willi/Angelman syndrome critical locus, a region known to contain many segmental duplications and inverted repeats (the patient does not suffer from either Prader-Willi or Angelman syndrome). This overlaps two other large duplications: a 5 Mb duplication in a Munich case (Table 4), and a de novo 1.5 Mb duplication involving APBA2 previously reported in a schizophrenia patient. Additionally, it overlaps with a previously reported schizophrenia-associated deletion event at chr15:28.72–30.30, and we also observed a 1.16 Mb deletion in this region in a US patient. This evidence confirms a role for recurrent mutation in this region in schizophrenia susceptibility and indicates that duplications as well as deletions of this region can lead to schizophrenia. Additionally, since no duplication greater than 3 Mb was found in any control subject, there is evidence that duplications of this size are detrimental in general.

Burden of rare CNVs greater than 100 kb

Next, we compared the general CNV load of cases and controls.
It was recently shown that schizophrenia patients were more likely to have rare CNVs greater than 100 kb that disrupted genes (that is, began or ended within a gene) or that deleted or duplicated entire genes, although this was not replicated in a Chinese population. Following Walsh et al., we selected all CNVs greater than 100 kb that had not been previously reported in the DGV and compared the frequencies of those that did and did not affect genes between cases and controls, treating deletions and duplications separately. Since our CNVs are identified based on SNP data, we are not able to precisely determine where each event begins and ends. Therefore, we did not attempt to distinguish between “disrupted” genes (Walsh et al. define this as a gene that is interrupted by a CNV) and “included” genes (genes completely encompassed by a CNV); instead we counted all gene-including and apparently gene-disrupting CNVs together as “gene-affecting”.

Looking first at the Aberdeen cohort, which has been included in a previous study replicating the Walsh et al. effect, we found that 91 of the 441 Aberdeen cases that passed QC (21%) contained one or more rare deletions >100 kb that affected a gene, compared to 66/439 controls (15%), and that 61/441 (14%) cases contained rare, greater than 100 kb, gene-affecting duplications, compared to 49/439 (11%) of controls. Fisher's exact test indicated that this was a significant excess of deletions in cases (p = 0.03), but only a trend for duplications (p = 0.26). In contrast, neither the Munich nor the US cohort (which have not been assessed in previous publications) showed a significant excess of deletions in cases (Table 5), although the Munich cohort had significantly more duplications in cases (p = 0.03) and the US cohort showed a trend towards an excess of deletions in cases (p = 0.08).
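The case-control comparisons above rest on Fisher's exact test applied to a 2×2 table of carriers and non-carriers. A minimal sketch using only the Python standard library is given below, applied to the Aberdeen counts quoted in the text. The function names are illustrative, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention and an assumption; the exact values depend on the tail convention and need not match the reported ones.

```python
from math import comb


def hypergeom_pmf(k, case_total, control_total, carriers_total):
    """P(k carriers among cases) with all table margins held fixed."""
    ways = comb(case_total, k) * comb(control_total, carriers_total - k)
    return ways / comb(case_total + control_total, carriers_total)


def fisher_exact(case_carriers, case_total, control_carriers, control_total):
    """Return (one_sided, two_sided) p values for an excess of carriers in cases."""
    carriers_total = case_carriers + control_carriers
    k_min = max(0, carriers_total - control_total)
    k_max = min(case_total, carriers_total)
    probs = {
        k: hypergeom_pmf(k, case_total, control_total, carriers_total)
        for k in range(k_min, k_max + 1)
    }
    observed = probs[case_carriers]
    # One-sided: tables with at least as many case carriers as observed.
    one_sided = sum(p for k, p in probs.items() if k >= case_carriers)
    # Two-sided: tables no more probable than the observed table.
    two_sided = sum(p for p in probs.values() if p <= observed * (1 + 1e-9))
    return one_sided, min(1.0, two_sided)


# Aberdeen counts from the text: gene-affecting rare CNVs > 100 kb.
deletion_one_sided, deletion_two_sided = fisher_exact(91, 441, 66, 439)
duplication_one_sided, duplication_two_sided = fisher_exact(61, 441, 49, 439)
```

Exact integer binomial coefficients are used so that no approximation enters before the final division, which keeps the sketch short at the cost of speed on very large tables.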
Table 5 (doi:10.1371/journal.pgen.1000373.t005). Count of European-ancestry samples with one or more rare gene-affecting CNV that is greater than 100 kb and includes 20 or more SNPs.

| Cohort | CNV type | Cases with 1 or more CNV that affects a gene | Total cases | Controls with 1 or more CNV that affects a gene | Total controls | Fisher's 1-tailed p value |
|---|---|---|---|---|---|---|
| Aberdeen | deletion | 91 | 441 | 66 | 439 | 0.034 |
| Aberdeen | duplication | 61 | 441 | 49 | 439 | 0.112 |
| Munich | deletion | 30 | 422 | 29 | 381 | 0.788 |
| Munich | duplication | 61 | 422 | 36 | 381 | 0.030 |
| Meltzer/memory | deletion | 13 | 161 | 11 | 267 | 0.079 |
| Meltzer/memory | duplication | 20 | 150 | 43 | 264 | 0.897 |

It was also previously reported that there were no differences between cases and controls for rare CNVs greater than 100 kb that did not disrupt genes; however, the International Schizophrenia Consortium found an excess of rare CNVs greater than 100 kb that do not disrupt genes in schizophrenia cases. It should be noted that this dataset also included the Aberdeen samples. Our findings here, however, were similar to those of the gene-affecting CNVs: Aberdeen cases had significantly more rare deletions greater than 100 kb that did not affect a gene (two-tailed Fisher's p = 0.031), and no other comparisons were significantly different (Aberdeen duplications, Munich and US duplications and deletions). Overall, therefore, we cannot offer further support to the hypothesis that rare CNVs greater than 100 kb are present in excess in schizophrenia patients, although, as shown above, we report a trend for increased deletions greater than 1 Mb, and a significant excess of deletions greater than 2 Mb in cases.

The population used by Walsh et al. contained a large number of young-onset and childhood-onset schizophrenia patients, a population one might expect to be enriched for genetic rather than environmental contributors. Like the Walsh et al. cohort, the Aberdeen schizophrenia cohort seems to contain an unusually large number of copy number variants both in comparison to the Aberdeen controls and in comparison to the other schizophrenia cohorts.
However, the Aberdeen cohort was not enriched for young-onset patients, nor was it in any other obvious way different from the Munich patient cohort. The patients from both regions were selected using a consistent clinical protocol and the distribution of schizophrenia subtypes was similar. Further comparison of populations that do and do not carry an excess burden of large rare CNVs will be necessary to elucidate the differences between cohorts in this respect. Additionally, the differences seen in this study may in part depend on the type of platform used to detect the CNVs. The Aberdeen, Munich, and US cohorts were each genotyped using a different genotyping platform (Illumina HumanHap550, HumanHap300 and Human-610 Quad, respectively), using only partially overlapping SNP sets. This could influence the CNV calls, although it is difficult to see why the HumanHap550 platform would detect an excess of CNVs in comparison to the Human-610 Quad chip, which is designed to have improved CNV detection.

Common CNVs in schizophrenia

We then tested whether any of the more common CNVs were significantly different in frequency between cases and controls, using Fisher's exact test. This has not been investigated in any of these samples previously. In this case common is a relative term rather than defined by a particular frequency cut-off, since a common CNV was defined as one that was present in three or more individuals in any particular cohort. We did not attempt to classify different CNVs as identical or different, as use of the different BeadChips can make the beginning and endpoints of the CNVs unclear. Additionally, the actual length of the CNV may be unimportant if it covers the same critical region as a shorter or longer CNV. We therefore performed this analysis by determining copy number for each SNP on the BeadChip, and performing Fisher's exact test for each SNP included in a CNV.
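The per-SNP approach described above can be sketched as a simple interval lookup: each CNV call is mapped onto the SNPs it spans, carriers are tallied separately for cases and controls, and only SNPs carried by three or more individuals are kept for testing. The data layout, function names and toy coordinates below are hypothetical and chosen for illustration; they are not the pipeline used in the study, and deletions and duplications would be passed through separately.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def carriers_per_snp(snp_positions, cnv_calls, case_ids):
    """Count case and control carriers for every SNP covered by a CNV call.

    snp_positions: {chromosome: sorted list of SNP positions on the chip}
    cnv_calls: iterable of (sample_id, chromosome, start, end)
    case_ids: set of sample ids that are cases (all others are controls)
    Returns {(chromosome, position): (case_carriers, control_carriers)}.
    """
    carriers = defaultdict(lambda: (set(), set()))
    for sample_id, chromosome, start, end in cnv_calls:
        positions = snp_positions.get(chromosome, [])
        first = bisect_left(positions, start)   # first SNP at or after start
        last = bisect_right(positions, end)     # one past last SNP at or before end
        group = 0 if sample_id in case_ids else 1
        for position in positions[first:last]:
            # Sets ensure a sample is counted once per SNP.
            carriers[(chromosome, position)][group].add(sample_id)
    return {snp: (len(cases), len(controls))
            for snp, (cases, controls) in carriers.items()}


def common_snps(counts, min_carriers=3):
    """Keep SNPs whose CNV is present in at least min_carriers individuals."""
    return {snp: pair for snp, pair in counts.items() if sum(pair) >= min_carriers}


# Toy example with made-up coordinates and sample ids.
toy_snps = {"15": [100, 200, 300, 400]}
toy_calls = [("case1", "15", 150, 350), ("case2", "15", 180, 250),
             ("ctrl1", "15", 190, 420), ("ctrl2", "8", 1, 10)]
toy_counts = carriers_per_snp(toy_snps, toy_calls, {"case1", "case2"})
toy_common = common_snps(toy_counts)
```

Each retained pair of counts, together with the cohort totals, forms the 2×2 table on which a Fisher's exact test would then be run for that SNP.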
Low confidence CNVs, those in some telomeric and centromeric regions and those coding for immunoglobulin genes were removed (see Methods) before beginning the analysis, since these are particularly susceptible to false positive calls. We first tested the Aberdeen, Munich and US samples separately, and also separately compared a) deletions, b) duplications, and c) genomic regions affected by both types of CNV. The Munich cohort had 1,299 SNPs affected by deletions (with frequency ranging from 0.4% to 15%), 1,042 SNPs affected by duplications (0.04%–9.3%) and 202 SNPs affected by both deletions and duplications (in different subjects; frequency 0.5%–9.4%). The equivalent values for Aberdeen were 3,879 deletions (0.34%–18.9%), 2,634 duplications (0.34%–17.5%) and 1,016 SNPs affected by both (0.45%–62%) and for the US cohort, 2,702 deletions (0.72%–46.7%), 3,159 duplications (0.72%–21%) and 1,399 SNPs affected by both (1.0%–58%). It should be noted that these CNVs have not been individually validated either by inspection in BeadStudio or by any experimental means, and many of these are likely to be false positives. The three cohorts differed both in the CNVs that most strongly associated and in the direction of the effects. There were no events that were significantly associated (p 2 Mb) The extra 1,547 controls were also part of the Genetics of Memory/ Genetics of Epilepsy studied at Duke and genotyped in the same facility using the Illumina Infinium HumanHap 550K, and subject to identical quality control procedures. They comprised healthy controls who performed normally in a series of cognitive tests (age range = 18–85, mean = 25.5, median = 22). The majority were of European origin but also included were approximately 10% each of African-American, East Asian (mostly Chinese) and South Asian (mostly Indian) as well as 5% Hispanic. 
Genotyping and Quality Control

The Munich cohort was genotyped using the Illumina HumanHap300 chip with a total of 317,503 SNPs and the Aberdeen cohort was genotyped using the Illumina HumanHap550 chip with a total of 555,352 SNPs. We carried out a series of quality control (QC) checks and tests of cryptic relatedness, ultimately excluding a total of 15 and 28 participants in Munich and Aberdeen respectively (Text S1). We also employed a “one percent rule” that discarded from analysis any SNP that had more than 1% of samples that could not be reliably scored, to reduce the scope for spurious association. After employing this rule the average success rate of genotyping was 98.4% and the concordance rate for duplicate genotyping was 99.997%. The US cohort was genotyped using the Human-610 Quad BeadChip at the Institute for Genome Sciences and Policy Genotyping Core, and the same quality control procedures were applied as those used for the discovery cohorts.

Association Analyses and Correction for Population Structure

Our core association analyses to identify schizophrenia risk factors focused on single-marker tests of the 312,565 QC-passed SNPs that were genotyped in both cohorts. To control for the possibility of spurious associations resulting from population stratification we used the EIGENSTRAT approach of Price et al. This method derives the principal components of the correlations among gene variants and corrects for those correlations in the association tests. In principle, therefore, the principal components in the analyses should reflect population ancestry. We have noticed, however, that some of the leading axes appear to depend on other sources of correlation, such as sets of variants near one another that show extended association. We have documented the potential for inversions to create this effect and it may be created by other causes of extended linkage disequilibrium as well (Text S1).
For this reason we inspected the SNP ‘loadings’ for each of the leading axes to determine whether they depended on many or on relatively few SNPs, as would be expected if the given axis reflected population ancestry or a more localized linkage disequilibrium effect, respectively. This analysis identified several axes clearly due to inversions and suggested that four axes should be retained for ancestry adjustment (Text S1). We therefore assessed significance using four principal components emerging from the EIGENSTRAT analyses as covariates in a logistic regression model which also incorporated sex as a covariate and combined samples from Munich and Aberdeen (a division which clearly drove the first EIGENSTRAT axis).

Bayesian Analysis of Posterior Odds of Association

Following Wakefield, we found the estimated log-odds for association, $\hat{\theta}$, under a multiplicative genetic model for rs2135551, together with its estimated variance $V$, from standard logistic regression of each dataset. Given a prior odds of $PO$ for the association being true, and a prior distribution of $N(\mu, W)$ for $\theta$ under the hypothesis of true association, we found the posterior odds having observed new data at each stage as

$$PO \times \sqrt{\frac{V}{V+W}} \, \exp\!\left[\frac{\hat{\theta}^{2}}{2V} - \frac{(\hat{\theta}-\mu)^{2}}{2(V+W)}\right],$$

and updated the posterior distribution of $\theta$ under the hypothesis of true association as

$$\theta \sim N\!\left(\frac{W\hat{\theta} + V\mu}{V+W},\; \frac{VW}{V+W}\right).$$

We then entered these posteriors as priors into the analysis of the next set of data. To start, we set $PO = 1/100000$ following the Wellcome Trust Case Control Consortium (i.e., assuming a million independent regions of the genome and 10 detectable causal loci for schizophrenia), and following Wakefield (2007) we set $\mu = 0$ and $W = (\log(1.5)/1.96)^{2}$ (i.e., assuming that 95% of all causal effects fall between 2/3 and 3/2 per allele under a multiplicative genetic model).

Alternative Splicing

Alternative transcripts were identified by searching the ExonHit Therapeutics SpliceArray portal (http://portal.splicearray.com) and by BLAST searches of exon-intron boundary sequences against human cDNA libraries.
For semi-quantitative evaluation of transcript ratio differences, primers flanking the common 5′ splice donor site in exon 29 (forward primer: 5′-TTGGGCCCTCCTGTGATA-3′, location shown in Figure S1A) and the alternative 3′ splice acceptor site in exon 30 (reverse primer: 5′-TGGCAGCACCTTTGTTTGTA-3′, location shown in Figure S1A) were used to simultaneously amplify all four transcript forms (Figure S1A). The fragments were separated on a 3.5% NuSieve agarose gel and direct sequencing was used to confirm expected transcript forms. Taqman-based real time PCR was used to quantitatively determine ratios of alternative transcripts in human brain tissue. Assays were custom designed through Applied Biosystems by targeting unique exon-exon boundaries (for primer and probe sequences see Text S1). β-actin mRNA expression level was quantified using a commercially available Taqman assay (Applied Biosystems). Fluorescence outputs were quantified in real time using a 7900HT Fast Real Time PCR System and the data were analyzed using SDS software v.2.2.2 (Applied Biosystems). One way analysis of variance was used to determine the correlation of alternative transcript abundance with the rs950169 and rs2135551 genotypes in human brain tissue. Statistical analyses were performed both separately in control and Alzheimer's disease prefrontal cortex samples, and as a combined subject analysis. A genomic DNA fragment of 4028 bp from the ADAMTSL3 gene that included exons 29 and 30 with flanking intron sequences was PCR-amplified from a reference genomic DNA using the following primers: gggaattcAAGGGCAGATACCCCAAAGT and taggatccCGCTTGCTCTTCCAACTACC. Subsequently, the PCR fragment was subcloned into pSPL3 (GibcoBRL) as a minigene. The minor allele of rs950169 was generated in the minigene by mutagenesis (QuikChange Mutagenesis kit, Stratagene) and the sequences were confirmed by DNA sequencing. The minigenes were transfected into HEK293 cells using Lipofectamine2000 (Invitrogen). 
After the 48 h transfection, RNA was extracted using the RNeasy kit (Qiagen) and converted into cDNA using the High-Capacity cDNA Archive Kit (Applied Biosystems). Alternative splicing of exon 29 and exon 30 was detected by Taqman assays and agarose gel electrophoresis.

Power Calculations

Following Chapman et al., we assumed that the test statistic from a case-control trend test of association follows a non-central chi-square distribution with 1 d.f. and non-centrality parameter η = (n−1)r²H, where n is the sample size, r² is the LD between the causal SNP and its tag SNP on the GWAS genotyping panel, and H is the proportion of variation explained by the SNP if it were typed directly. In a case-control setting, H = 2Π(1−Π)(p′−p)²/[p̄(1−p̄)], where Π is the proportion of cases in the total sample, p is the frequency of causal alleles in controls and in the general population (assuming a rare disease), p′ is the frequency of causal alleles in cases (p′ = θp/(1−p+θp), where θ is the allelic relative risk or odds ratio assuming a rare disease), and p̄ = Πp′ + (1−Π)p is the causal allele frequency in the study as a whole. We simulated sets of 100,000 X₁ values from a Normal distribution with mean = √η₁ and variance = 1, where η₁ is the presumed non-centrality parameter from the GWAS study (n = 1734, Π = 0.506), and an additional set of 100,000 X₂ values from a Normal distribution with mean = √η₂ and variance = 1, where η₂ is the presumed non-centrality parameter from the first replication study (n = 1011, Π = 0.295). To score a “hit”, we required both that X₁ exceeded the upper critical value for a two-tailed test at α = 0.0003 (to mimic being passed to the 1st replication stage), and that both X₁ and X₂ had the same sign and had a joint P […]

[…] 0.28, BAF median>0.55 or […] 0.002 or WF>0.04 or […]

[…] A, indicated by a red star) in the reference splice donor site (in the plus five position) which influences the usage of the reference donor site (data not shown).
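The two-stage simulation described under "Power Calculations" can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. The sketch takes H to be the squared correlation between case status and allele count under Hardy-Weinberg equilibrium, H = 2Π(1−Π)(p′−p)²/[p̄(1−p̄)]. The joint P-value threshold (`joint_alpha`) and the sample-size-weighted combination of X₁ and X₂ are both hypothetical placeholders, because the source does not give them. The sample sizes, case proportions and α = 0.0003 follow the text.

```python
import math
import random
from statistics import NormalDist


def case_allele_freq(p, theta):
    """Causal allele frequency in cases for allelic odds ratio theta."""
    return theta * p / (1 - p + theta * p)


def variance_explained(p, theta, case_frac):
    """H: assumed squared correlation of case status with allele count (HWE)."""
    p_case = case_allele_freq(p, theta)
    p_bar = case_frac * p_case + (1 - case_frac) * p  # frequency in whole study
    return (2 * case_frac * (1 - case_frac) * (p_case - p) ** 2
            / (p_bar * (1 - p_bar)))


def noncentrality(n, case_frac, p, theta, r2=1.0):
    """eta = (n - 1) * r^2 * H."""
    return (n - 1) * r2 * variance_explained(p, theta, case_frac)


def two_stage_power(p, theta, r2=1.0,
                    n1=1734, frac1=0.506, n2=1011, frac2=0.295,
                    alpha1=0.0003, joint_alpha=5e-7,  # joint_alpha is a placeholder
                    n_sims=100_000, seed=0):
    """Fraction of simulated (X1, X2) pairs that score a 'hit'."""
    rng = random.Random(seed)
    norm = NormalDist()
    mean1 = math.sqrt(noncentrality(n1, frac1, p, theta, r2))
    mean2 = math.sqrt(noncentrality(n2, frac2, p, theta, r2))
    crit1 = norm.inv_cdf(1 - alpha1 / 2)  # upper critical value, two-tailed test
    hits = 0
    for _ in range(n_sims):
        x1 = rng.gauss(mean1, 1.0)
        x2 = rng.gauss(mean2, 1.0)
        # Stage 1 must pass its threshold and stage 2 must share its (positive) sign.
        if x1 <= crit1 or x2 <= 0:
            continue
        # Assumed joint test: sample-size-weighted combination of the two Z values.
        z = (math.sqrt(n1) * x1 + math.sqrt(n2) * x2) / math.sqrt(n1 + n2)
        joint_p = 2 * (1 - norm.cdf(abs(z)))
        if joint_p < joint_alpha:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # Hypothetical causal allele frequency and odds ratio, for illustration only.
    print(two_stage_power(p=0.3, theta=1.3, n_sims=20_000))
```

Fixing the random seed makes a run reproducible; varying `p`, `theta` and `r2` over a grid gives a power surface for the two-stage design under these assumptions.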
This variant showed a frequency of less than one percent in our cohort and was therefore too rare to properly assess any possible contribution of this new splicing variant to schizophrenia risk, although a protective trend was observed (data not shown). The black arrows represent the locations of the primers used for semi-quantitative evaluation of alternative transcript ratios. Schema is not to scale. B. The amino acid sequence of the ADAMTSL3 protein PLAC domain and the predicted effect of alternative splicing of exons 29 and 30. The characteristic cysteines of the PLAC domain are highlighted. C. Evidence of the presence of four alternative ADAMTSL3 transcripts in human brain tissue samples with different ADAMTSL3 rs950169 and IVS29+5G>A genotypes. Lane 1: 100 bp ladder; Lanes 2 and 3: rs950169: CC, IVS29+5G>A: GG; Lanes 4 and 6: rs950169: CT, IVS29+5G>A: GG; Lanes 5 and 7: rs950169: TT, IVS29+5G>A: GG; Lanes 8 and 9: rs950169: CT, IVS29+5G>A: GA. (1.85 MB TIF)

Figure S2 Evidence of the genetic control of ADAMTSL3 exon 30 alternative splicing in human brain tissue. Correlation of the ADAMTSL3 rs950169 genotype with the relative abundance of transcripts containing a shorter exon 30 due to usage of the alternative splice acceptor site (RSD-ASA) and of transcripts containing the full reference exon 30 (RSD-RSA) in the prefrontal cortex of control (•) and Alzheimer's disease brain tissue (◊). Bars indicate the mean transcript abundance values of combined controls and Alzheimer's disease patients. p<0.0001, ANOVA, combined controls and Alzheimer's disease patients; p<0.0001, ANOVA, separate analyses of controls and Alzheimer's disease patients. (0.48 MB TIF)

Figure S3 Evidence of the genetic control of ADAMTSL3 exon 30 alternative splicing in human brain tissue. Correlation of the ADAMTSL3 rs950169 genotype with the relative abundance of transcripts in the prefrontal cortex of control (•) and Alzheimer's disease brain tissue (◊).
Bars indicate the mean transcript abundance values of combined controls and Alzheimer's disease patients. A. Effect on alternative splice acceptor site in relation to alternative splice donor site. B. Effect on alternative splice donor site in relation to alternative splice acceptor site. C. No effect on alternative splice donor site in relation to reference splice acceptor site. (0.79 MB TIF)

Figure S4 The expression of ADAMTSL3 in the mouse forebrain as depicted in the Allen Brain Atlas. Highlighted is the high expression in the pyramidal cell layer of the hippocampal formation (including the CA1 and CA3 regions). We used information provided by the Allen Mouse Brain Atlas (http://www.brain-map.org) to determine the likelihood that a gene classified as showing expression in the brain at some point in development would show the same pattern of expression in the mouse brain as found for ADAMTSL3. Only 1.4% of all such genes (893/20598) showed clustered expression in the hippocampus. (7.32 MB TIF)

Table S1 Association results in this dataset, presented for the Aberdeen cohort, for loci implicated in O'Donovan et al. (0.04 MB DOC)

Table S2 Association results in this dataset for SNPs previously implicated in schizophrenia GWAS studies. (0.15 MB DOC)

Table S3 Pathway analysis for genes disrupted by large rare copy number variations in schizophrenia patients and controls. (0.12 MB DOC)

Text S1 Supplementary methods and results. (0.09 MB PDF)