Introduction Genome-wide association (GWA) studies have, during the past decade, become a powerful tool to study the genetic components of complex diseases [1]. Although an increasing number of genes/markers have been uncovered in GWA studies, which have provided us important insights into the underlying genetic basis of complex diseases such as schizophrenia [2], [3], [4], it has also become evident that many genes are weakly or moderately associated with the diseases. Most of these variants have been missed in single marker analysis, as investigators typically employ a genome-wide significance cutoff P-value of 5×10−8. Alternatively, the gene set analysis (GSA) of GWAS datasets provides ways to simultaneously examine groups of functionally related genes for their combined effects and thus have improved power and interpretability [5]. Many GSA methods have been reported to date, such as the gene set enrichment analysis [6], the adaptive rank-truncated product [7], the gene set ridge regression in association studies (GRASS) [8], etc. Most of these methods were designed to use pre-defined gene sets such as the KEGG database [9] or the Gene Ontology (GO) annotations [10]. Alternatively, studies are emerging by incorporating protein-protein interaction (PPI) networks into GWAS analysis. Baranzini et al. [11] first adopted a network-based method that was initially designed for gene expression data [12] to analyze the GWAS data for multiple sclerosis. Recently, Rossin et al. [13] developed the Disease Association Protein-Protein Link Evaluator (DAPPLE); it tests whether genes that are located at association loci in a GWAS dataset are significantly connected via PPIs. We have also developed the dense module search (DMS) method [14], which overlays the gene-wise P values from GWAS onto the PPI network and dynamically searches for subnetworks that are significantly enriched with the association signals. The advantages of network-based analysis of GWAS data in comparison with the standard GSA methods lie in many aspects. First, most GSA methods test on pre-defined gene sets, which heavily rely on a priori knowledge and are incomplete. For example, the popular KEGG database has pathway annotations covering only ∼5,000–5,500 genes [15], accounting for less than 30% of the genes in GWAS datasets. In contrast, the annotations of PPI data cover a much larger proportion of human proteins. For example, a recent integrative analysis of PPI data from multiple sources has reconstructed the human PPI network by recruiting ∼12,000 proteins and ∼60,000 protein interaction pairs with experimental evidence [16]. There are other assembled PPI datasets that include both experimentally supported and computationally predicted interactions; thus, they could annotate even more proteins and interactions [17], [18]. Second, the standard GSA methods are typically based on canonical definitions of pathways or functional categories, but the association signals from GWAS might converge on only part of the pathway [19]. In such cases, analysis of the whole pathway as a unit would reduce the power. On the other hand, network-assisted methods allow for the definition of de novo gene sets by dynamically searching for interconnected subnetworks in the whole interactome and, thus, can effectively alleviate the limitation of the fixed size in pathway analysis. Despite these advantages, there are challenges in the application of network-based approaches to GWAS data. For example, the methods for defining or searching subnetworks vary greatly. While it is impractical to examine all possible subnetworks due to the intensive computing burden, different methods or algorithms may identify substantially different subnetworks [20], making it difficult to decide in real application. Additionally, network-based analysis could be confounded by nodes with high degrees (i.e., the number of interactors of each node in the network), although these nodes constitute only a small proportion according to the framework of power-law distribution [21]. One example is TP53, which interacts with several hundreds of other proteins in the whole PPI network. The existence of such hub nodes with strong interaction in the network may help them more likely to be selected in searching subnetworks and, thus, overwhelm the resultant subnetworks. Appropriate adjustments are needed. In this study, we aim to search for modules that are significantly enriched with association signals in human PPI network weighted by GWAS signals. We take advantage of our recently developed dense module search (DMS) algorithm [14] to conduct module searching and construction. Based on this, we introduced statistical evaluations of the modules identified by DMS, including a significance test based on module scores, a weighted resampling method to adjust potential biased in GWAS data (e.g., caused by gene length or SNP density), a topologically matched randomization process to adjust the bias in network (e.g., the high degree nodes), and a permutation test to determine the disease association of the modules. In addition, we propose a bi-directional framework to search for consistent association signals from multiple GWAS datasets available for one specific disease or trait. Specifically, two GWAS datasets were analyzed in parallel: one is assigned as a discovery dataset and another as an evaluation dataset, and vice versa. This strategy provides robust results with partial validation — only the modules that were consistently highly scored would be selected for further validation and functional assessment. We demonstrated it in schizophrenia using two major GWAS datasets for module identification, and incorporated a third dataset to independently replicate the results. Finally, we performed a meta-analysis of the markers that were mapped in the module genes. We identified 18 SNPs in 9 module genes that are of particular interests (P meta 5%), extreme heterozygosity rate (±3 s.d. from the mean value of the distribution), or problematic gender assignment. We used PLINK [28] to compute the identify-by-state (IBS) matrix to pinpoint duplicate or cryptic relationships between individuals, and we retained the sample with the highest call rate for each pair of samples with an identity-by-descent (IBD)>0.185. Principle component analysis (PCA) was performed using the smartpca program in EIGENSTRAT [29] to detect population structure and to allow removal of outlier individuals. Eight significant PCs with the Tracy Widom test P value 5%, minor allele frequency (MAF) Zm ×(1+r), where Zm+1 is the new module score after adding the node, Zm is the original module score and r is a pre-defined rate. We set r to be 0.1 in this study. This module expansion process iterates until none of the neighborhood nodes can satisfy the function Zm+1 >Zm ×(1+r). Because this module construction process was conducted taking each node in the network as the seed gene, several thousands of modules are expected corresponding to the thousands of nodes. Module assessment We provided three procedures to assess the significance of the identified modules, each of which aims to build null distributions for different hypotheses. First, to perform significance test of the identified modules, we calculated P values based on module scores (Zm ) for each module by empirically estimating the null distribution [26]. According to Efron et al. (2010), the null distribution is a normal distribution with mean δ and standard deviation σ, both of which can be empirically estimated using the R package locfdr. Specifically, module scores were first median-centered by subtracting the median value of Zm from each of them, followed by estimation of the parameters of δ and σ for the empirical null distribution using locfdr. The standardized module scores (ZS ) were then calculated and converted to P values, P(Zm ) = 1-Φ(ZS ), where Φ is the normal cumulative density function. Second, to determine whether the module score is higher than expected by chance, a standard way is to randomly select the same number of genes in a module, i.e., resample genes in the network regardless of the interactions, and compare the module score in the random gene set with the score in the real case. Specifically to alleviate the biases in GWAS data (e.g., gene length or SNP density) or the network data (e.g., high-degree nodes), we incorporated weighted resampling which intentionally matches the pattern of biases in each resample to resemble the real case. The gene length bias and the SNP density bias are commonly noticed in GWAS datasets, especially when using the most significant SNP to represent genes [30]. This is because when mapping SNPs to genes, longer genes tend to have more SNPs and in turn have higher chance to be significant. These two types of biases are closely correlated but differ in cases due to different genotyping platforms. For both biases, we first estimate a weight for each gene based on the specific character to be adjusted, and then performed weighted resampling to ensure each of the resample has the similar pattern in term of the adjusted character. This weighted resampling procedure ensures that genes could be selected in a similar pattern of gene length or SNP density as in the real GWAS data. Therefore, the empirical P values for each module built on the bias-matched permutation data could be adjusted by gene length (P GL) or the number of SNPs per gene (P nSNPs). A detailed description of this function can be found in previous work [23]. Another type of bias was that, in the PPI network, nodes with many interactors (high degree) are more likely to be recruited in module expansion steps. We thus categorized all the nodes in the working PPI network into four categories by their degree values (degree range 0–22, 22–24, 24–26, and >26) (Figure S1). For each module, a topologically matched random module was generated by randomly sampling the same number of nodes in each of the four node bins. An empirical P value is computed by , where is the score of the random module for the π th resample, and is the observed module score. Third, to assess the disease association of the modules, we performed permutation test by shuffling case/control labels in the GWAS datasets. We generated 1,000 permutation datasets using the genotyping data, and computed module scores in each permutation dataset in the same way as for the real case. An empirical P value for each module was computed according to , where Zm (permutation) is the module score in the permutation data. A combinatorial set of criteria was defined to select modules: (1) P(Zm ) 0.185, a cutoff value that is halfway between the expected IBD for third- and second-degree relatives. We performed inverse-variance weighted meta-analysis based on the fixed-effects model using the tool meta (http://www.stats.ox.ac.uk/~jsliu/meta.html). This method combines study-specific beta values under the fixed-effects model using the inverse of the corresponding standard errors as weights. Between-study heterogeneity was tested based on I2 and Q statistics. SNPs with evidence of heterogeneity were removed. The three GWAS datasets were genotyped on the same platform; thus, we performed meta-analysis directly on the genotyped SNPs without imputation. Genomic control within each study was conducted in the meta-analysis using the lambda value to adjust the study-specific standard error (SE). Functional enrichment tests We performed pathway enrichment analysis by the IPA system (http://www.ingenuity.com) and also using canonical pathways from the KEGG database [9] by the hypergeometric test. The KEGG pathway annotations were downloaded in March 2011, containing 201 pathways with size ≥10 and ≤250. For each gene set collection, the results by the hypergeometric test were adjusted by the Bonferroni method for multiple testing correction. To further assess the significance of the identified gene sets, we performed empirical assessment of the significance by resampling 1000 times from the network genes, with each resample containing a random set of 205 genes. For a gene set S, we recorded the number of resamples in which the gene set was significant and computed an empirical P value by . Supporting Information Figure S1 Degree distribution of GAIN GWAS-weighted (top) and ISC GWAS-weighted (bottom) networks. Each node in the network was assigned to a degree bin based on its -log2(degree) value. (PDF) Click here for additional data file. Figure S2 Module size distribution of GAIN GWAS-weighted (top) and ISC GWAS-weighted (bottom) networks. (PDF) Click here for additional data file. Figure S3 Protein-protein interaction network consisting of module genes for schizophrenia. (PDF) Click here for additional data file. Table S1 Functional enrichment results using KEGG pathways for module genes. (DOCX) Click here for additional data file.