1
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: not found
      • Article: not found

      GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Related collections

          Most cited references5

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          All SNPs Are Not Created Equal: Genome-Wide Association Studies Reveal a Consistent Pattern of Enrichment among Functionally Annotated SNPs

          Introduction Complex traits are generally influenced by many genes with small individual effects [1]. This ‘polygenic’ architecture has been difficult to characterize. While Genome-wide association studies (GWAS) [2] have successfully identified thousands of trait-associated single nucleotide polymorphisms (SNPs) [3], even when considered in aggregate, these SNPs explain small portions of the trait heritability [4]. Recent results indicate that GWAS have the potential to explain much of the heritability of common complex phenotypes [5], [6], and more SNPs are likely to be identified in larger samples [7]. However, there are few methods available for identifying more of the SNPs likely to be associated with phenotypes without increasing the sample size, as recognized by recent European and US calls for new statistical genetics methods. The crucial issue is that for most complex traits a large number of SNPs have too small an effect to pass standard GWAS significance thresholds given current sample sizes. We present results suggesting new analytical approaches for GWAS will uncover more of the polygenic effects in complex disorders and traits. We hypothesize that all SNPs in a GWAS are not exchangeable, but come from pre-determinable categories with different distributions of effects. This implies that some categories of SNPs are enriched, i.e. are more likely to be associated with a phenotype than others. This information can be used to calculate the category-specific True Discovery Rate (TDR), or the expected proportion of correctly rejected null hypotheses [8]. SNPs from enriched SNP categories will have an increased TDR for a given effect size, or equivalently, for a given nominal p-value. Stratified False Discovery Rate (sFDR) methods [9] provide an established framework for demonstrating the utility of using enriched genic categories to increase power to discover SNPs likely to replicate in independent samples. Previous work has applied sFDR and related methods to GWAS data stratified by candidate regions determined through prior linkage analysis and/or candidate gene studies [10]–[12] and specific biological pathways related to disease etiology [13]. Others have considered stratification by genome annotations in linkage analysis [14] and Bayesian association analyses [15], demonstrating the utility of this approach for improving power and FDR based discovery where reliable, pre-determinable strata exist. It has been suggested that variation in and around genes harbors more polygenic effects [6], [16]. However, the particular gene elements (i.e., intron, exon, UTRs) containing these variants and the distribution of effect sizes in GWAS have been left to extrapolation and speculation. Further, SNPs in and around genes have been shown to explain more variation [6] and replicate at higher rates [16] than intergenic SNPs. These studies, however, did not parse genic regions down to specific genic elements. We here hypothesize that SNPs in regulatory and coding elements of protein coding genes will show an enrichment of polygenic effects relative to intronic and intergenic SNPs which will be reflected in an increased estimated TDR and empirically confirmed through improved replication rate across independent samples. The association signal of a SNP tested in GWAS is a surrogate for, or ‘tags,’ the potential effects of many other variants. Thus, any of a number of ‘tagged’ variants could underlie the observed association signal. Focusing on the tag SNPs only, without systematically capturing the underlying causal variants within a ‘tagged’ linkage block, limits the functional inferences that can be drawn from GWAS. By incorporating the correlation between SNPs (linkage disequilibrium; LD) we expect a stronger and more consistent differentiation of enrichment among genic annotation categories. In the current study, we use an LD-weighted scoring algorithm that allows quantification of the properties of multi-locus LD structure implicitly captured by each tag SNP to our enrichment analysis. These categories can be leveraged to create strata for established sFDR approaches. We employ a model free strategy to identify enriched strata among phenotypes based on GWAS summary statistics. We first calculate the relative enrichment in different genic elements, using the category-specific empirical cumulative distribution function (cdf) of the nominal p-values after controlling for estimated genomic inflation. For each nominal p-value threshold an estimate of the category-specific TDR = 1−FDR is obtained from these empirical cdfs. This analysis is implemented on summary p-values from ten published GWAS meta-analyses studying 14 phenotypes. We then use the sub-study GWAS in Crohn's disease to test if the estimated increased TDR translates to improved replication rates, showing that for a given replication rate the nominal p-value threshold is 100 times larger for the most enriched genic category compared to the intergenic category. Finally, using an established sFDR framework we demonstrate the utility of leveraging enriched categories for improving power to detect SNPs likely to replicate, i.e., to reject more null hypotheses for a fixed FDR. Results LD-Based Enrichment of Genic Elements in Height Under multiple testing paradigms such as GWAS, quantitative estimates of likely true associations can be estimated from the distributions of summary statistics [17], [18]. A common method for visualizing the enrichment of statistical association relative to that expected under the global null hypothesis is through Q-Q plots of the nominal p-values resulting from GWAS. Under the global null hypothesis the theoretical distribution is uniform on the interval [0,1]. Thus, the usual Q-Q curve has as the y-coordinate the nominal p-value, denoted by “p”, and the x-coordinate the value of the empirical cdf at p, which we denote by “q”. As is common in GWAS, we instead plot −log10 p against the −log10 q to emphasize tail probabilities of the theoretical and empirical distributions. In such plots, enrichment results in a leftward shift in the Q-Q curve, corresponding to a larger fraction of SNPs with nominal −log10 p-value greater than or equal to a given threshold (see Material and Methods). The stratified Q-Q plot for height (Figure 1) shows a clear variation in enrichment across genic annotation categories. The separation between the curves for different categories is enhanced when using LD-weighted genic annotation categories in comparison to non LD-weighted positional categories (Figure S3). The parallel shape of these curves is likely caused by the significant but imperfect correlation among categories due to the non-exclusive nature of the annotation scoring (Figure S2). 10.1371/journal.pgen.1003449.g001 Figure 1 Stratified Q-Q plot for height shows enrichment by annotation categories using Linkage-Disequilibrium (LD)-weighted scores. Genic annotation categories were: 1) 10,000 to 1,001 base pairs upstream (10 k Up); 2) 1,000 to 1 base pair upstream (1 k Up); 3) 5′ untranslated region (5′UTR); 4) Exon; 5) Intron; 6) 3′ untranslated region (3′UTR); 7) 1 to 1,000 base pairs downstream (1 k Down); 8) 1,001 to 10,000 base pairs downstream (10 k Down). Q-Q plot of height with non-LD weighted category scores are shown in Figure S3. An earlier departure from the null line (leftward shift) suggests a greater proportion of true associations, for a given nominal p-value. The divergence of the curves for different categories implies that the proportion of non-null effects varies considerably among annotation categories of genic elements. For example, the proportion of SNPs in the 5′UTR category reaching a significance level of −log10(p)>10 is roughly 10 times greater than for all SNPs and 50–100 times greater than for intergenic SNPs. Polygenic Enrichment across Diverse Phenotypes Recently Yang et al [19] demonstrated that an abundance of low p-values beyond what is expected under null hypotheses in GWAS, but not necessarily reaching stringent multiple comparison thresholds, often attributed to ‘spurious inflation,’ is also consistent with an enrichment of true ‘polygenic’ effects [19]. The prevalence of enrichment below the established genome-wide significance threshold of p 7.3;) in height (Figure 2A) is consistent with their hypotheses and strongly suggests that current GWAS do not capture all of the additive ‘tagged variance’ in this phenotype. Importantly, this enrichment varies across genic annotation categories. 10.1371/journal.pgen.1003449.g002 Figure 2 Stratified Q-Q plots and true discovery rates show consistency of enrichment. Upper panel: Stratified Q-Q plots illustrating consistent enrichment of genic annotation categories across diverse phenotypes: (A) Height, (B) Schizophrenia (SCZ), and (C) Cigarettes per Day (CPD). All figures are corrected for inflation using intergenic inflation control. Only nominal p-values below the standard genome-wide significance threshold (p .05 within 1,000,000 basepairs, we estimate the ratio of correlated pairs (r2>.05) to total pairs of p-values at 0.000128. Replication Rate For each of eight sub-studies contributing to the final meta-analysis in the CD report we independently adjusted z-scores using intergenic inflation control. For each of 70 (8 choose 4) possible combinations of four-study discovery and four-study replication sets, we calculated the four-study combined discovery z-score and four-study combined replication z-score for each SNP as the average z-score across the four studies, multiplied by two (the square root of the number of studies). For discovery samples the z-scores were converted to two-tailed p-values, while replication samples were converted to one-tailed p-values preserving the direction of effect in the discovery sample. For each of the 70 discovery-replication pairs cumulative rates of replication were calculated over 1000 equally-spaced bins spanning the range of negative log10(p-values) observed in the discovery samples. The cumulative replication rate for any bin was calculated as the proportion of SNPs with a −log10(discovery p-value) greater than the lower bound of the bin with a replication p-value 0.2. For each SNP the sum of LD with each genic annotation category was recorded. SNPs were assigned to categories by thresholding continuous scores with an inclusive lower bound of 1.0. Positional (non LD-weighted) scores were recorded as the annotation for the GWAS tag SNP's location only. (TIF) Click here for additional data file. Figure S2 Correlations among annotation categories and scores. (A) Heat map displaying the Spearman's correlation coefficients among continuous valued LD-weighted annotation scores. (B) Heat map displaying the Spearman's correlation coefficients among thresholded and binarized annotation categories presented in Q-Q plots. Correlations are reported using the annotations for the union of SNPs across all GWAS (2,558,411 SNPs). (TIF) Click here for additional data file. Figure S3 Enrichment in Height without LD weighted annotation. Q-Q plot showing enrichment of genic annotation categories using positional scores (non LD-weighted). Enrichment patterns are present, but less apparent than using LD-weighted annotation scores (Figure 1). No inflation correction was performed by our group. (TIF) Click here for additional data file. Figure S4 Height before and after Intergenic Inflation Control. (A) Q-Q plot of height without correction for genomic inflation. (B) Q-Q plot of height after correction for genomic inflation using the ‘intergenic inflation control’. Note the overcorrection (grey line below null-hypothesis line, marked by red arrows) in the un-corrected Q-Q plot in Panel A is resolved in Panel B. Although slight, because of the log scaling of these plots, this slight deflation (left of 1.5 on the x-axis of Panel A) occurs over a much greater proportion of the distribution and thus has a stronger effect on the mean and median of the distribution than the more visually apparent inflation in the extreme tails (right of 2 on the x-axis of Panel A). For lambda values, see Table S4. Only nominal p-values below the standard genome-wide significance threshold (p 0.05 and within 2 megabases (original scoring: r2>0.2 and within 1 megabase). BD, Bipolar Disorder; BMI, Body Mass Index; CD, Crohn's disease; CPD, Cigarettes per Day; DBP, Diastolic blood pressure; HDL, High density lipoprotein; LDL, Low density lipoprotein; SBP, systolic blood pressure; SCZ, Schizophrenia; TC, total Cholesterol; TG, triglycerides; UC, Ulcerative Colitis; WHR, Waist-hip-ratio. (TIF) Click here for additional data file. Figure S15 Replication rate among categories with alternate scoring parameters. A regenerated cumulative replication plot (Figure 4B) showing the average rate of replication (p 0.05 and all SNPs within 2 megabases) results in a similar pattern of increased replication as with the original parameters (including r2>0.2 and all SNPs within 1 megabases), with the exception of the intergenic category, which shows a noticeable decrease in the replication rate. (TIF) Click here for additional data file. Figure S16 Relationship between total categorical total LD and z-score2. The mean (z2) of each category, using the height GWAS, as we change the threshold for inclusion for both the original (A; including r2>0.2 and within 1 megabases), and alternate (B; r2>0.05 and within 2 megabases) parameters for LD weighted scoring. The mean(z2) increases approximately monotonically each category, but with noticeably different slopes. The 5′UTR category in figure A becomes unstable at high thresholds because there are very few SNPs remaining. Changing to a more inclusive LD weighted scoring increases the number of SNPs with high scores and improves the relationship. This suggests that even greater enrichment could be achieved by tuning the categorical inclusion threshold upwards. (TIF) Click here for additional data file. Figure S17 Parametric mixture model fits to Q-Q plots. Q-Q Plot for Height (A) and Crohn's Disease (B). Solid black lines are actual data. Dotted black lines are Q-Q curves under the global null hypothesis. Solid red lines are fitted Q-Q curves from Weibull mixture model for transformed p-values. Note, upper limit in Q-Q plot y-axes is 7.3, corresponding to GWAS-significance threshold of p = 5×10−8. (TIF) Click here for additional data file. Figure S18 Effect of non-null proportion on Q-Q plots. Predicted Q-Q Plot for Crohn's Disease (CD; solid black line) from parametric Weibull mixture model fit (model given by Equation [S9]). The blue line is the predicted Q-Q curve of the CD data if the non-null proportion π1 were 0.001 instead of the value 0.026 estimated from the CD data. The red line is the predicted Q-Q curve if the non-null proportion π1 were 0.10. (TIF) Click here for additional data file. Figure S19 Effect of sample size on Q-Q plots. Predicted Q-Q Plot for Crohn's Disease (CD; solid black line) from parametric Weibull mixture model fit (model given by Equation [S9]). The blue line is the predicted Q-Q curve of the CD data if the sample size were half as large as the true sample size (n = 51,109). The red line is the predicted Q-Q curve of the CD data if the sample size were five times as large as the true sample size. (TIF) Click here for additional data file. Table S1 Descriptive statistics for each GWAS study. All traits are highly heritable and summary statistics are from well-powered studies. All Studies were imputed with using the HapMap phase II as a reference, with the exception of CD, UC and SCZ that used HapMap phase III as a reference. These statistics describe the results of the study in the form they were obtained by our group. (XLSX) Click here for additional data file. Table S2 LD-weighted score distribution for the union of SNPs across all studies. The average score for different categories varies widely and reflects the relative abundance of the different elements within the genome. *Note intergenic scores are binary, with a score of 1 denoting an intergenic SNP. (XLSX) Click here for additional data file. Table S3 The number of SNPs per annotation category. The table shows the number of tag SNPs in each annotation category from each GWAS without LD based annotation (using only positional information (No LD) and after LD based annotation (LD). Note the increased number of SNPs in all annotation categories, especially in annotation categories such as 3′UTR and 5′UTR when using LD-weighted categories. BD, Bipolar Disorder; BMI, Body Mass Index; CD, Crohn's disease; CPD, Cigarettes per Day; DBP, Diastolic blood pressure; HDL, High density lipoprotein; LDL, Low density lipoprotein; SBP, systolic blood pressure; SCZ, Schizophrenia; TC, total Cholesterol; TG, triglycerides; UC, Ulcerative Colitis; WHR, Waist-hip-ratio. (XLSX) Click here for additional data file. Table S4 Estimated genomic inflation factors before and after intergenic inflation control (IIC). We present the estimates from either all SNPs or intergenic SNPs. The λGC values calculated before IIC were calculated from the summary statistics as they were made available to us, either by collaborators or public data repositories. Many of these studies already had performed a standard genomic control procedure, adjusting the test statistics down, to correct for inflation. For these studies our procedure may correct statistics upwards, increasing the computed λGC values. We leveraged the intergenic SNPs to estimate inflation because their relative depletion of associations suggests they provide a robust estimate of true null SNPs that is less contaminated by polygenic effects. Using annotation categories in this fashion is important given concerns posed by recent GWAS [8] about the over-correction of test statistics using standard genomic control [15]. Values greater than 1 indicate inflation and values less than 1 indicate an over correction, relative to the theoretical empirical null distribution. λGC was calculated as the ratio of the median z-score2 to the expected median of a Chi-square distribution with 1 degree of freedom, for all SNPs and intergenic SNPs independently. IIC, Intergenic Inflation Control; BD, Bipolar Disorder; BMI, Body Mass Index; CD, Crohn's disease; CPD, Cigarettes per Day; DBP, Diastolic blood pressure; HDL, High density lipoprotein; LDL, Low density lipoprotein; SBP, systolic blood pressure; SCZ, Schizophrenia; TC, total Cholesterol; TG, triglycerides; UC, Ulcerative Colitis; WHR, Waist-hip-ratio. (XLSX) Click here for additional data file. Table S5 Significance of QQ-plot enrichment. The p-values of the enrichment of the Q-Q plots for all phenotypes compare intergenic annotation category with each other annotation category. Each p-value corresponds to the median Kolmogorov-Smirnov (KS) statistic from 10 iterations of each comparison for 10 different random prunings of SNPs to approximate independence (r2 .2) relating each GWAS tag SNP to all 1KGP SNPs within 1,000,000 base pairs. Note the consistent pattern across phenotypes, with large variation between annotaion categories, with highest LD score in 5′UTR. BD, Bipolar Disorder; BMI, Body Mass Index; CD, Crohn's disease; CPD, Cigarettes per Day; DBP, Diastolic blood pressure; HDL, High density lipoprotein; LDL, Low density lipoprotein; SBP, systolic blood pressure; SCZ, Schizophrenia; TC, total Cholesterol; TG, triglycerides; UC, Ulcerative Colitis; WHR, Waist-hip-ratio. (XLSX) Click here for additional data file. Table S8 Per category average per SNP number of tagged SNPs. The average total number of SNP tagged (r2>0.2) by a tag SNP per genic annotation category for each phenotype is shown. Note the consistent pattern across phenotypes, with variation between categories, and highest number in 5′UTR. The distribution of block sizes does match the ordering of enrichment by category. BD, Bipolar Disorder; BMI, Body Mass Index; CD, Crohn's disease; CPD, Cigarettes per Day; DBP, Diastolic blood pressure; HDL, High density lipoprotein; LDL, Low density lipoprotein; SBP, systolic blood pressure; SCZ, Schizophrenia; TC, total Cholesterol; TG, triglycerides; UC, Ulcerative Colitis; WHR, Waist-hip-ratio. (XLSX) Click here for additional data file. Table S9 Per category average MAF. The average minor allele frequency of GWAS tag SNPs in each genic annotation category for every phenotype is not consistent with this effect driving our enrichment patterns. Note the similarities across phenotypes and annotation categories. BD, Bipolar Disorder; BMI, Body Mass Index; CD, Crohn's disease; CPD, Cigarettes per Day; DBP, Diastolic blood pressure; HDL, High density lipoprotein; LDL, Low density lipoprotein; SBP, systolic blood pressure; SCZ, Schizophrenia; TC, total Cholesterol; TG, triglycerides; UC, Ulcerative Colitis; WHR, Waist-hip-ratio. (XLSX) Click here for additional data file. Table S10 Multiple regression analysis predicting log(Z2) in height. A multiple regression analysis reveals a minimal, but significant, effect of total LD on the log z2 for height. This represents a minimal, but significant, effect of overall LD block size on enrichment. Categorical effects remain independently strong in this analysis with an effect size order that mirrors enrichment. (DOCX) Click here for additional data file. Table S11 Null GWAS Simulations. We present simulations of categorical enrichment based on multiple independent null GWAS simulations using subjects with European ancestry from the 1000 Genomes Project. Random phenotypes were generated unrelated to genotypes for each subject, association z-scores were computed for each tag SNP, and mean(z2) was computed for each annotation category, using the same procedure as applied to the actual GWAS data. The means and standard deviations were computed from 20 independent simulation runs. The results demonstrate that the observed differential enrichment of annotation categories cannot be explained by category-specific spurious sources of genomic inflation due to differential LD or MAF. (DOCX) Click here for additional data file. Table S12 FDR versus sFDR Discovery. Leveraging the enriched genic annotation categories to create strata among the SNPs we show that the stratified false discovery rate (sFDR) method improves the discovery of SNPs for a given FDR threshold, across all phenotypes. The numbers reported are after pruning SNPs for LD at a threshold of r2≤0.2. (XLSX) Click here for additional data file. Text S1 Supplementary text extending Materials and Methods and presenting supporting analyses. More details are provided with respect to the acquisition, processing and annotating of the GWAS data used for the main results. The relationship between QQ-plots and False Discovery Rate is extended and related to our measures of enrichment. Also, the main results are described within the context of a mixture-modeling framework. Finally, a series of control experiments are described and supplementary references are enumerated. (DOCX) Click here for additional data file.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            GPA: A Statistical Approach to Prioritizing GWAS Results by Integrating Pleiotropy and Annotation

            Introduction Hundreds of genome-wide association studies (GWAS) have been conducted to study the genetic bases of complex human traits. As of January, 2014, more than 12,000 single-nucleotide polymorphisms (SNPs) have been reported to be significantly associated with at least one complex trait (see the web resource of GWAS catalog [1] http://www.genome.gov/gwastudies/). Despite of these successes, these significantly associated SNPs can only explain a small portion of genetic contributions to complex traits/diseases [2]. For example, human height is a highly heritable trait whose heritability is estimated to be around 80%, i.e., 80% of variation in height within the same population can be attributed to genetic effects [3]. Based on large-scale GWAS, about 180 SNPs have been reported to be significantly associated with human height [4]. However, these loci together only explain about 5-10% of variation in height [2], [4], [5]. This phenomenon is referred to as the “missing heritability” [2], [6], [7]. Identifying the source of this missing heritability has drawn much attention from researchers, and progress has been made towards explaining the apparent discrepancy. The role of a much greater-than-expected set of common variants (minor allele frequency (MAF) 0.01) has been shown to be critical in explaining the phenotypic variance [8]. Instead of only using genome-wide significant SNPs, Yang et al. [9] reported that, by using all genotyped common SNPs, 45% of the variance for human height can be explained. This result suggests that a large proportion of the heritability is not actually missing: given the limited sample size, many individual effects of genetic markers are too weak to pass the genome-wide significance, and thus those variants remain undiscovered. So far, people have found similar genetic architectures for many other complex traits [10], such as metabolic syndrome traits [11] and psychiatric disorders [12]–[14]. That is, the phenotype is affected by many genetic variants with small or modest effects effects that cannot be confirmed individually via statistical significance, which is usually referred to as “polygenicity”. The polygenicity of complex traits is further supported by recent GWAS with larger sample sizes, in which more associated common SNPs with moderate effects have been identified (e.g., [15]). Clearly, the emerging polygenic genetic architecture imposes a great challenge of identifying risk genetic variants: a larger sample size is required to identify genetic variants with smaller effect sizes. However, sample recruitment may be expensive and time-consuming. It would be desirable to find a way to increase power to detect variants that miss significance on standard GWAS without extensive additional subject recruitment requirements. Integrative analysis of genomic data could be a promising direction, including combining GWAS data of multiple genetically related phenotypes and incorporating relevant biological information. The last few years have seen concrete demonstrations of “pleiotropy”, i.e. the sharing of genetic factors, between human complex traits. For example, a systematic analysis of the NHGRI catalog of published GWAS (http://www.genome.gov/gwastudies/) showed that 16.9% of the reported genes and 4.6% of the reported SNPs are associated with multiple traits [16]. Through a “pleiotropic enrichment” method, Andreassen et al showed that it is possible to improve the power to detect schizophrenia-associated genetic variants by using the pleiotropy between schizophrenia (SCZ) and cardiovascular-disease [17]. A more recent study identified four significant loci (p-value ) to have pleiotropic effects by analyzing GWAS data of 33,332 cases and 27,888 controls for five psychiatric disorders [18]. Further analysis suggested a very significant genetic correlation between schizophrenia and bipolar disorder ( s.e.) [12]. Pleiotropy has also been demonstrated among several other types of traits, for example, metabolic syndrome traits [11] and cancers [19]. An increasing number of studies also suggest that functionally annotated SNPs are generally more biologically important than those that are not annotated, and henceforth more likely to be associated with complex traits. To name a few, using GWAS data of different traits (e.g., Crohn's disease and SCZ), Schork et al. [20] demonstrated a consistent pattern of enrichment of GWAS signals among functionally annotated SNPs. Yang et al. [21] showed that SNPs in genic regions could explain more variance of height and body mass index (BMI) than SNPs in intergenic regions. Nicolae et al. [22] found that SNPs associated with complex traits were more likely to be expression quantitative trait loci (eQTL). In addition, public availability of a vast amount of functional annotation data also provides unprecedented opportunities to investigate the enrichment of GWAS signals among these various types of functional annotations. For example, the Encyclopedia of DNA Elements (ENCODE) Consortium has recently generated extensive experimental data on gene expression (RNA-seq), DNA methylation status (RRBS-seq), chromatin modifications (ChIP-seq), chromatin accessibility (DNase-seq and FAIRE-seq), transcription factor (TF) binding sites (ChIP-seq), and long-range chromatin interactions (ChIA-PET, Hi-C, and 5C). As of September 2012, more than 1,600 data sets from 147 cell lines had been produced to annotate the human genome, including 2.89 million unique, non-overlapping DNase I hypersensitivity sites (DHSs) in 125 cell lines using DNase-seq and 630K binding regions of 119 DNA-binding proteins in 72 cell lines using ChIP-seq, among many [23]. The ENCODE Consortium [23] examined 4,492 risk-associated SNPs from the NHGRI GWAS catalog and found that 12% of them overlap with TF binding regions and 34% overlap with DHSs. The increasing evidence of pleiotropy between complex traits and the increasing functional annotation data call for novel statistical methods to effectively analyze multiple GWAS data sets and functional annotation data simultaneously. Statistical methods to investigate pleiotropy have been actively researched (reviewed in [24] and [25]), for example, using linear mixed models [26], [27] or the conditional False Discovery Rate (FDR) approach [17], [28]. However, these methods do not allow use of functional annotation data for prioritization of GWAS results. On the other hand, various statistical methods have been proposed to make use of functionally annotated SNPs in recent years (reviewed in [29], [30], and [31]). For example, GSEA [32] identifies potentially important pathways in which target genes of risk-associated SNPs are involved while RegulomeDB [33] allows nucleotide-level annotations of risk-associated SNPs, especially for those located in non-coding regions. Stratified FDR methods have been applied to incorporate annotation into GWAS data analysis [20]. However, each of these methods was designed for the analysis of a single phenotype and hence, they do not use functional annotation data fully efficiently for the genetic variants shared by multiple phenotypes. There is a need to develop a coherent statistical framework for the integration of functional annotation data for joint analysis of genetically correlated GWAS information. In this article, we propose a unified statistical framework, named GPA (Genetic analysis incorporating Pleiotropy and Annotation), to prioritize GWAS results based on pleiotropy and annotation information. GPA also provides statistically rigorous and biologically interpretable inference tools for this purpose. The method can easily be used for other purposes as well, as we will discuss below. This article is organized as follows. First, we investigate the properties of GPA using extensive simulation studies and illustrate the versatility and utility of GPA with the analysis of real data. Specifically, we apply GPA to the five psychiatric disorder GWAS data from the Psychiatric Genomics Consortium with central nervous system gene expression data and show that GPA can accurately identify pleiotropy structure among these diseases. We further apply GPA to the bladder cancer GWAS data with the ENCODE DNase-seq data from 125 cell lines and show that GPA can detect cell lines that are biologically more relevant to bladder cancer. Lastly, we discuss many issues related to GPA. The details of our GPA model and its statistical inference procedures are provided in the Materials and Methods section. Results Simulation study We conducted comprehensive simulation studies to evaluate GPA performance. The p-values for non-risk SNPs can be simulated easily from a uniform distribution . For risk-SNPs, we can simulate their p-values via different approaches. The most favorable simulation for our GPA model is to simulate them from the Beta distribution. To examine the robustness of our GPA model, we adopted an alternative simulation scheme under the framework of the linear mixed model and liability threshold model that has gained increasing interest recently (e.g., [9], [13]). The detailed procedures will be described later. But we emphasize that there is substantial discrepancy between the generative model used in simulation and the GPA model. The primary purpose of our simulation study is to investigate whether the GPA model can robustly improve the power to detect risk SNPs by integrating multiple GWAS data sets and annotation data despite this discrepancy. To simulate case-control GWAS data for two genetically correlated diseases, we followed the classical liability threshold model [13]. For each disease, we first simulated a large cohort of individuals with genotypes of independent SNPs. The MAFs of these SNPs were drawn uniformly from [0.05, 0.5]. Then we randomly designated SNPs as risk SNPs. The per-minor-allele effect of each risk SNP was drawn from a normal distribution with zero-mean and variance of , where is the desired level of variance explained by all SNPs on the liability scale and is the MAF of the corresponding risk SNP. We also simulated the environmental effect on the liability scale for each individual from a standard normal distribution (zero mean and unit variance). The total liability for each individual was then obtained by adding up all the genetic effects and the environmental effect. Given a desired disease prevalence , individuals with liabilities greater than the quantile were classified as cases and others were classified as controls. Then equal numbers of cases and controls were drawn from the cohort as a GWAS data set. When simulating two diseases simultaneously, we simulated two disjoint cohorts with the same set of SNPs. To reflect the pleiotropy effects between the two diseases, risk SNPs ( ) were chosen to be shared by the two diseases. The annotation status of each risk and non-risk SNP was simulated from a Bernoulli distribution with probability of and , respectively. In our simulation study, the total number of SNPs, , was set to be 20,000, and the sample size of each data set, , was set at 2000, 5000 or 10000, respectively. The number of risk SNPs was the same for the two diseases and was set at 500, 1000 or 2000, respectively. We varied the proportion of shared risk SNPs between the two diseases, , from to 1. Note that corresponds to the absence of pleiotropy. The disease prevalence, , was fixed at 0.1 and the variance explained by all the SNPs, , was fixed at 0.6 for each of the two diseases. Here and were fixed at 0.4 and 0.1, respectively. We first evaluated SNP prioritization performance of GPA. Specifically, after the two data sets were simulated, we obtained the p-value for each SNP in each disease using a test with one degree of freedom. Then we analyzed the simulated data using our GPA method in the following four modes: 1. analyzing the two diseases separately without the annotation data; 2. analyzing the two diseases separately with the annotation data; 3. analyzing the two diseases jointly without the annotation data; and 4. analyzing the two diseases jointly with the annotation data. In each mode, we compared the order of the local FDR obtained using GPA against the actual risk status of the SNPs to calculate the area under the receiver operating characteristic curve (AUC) as a measure of risk SNP prioritization accuracy. The left panel of Figure 1 shows the AUCs from the four modes with and (results for other scenarios are shown in Figures S10-S20 in Text S1). Because all the simulation parameters were the same for the two diseases, only the results for the first disease are shown. We can see that incorporating either annotation information or pleiotropy between the two diseases improved the prioritization performance. In particular, as the proportion of shared risk SNPs increased, the prioritization performance also improved. Given the local FDR obtained using GPA, we controlled the global FDR at 0.2 and calculated the average power to identify the true risk SNPs. GPA performance measured by partial AUC and power for and are shown in the middle and right panels of Figure 1, respectively (results for other scenarios are shown in Figures S10–S20 in Text S1). We also evaluated the actual FDR and found that the FDR was indeed controlled at 0.2 with occasional slight conservativeness (Figure 2 and Figures S21–S31 in Text S1). In addition, we evaluated the actual FDR in the presence of linkage disequilibrium (LD) among SNPs. The details of the simulations and results are given in Figure S1 in Text S1. These results suggest that GPA can improve the power of identifying risk variants while controling FDR at the nominal level despite the mismatch between GPA and the generative model in our simulation study. 10.1371/journal.pgen.1004787.g001 Figure 1 AUC (left), partial AUC (Middle) and power (right) of GPA for SNP prioritization with sample size  = 5000 and number of risk SNPs  = 1000. The results are based on 200 simulations. 10.1371/journal.pgen.1004787.g002 Figure 2 Global false discovery rates of GPA at sample size  = 5000 and number of risk SNPs  = 1000. Upper panel: Global false discovery rates of GPA with annotation. Lower panel: Global false discovery rates of GPA without annotation. From left to right: FDR of first GWAS (joint analysis), FDR of second GWAS (joint analysis), FDR of first GWAS (separate analysis), FDR of second GWAS (separate analysis) and FDR of risk variants shared by both GWAS. For all scenarios, the global false discovery rates of GPA are controlled at the nominal level. Regarding parameter estimation, we found that GPA provided a satisfactory estimate of , the probability of being annotated for a certain group of SNPs, as long as there are enough SNPs in that group (Figures S32–S44 in Text S1). But we note that the estimates of the proportion of risk and non-risk SNPs, , may be biased (Figures S45–S66 in Text S1). This is no surprise because the distribution of the p-values of the risk SNPs obtained from the generative model in the simulation study may differ from the Beta distribution assumed in GPA. If the p-values of the risk SNPs are indeed generated from the Beta distribution, our GPA model can give fairly accurate estimates of (Figures S67-S72 in Text S1). As a comparison, we also used the “conditional FDR” approach proposed by Andreassen et al. [17] to prioritize SNPs in our simulations. The comparison results between GPA and the conditional FDR at and are shown in Figure 3 (other results are provided in Figures S77–S87 in Text S1). GPA significantly outperformed the conditional FDR approach in SNP prioritization. More importantly, in the absence of pleiotropy, the conditional FDR approach had worse accuracy than single-GWAS analysis using the standard FDR approach, whereas GPA achieved comparable accuracy with single-GWAS analysis in this scenario. This suggests that GPA was able to take advantage of pleiotropy when it exists while, in clear contrast to the conditional FDR approach, it does not sacrifice much statistical power than the conditional FDR approach when it is absent. 10.1371/journal.pgen.1004787.g003 Figure 3 Comparisons of receiver operating characteristic curves measured by AUCs (Left) and partial AUCs (Right) between GPA and the conditional FDR approach at sample size  = 5000 and number of risk SNPs  = 1000. The results are based on 200 simulations. Next, we evaluated the type I error and power of GPA for hypothesis testing on the significance of annotation enrichment for risk SNPs. Gene Set Enrichment Analysis (GSEA) [34] is a popular method to accomplish a similar task. Although GSEA typically is used for gene expression data analysis, its input can be a list of p-values obtained from any source. Therefore we implemented the GSEA method to test the enrichment of the -values of a set of SNPs being annotated and compared it with GPA. We followed the previous simulation scheme and simulated one GWAS data set with , varying from 2000 to 10000, and varying from 500 to 2000. Here was fixed at 0.1 and was varied from 0.1 to 0.5. We set the statistical significance level at 0.05. Type I error rate was evaluated at and power was evaluated at . The results for are shown in Figure 4. In general, GPA provided much higher power than GSEA while both methods appropriately controlled the type I error rate. 10.1371/journal.pgen.1004787.g004 Figure 4 The comparison between GPA and GSEA at number of risk SNPs  = 1000. Here we fixed and varied to evaluate the power for sample size  = 2000 (Upper Left panel), 5000 (Upper Right panel), 10000 (Lower Left panel), respectively. We used to evaluate the type I errors (Lower Right panel). The results are based on 500 simulations. We further evaluated the type I error rate and power of GPA for the test of pleiotropy in our simulations. The simulation parameters were the same as those in the previous simulations. Power was evaluated at , , , and . The type I error rate was evaluated at , corresponding to the expected shared proportion of risk SNPs in the absence of pleiotropy. As shown in Figure 5, power increased as decreased and as and increased, whereas the type I error rate was appropriately controlled in all cases. Please note that type I errors and power remained almost the same for hypothesis testing of pleiotropy in the presence of annotation (see Figures S2–S4 in Text S1). 10.1371/journal.pgen.1004787.g005 Figure 5 The type I error rate and power of the pleiotropy test. Here we varied to evaluate the power for sample size  = 500 (Upper Left panel), 1000 (Upper Right panel), and 2000 (Lower Left panel), respectively. We used to evaluate the type I errors of the pleiotropy test (Lower Right panel). In each setting, we also varied sample size  = 1000, 2000, and 10000. Note that type I error rate and power of the pleiotropy test remain almost the same in presence of annotation (see Figure S9 in Text S1). Lastly, we performed additional simulations with moderate heritability ( ) and pleiotropy ( ). The results shown in Figures S73–S76 in Text S1 demonstrate that, with moderate heritability and pleiotropy, GPA can still effectively improve the power by leveraging pleiotropy between related traits. Therefore, GPA can serve as a more powerful tool for integrative analysis in the post-GWAS era. Real data analysis GWAS of five psychiatric disorders We applied our GPA model to the analysis of five psychiatric disorders, as has been done in previous publications using different methodologies [12], [18]. Traits included were attention deficit-hyperactivity disorder (ADHD), autism spectrum disorder (ASD), bipolar disorder (BPD), major depressive disorder (MDD), and schizophrenia (SCZ). Detailed information about these data sets is provided in [12], [18]. Summary statistics of the five psychiatric disorders were downloaded from the section for cross-disorder analysis at the Psychiatric Genomics Consortium (PGC) website. The p-values were available for 1,230,535 SNPs in ADHD, 1,245,864 SNPs in ASD, 1,233,533 SNPs in BPD, 1,232,794 SNPs in MDD, and 1,237,959 SNPs in SCZ, respectively. We took the intersection of those SNPs, resulting in a p-value matrix for the five psychiatric disorders. First, we performed single-GWAS data analysis using genes preferentially expressed in the central nervous system (CNS) [13], [34] as the annotation data. Specifically, we generated the annotation vector as follows: The entries in corresponding to SNPs within 50-kb of the genes from the CNS set were set to be 1. Among all the SNPs, 21.9% were thus annotated to be CNS related. The analysis results of these five psychiatric disorders are given in Table 1. The estimated fold enrichment of the CNS set was 1.749 (s.e. 0.447), 1.261 (s.e. 0.055), 1.467 (s.e. 0.033), 1.177 (s.e. 0.058) and 1.391 (s.e. 0.022) for ADHD, ASD, BPD, MDD and SCZ, respectively. PGC investigators also evaluated enrichment of the CNS gene set by variance component estimation using linear mixed models (LMM) [12], suggesting about 1.6 and 1.5 fold enrichments in BPD and SCZ, respectively. Two explanations may contribute to these minor differences in fold enrichment estimation between GPA and LMM: First, GPA only used summary statistics while LMM used both phenotype and genotype data. Second, GPA and LMM employed different mathematical definitions of fold enrichment: GPA used the ratio between and , while LMM used the ratio between the proportion of the variance explained by SNPs in the CNS set and the proportion of the CNS set in entire genome. Furthermore, we evaluated the significance of enrichment of the CNS set by hypothesis testing. As shown in Table 1, enrichment of the CNS gene set was strong in BPD and SCZ, moderate in ASD and MDD, and nonsignificant in ADHD. 10.1371/journal.pgen.1004787.t001 Table 1 Single-GWAS analysis of five psychiatric disorders using the CNS gene set as the annotation data. p-value ADHD 0.694 (0.103) 0.991 (0.006) 0.009 (0.006) 0.218 (0.001) 0.381 (0.055) 1.749 (0.447) 0.083 ASD 0.710 (0.014) 0.909 (0.007) 0.091 (0.007) 0.214 (0.001) 0.270 (0.009) 1.261 (0.055) 8.408e-07 BPD 0.697 (0.007) 0.821 (0.007) 0.179 (0.007) 0.202 (0.001) 0.297 (0.004) 1.467 (0.033) 1.439e-48 MDD 0.837 (0.019) 0.807 (0.027) 0.193 (0.027) 0.212 (0.003) 0.249 (0.008) 1.177 (0.058) 0.005 SCZ 0.596 (0.004) 0.804 (0.004) 0.196 (0.004) 0.203 (0.001) 0.283 (0.003) 1.391 (0.022) 7.742e-79 Here is the estimate of parameter of Beta distribution (4), and are the estimated proportion of null-SNPs and non-null-SNPs defined in (4), and are the estimated proportion of SNPs annotated by the CNS gene set among null and non-null SNPs, respectively. Enrichment fold and p-value given by hypothesis testing of enrichment in annotation data are provided in last two columns. The values in the brackets are standard errors of the estimates. Next, we applied GPA to study pairwise pleiotropy of these five psychiatric disorders without using annotation data. As shown in Table 2, our results suggest that the pleiotropy effect was strong between BPD and SCZ (p-value is essentially zero), MDD and SCZ (p-value ), BDP and MDD (p-value ), ASD and SCZ (p-value ), and ASD-BPD (p-value ); moderate between ADHD and BPD (p-value ), ADHD and SCZ ( ); and non-significant for all other pairs. Our results largely agree with those reported in [12] and the disagreements mainly came from those between ADHD and other disorders. The pleiotropy between ADHD and MDD was reported to be moderate in [12], while GPA did not detect this moderate effect. From single GWAS analysis of ADHD, given in Table 1, the estimated parameters ( (s.e. 0.006), ) indicate that its GWAS signals were very weak. For MDD, the estimated parameter also indicates the weak marginal signals for MDD. Consequently, the marginal GWAS signals of ADHD and MDD were too weak to allow GPA to detect the pleiotropic effect between them. Since the data analysis performed in [12] used genotype data, the bivariate LMM could still have enough power to detect a moderate genetical correlation between ADHD and MDD. 10.1371/journal.pgen.1004787.t002 Table 2 Pleiotropy estimated among five psychiatric disorders. LRT p-value ADHD-ASD 0.900 (0.009) 0.007 (0.006) 0.093 (0.009) 0.001 (0.004) 0.913 0.339 ADHD-BPD 0.822 (0.008) 0.001 (0.005) 0.164 (0.009) 0.013 (0.007) 29.849 4.670e-08 ADHD-MDD 0.776 (0.036) 0.006 (0.010) 0.217 (0.036) 0.001 (0.010) −0.005 1 ADHD-SCZ 0.804 (0.005) 0.001 (0.004) 0.183 (0.008) 0.012 (0.007) 15.855 6.837e-05 ASD-BPD 0.791 (0.008) 0.027 (0.007) 0.115 (0.009) 0.067 (0.008) 69.391 8.074e-17 ASD-MDD 0.727 (0.033) 0.049 (0.016) 0.180 (0.033) 0.044 (0.016) 2.717 0.099 ASD-SCZ 0.771 (0.006) 0.035 (0.006) 0.131 (0.007) 0.064 (0.006) 106.493 5.749e-25 BPD-MDD 0.793 (0.014) 0.011 (0.026) 0.030 (0.015) 0.166 (0.027) 126.037 3.017e-29 BPD-SCZ 0.821 (0.004) 0.001 (0.005) 0.013 (0.006) 0.165 (0.007) 1851.727 0 MDD-SCZ 0.809 (0.009) 0.001 (0.010) 0.001 (0.025) 0.189 (0.025) 466.312 2.034e-103 The values in the brackets are standard errors of the estimates. The last two columns provide the LRT statistics and p-values of hypothesis testing (15). We further applied GPA to study all pairs of disorders using the CNS gene set as the annotation data. The estimated ( ) are given in Table 3 and remained almost the same as those estimated without the annotation data. The p-values of hypothesis test (14) are provided in the last column of Table 3. The p-value should be interpreted with caution: as shown in Table 1, the CNS gene set is enriched in all these disorders except ADHD. Hence, the significant p-values listed in Table 3 may be simply due to the significant enrichments for individual traits. On the other hand, the ratio between and could be more interesting. Take BPD-SCZ as an example. The ratio between and is 1.503 (s.e. 0.025), which suggests that enrichment of the CNS set for the BDP-SCZ shared risk variants was even stronger than that for BPD-only (1.467 (s.e. 0.033)) or SCZ-only (1.391 (s.e. 0.022)), although the differences are not statistically significant. 10.1371/journal.pgen.1004787.t003 Table 3 GPA results for all pairs of the five psychiatric disorders with the CNS set as annotation. p-value ADHD-ASD 0.212 (0.002) 0.425 (0.146) 0.272 (0.013) 0.022 (1.879) 4.205e-06 ADHD-BPD 0.202 (0.003) 0.975 (0.261) 0.304 (0.010) 0.216 (0.143) 2.965e-47 ADHD-MDD 0.209 (0.004) 0.490 (0.411) 0.251 (0.014) 0.001 (2.357) 0.005 ADHD-SCZ 0.204 (0.002) 0.001 (2.349) 0.261 (0.015) 0.511 (0.100) 7.132e-79 ASD-BPD 0.204 (0.003) 0.164 (0.073) 0.285 (0.015) 0.318 (0.022) 2.058e-51 ASD-MDD 0.193 (0.006) 0.470 (0.069) 0.309 (0.018) 0.002 (0.171) 3.251e-09 ASD-SCZ 0.199 (0.002) 0.295 (0.028) 0.296 (0.008) 0.255 (0.018) 2.078e-81 BPD-MDD 0.195 (0.004) 0.568 (0.025) 0.367 (0.066) 0.222 (0.023) 7.314e-48 BPD-SCZ 0.206 (0.001) 0.001 (6.907) 0.026 (0.720) 0.309 (0.006) 2.860e-130 MDD-SCZ 0.204 (0.002) 0.001 (16.882) 0.732 (1.019) 0.260 (0.011) 7.324e-78 The estimated almost remain the same as Table 2 and are shown in the table. The values in the brackets are standard errors of the estimates. The p-values of hypothesis testing (14) are provided in the last column. We also compared the results given by four different analysis approaches: single-GWAS analysis with or without annotation, and two-GWAS joint analysis with or without annotation data. The Manhattan plots are shown in Figures 6 and 7. For single-GWAS analysis without annotation, GPA identified 13 SNPs and 391 SNPs with local false discovery rate for BPD and SCZ, respectively. By using the CNS set as annotation, GPA was able to identify 14 and 409 SNPs for BPD and SCZ, respectively, with the same fdr control. For joint analysis without annotation, the number of identified SNPs increased to 383 and 821 for BPD and SCZ, respectively. By using the CNS set as annotation, the number of identified SNPs further increased to 385 and 837 for BPD and SCZ, respectively. We investigated the BPD results in detail to evaluate the power of GPA in identification of functionally important SNPs. For single-GWAS analysis of BPD, GPA was able to identify SNPs located in the ANK3 gene. By using annotation data, the CACNA1C gene, which encodes an alpha-1 subunit of a voltage-dependent calcium channel, was identified by GPA. After incorporating pleiotropy information between SCZ and BPD, additional functionally relevant genes, such as PBRM1, C6orf136, DPCR1, SYNE1, were identified by GPA. For instance, SYNE1 encodes the synaptic nuclear envelope protein 1, and codes the protein Syne-1 that is found in many tissues and is especially critical in the brain. The Syne-1 protein is active (expressed) in Purkinje cells, which are located in the cerebellum and are involved in signaling between neurons. Mutations in the SYNE1 gene have been found to cause autosomal recessive cerebellar ataxia type 1 (ARCA1) and SYNE1 has recently been implicated as a susceptibility gene for BPD in a large collaborative GWAS study [35]. Clearly, the present results indicate that the statistical power to identify associated SNPs increased by making use of pleiotropy and functional annotation (in this real data example, pleiotropy played a more important role than functional annotation). 10.1371/journal.pgen.1004787.g006 Figure 6 Manhattan plots of BPD and SCZ. Top left panel: separate analysis without annotation. Top right panel: separate analysis with CNS annotation. Bottom left panel: joint analysis without annotation. Bottom right panel: joint analysis with CNS annotation. The red and blue lines indicate local  = 0.05 and 0.1, respectively. 10.1371/journal.pgen.1004787.g007 Figure 7 Manhattan plots of local false discovery rates and (Equations (11) and (12)) for detecting BPD-SCZ-sharing SNPs. Left panel: joint analysis without annotation. Right panel: joint analysis with annotation. The red and blue lines indicate local  = 0.05 and 0.1, respectively. We also applied GPA with multiple annotation datasets to improve its performance. Beside the CNS gene set, we considered eQTL annotation from GTEx [36] and transcription factor binding site (TFBS) annotation by ANNOVAR [37]. Specifically, we downloaded the available eQTL data from the GTEx website (http://www.ncbi.nlm.nih.gov/gtex/GTEX2/gtex.cgi) and took the intersection of these eQTL with the markers of the psychiatric disorders, resulting in 23,505 SNPs annotated as eQTL. To obtain our TFBS annotation, we used the key word “TFBS” to query the ANNOVAR database, resulting in 19,029 SNPs annotated as TFBS (More details can be found at http://www.openbioinformatics.org/annovar/annovar_faq.html#tfbs). We performed joint analysis of BPD and SCZ with these three annotations (CNS, eQTL and TFBS). The estimated parameters in the GPA model and their standard errors are shown in Table 4. Notice that hit the boundary of the parameter space in presence of the eQTL annotation. This made the EM algorithm converge in fewer iterations, resulting in the minor differences between the estimated parameters in Table 4 and those in Table 2. With the local FDR , GPA identified 724 and 977 risk SNPs for BPD and SCZ, respectively. Clearly, the enrichment of eQTL is high with fold (s.e. 0.152) and the enrichment of TFBS is moderate with (s.e. 0.063). These results demonstrate that GPA is able to incorporate multiple annotation datasets for prioritization of GWAS results with good effects. 10.1371/journal.pgen.1004787.t004 Table 4 The estimated parameters and their standard errors for the joint analysis of BPD and SCZ, together with multiple annotation data: The CNS gene set, eQTL and TFBS. BPD-SCZ 00 10 01 11 0.808 (0.004) 0.025 (0.188) 0.033 (0.189) 0.134 (0.005) 0.211 (0.001) 0.001 (5.911) 0.001 (4.386) 0.359 (0.006) 0.013 ( ) 0.001 (0.125) 0.001 (0.092) 0.066 (3.23 ) 0.016 ( ) 0.001 (0.110) 0.001 (0.068) 0.021 (0.001) The first row (“00”, “10”, “01” and “11”) corresponds to the combinatorial states. Besides the evaluation of the statistical power of GPA on real data, we examined whether the Beta distribution of the GPA model fit the real data. We compared the p-values of real data and the p-values simulated from the fitted GPA model to examine the goodness of fit. The results are given in Figures S88–S97 in Text S1, suggesting that our GPA model fitted the real data well. Regarding computational time, the GPA algorithm takes less than 20 minutes to analyze two traits with one annotation data file. The speed of convergence depends on the strength of the GWAS signals. For example, it took about 7 mins and 3 mins to analyze ADHD and SCZ, repectively, as SCZ has stronger GWAS signals than ADHD. For joint analysis of BPD and SCZ, it took about 20 mins. All timings were carried out on a desktop with 3.0 GHz CPU and 16G memory. Bladder cancer GWAS and ENCODE annotation data DNase I hypersensitive sites (DHSs) are regions where DNA degradation by enzymes like DNase I occur more frequently than elsewhere. As a result, DHSs can mark active transcription regions across genome and these patterns are known to be tissue or cell specific. The ENCODE project analyzed the DHSs in 125 human cell lines with the intention of cataloging human regulatory DNA [38]. In this section, we applied GPA to assess how bladder cancer [39] risk associated SNPs are enriched in DHSs region across these 125 human cell lines. We downloaded genotype data for bladder cancer from dbGaP (NCI Cancer Genetic Markers of Susceptibility (CGEMS) project; accession number phs000346.v1.p1). We used samples genotyped from both Illumina 1 M chip and 610K chip for our analysis. For quality control, we removed SNPs with missing rates 0.01. We checked Hardy-Weinberg Equilibrium and excluded SNPs with p-value 0.001. SNPs with minor allele frequencies (MAF) were also removed. After quality control, 490,614 SNPs from 3,631 cases and 3,356 controls of European descent were used in the analysis. SNP-level association p-values were calculated for this bladder cancer data using logistic regression by assuming an additive genetic model. We also downloaded the uniform peak files for DHSs in 125 cell lines from the ENCODE project (http://genome.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeAwgDnaseUniform). Note that the DHSs for these 125 cell lines were identified with a uniform analysis workflow by the ENCODE Consortium; this facilitates fair and unbiased comparison among cell lines as annotation for our GPA model. We applied GPA to analyze the bladder cancer GWAS -values with one annotation dataset at a time, and performed hypothesis testing to assess the significance of enrichment. The results are shown in the left panel of Figure 8. Under significance level after Bonferroni correction, annotations from 19 cell lines were statistically significantly enriched for bladder cancer risk associated SNPs. Most of these cell lines were derived from lymphocytes from normal blood (e.g., T cells CD4+ Th0 adult, Monocytes CD14+ RO01746), while some cell lines came from cancer patients (e.g., Gliobla and HeLa-S3). The above results suggest that involvement of the immune system and carcinoma pathways in bladder cancer. These results also demonstrate that GPA may be an effective way to explore functional roles of GWAS hits by testing enrichment on phenotype-related annotations or user-specified annotations. 10.1371/journal.pgen.1004787.g008 Figure 8 Enrichment of the DNase I hypersenstivity site annotation data from 125 cell lines for bladder cancer. Left panel: of hypothesis testing (13) vs. fold enrichment . The vertical red line corresponds to the significance level (  = 0.05) after Bonferroni correction. The horizontal red line corresponds to ratio = 1. Right panel: The normalized variance component (2) given by LMM v.s. given by GPA. We also compared GPA with the LMM-based approach [21], [40] for this dataset. Specifically, we considered the following genome-partitioning LMM: (1) where are covariates (the first five principal components from genotype data), and and are sets of SNPs overlapping DHSs in each cell line and the remaining SNPs, respectively. We denote the numbers of SNPs in and as and , respectively. The median number of SNPs that overlap DHS in each cell line is about 60K and 90% of cell lines have the number of DHSs ranging between 40K and 80K. In order to take into account such variation in DHS number among cell lines, we define a scaled version of the proportion of phenotype variance explained by SNPs overlapping DHSs in each cell line as (2) where is the proportion of the explained variance and is the scaling factor. The right panel of Figure 8 shows that the ( )-transformed p-value of the GPA annotation enrichment test is linearly related to . This indicates that our GPA model captures enrichment of annotation almost as accurately as LMM even without the original genotype data, implying its broader applicability than methods requiring individual genotype and phenotype data. Discussion Many GWAS have been conducted in the past 10 years that have led to the discoveries of thousands of genomic regions associated with many traits, and many more discoveries are expected from ongoing GWAS. As GWAS data accumulate, there is an urgent need to perform a systematic analysis of available GWAS datasets for a comprehensive understanding of the genetic architecture of complex traits, and provide new insights for functional roles of the implicated variants. To achieve this goal, there is a great interest in developing computational and statistical approaches to exploring genomic data in the post-GWAS era. In the following, we briefly discuss the relationship between GPA and other related methods, such as LMM, conditional FDR and GSEA. LMM is an effective method for exploring genetic architecture of complex traits and it has been implemented in a popular software package, GCTA [41]. Compared with LMM, GPA has the following distinctive features: LMM explores the genetic architecture underlying complex traits/diseases by estimating the gross phenotypic variance that can be explained by whole genome or a certain subset of SNPs. In contrast, GPA provides more “fine-grained” analysis by giving estimates of the local false discovery rate of each SNP, the proportion of SNPs that are associated with the phenotype ( ), the overall effect strength of the associated SNPs ( ), and enrichment of a particular functional annotation ( ). GPA also provides the standard errors to measure the uncertainty of those estimates. Application of LMM requires the availability of genotype data, while GPA only needs the summary statistics (p-values) as its input. Typically, the genotype data may not be accessible as easily as the summary statistics. For example, when researchers want to implement integrative analysis for their own GWAS data at hand with related GWAS studies, it is much easier for them to obtain the summary statistics than the whole data sets of related GWAS studies. In this sense, GPA may greatly simplify the procedure of integrative analysis. To our best knowledge, the conditional FDR approach is the first approach that statistically addresses the issue of pleiotropy between two GWAS. and GSEA is presently the most popular approach to evaluating the enrichment of gene sets. In fact, GPA provides a unified framework for systematically integrating both sources of information, pleiotropy and annotation. Rigorous statistical inference of pleiotropic effects and annotation enrichment has been established in this framework. As demonstrated in our extensive simulation study, GPA has better performance of identifying disease-associated markers than the conditional FDR approach, and it shows greater power of evaluating annotation enrichment than GSEA as well. With real data analysis, we have demonstrated how to use GPA to incorporate pleiotropy information and multiple annotation data for prioritizing GWAS results. Here we briefly discuss some key assumptions made in GPA. GPA assumes that the p-values from null SNPs and non-null SNPs follow the uniform distribution and the Beta distribution, respectively. Although GPA performs robustly when p-values are obtained from the random-effects model, as shown via simulation study, preparation of valid p-values as GPA input is critical for its successful application. For example, if population stratification and cryptic relatedness have not been accounted for in GWAS data analysis, the obtained p-value of null SNPs will not follow the uniform distribution, resulting in inflated type I errors of GPA. Therefore, these confounding effects in GWAS data analysis should be correctly accounted for before applying GPA. Principal component analysis based approaches [42] or linear mixed models [43]–[45] can be used to address these issues effectively. We assumed independence between SNPs in our model. This independence assumption greatly simplified our model and led to very efficient computation, making GPA useful in practice. We also evaluated the LD effects on our GPA model using simulation. As expected, a risk SNP can propagate its effect to SNPs in LD with itself. In this case, the type I error of GPA depends on the definition of a “true” risk SNP. If we regard the SNPs identified in the flanking region of the risk SNP as true positives, then the type I error of GPA can be controlled at the nominal level. On the other hand, we compared performance between GPA and the commonly accepted random-effects model for enrichment analysis for bladder-cancer GWAS data, where GPA provided consistent results with the commonly accepted random-effects model. We implicitly assumed the proportion of risk SNPs should not be extremely small to enable GPA to work well. We performed simulations to investigate GPA performance when the proportion of risk SNPs was extremely small (Figures S5–S8 in Text S1). The simulation results suggest that the proportion can be poorly estimated when the true . However, the type I error of GPA can be still controlled at the nominal level even in this case. Regarding real data analysis, we have observed that can range from 0.009 to 0.196 with relatively small standard errors (Table 1). We assumed the conditional independence of multiple annotation datasets (see GPA model (9)). In the presence of multiple highly correlated annotations, the current version of GPA may not perform well. Hence, when multiple sources of annotation data are available, the correlation among the annotation datasets should be evaluated before using GPA with them. In fact, in the analysis of psychiatric disorders, the correlation among the three annotation data (CNS, eQTL and TFBS) is nearly zero, and thus we can analyze them simultaneously. If they are highly correlated, we suggest incorporating them into GPA one at a time, as shown in our analysis of bladder cancer GWAS data with ENCODE annotation. We realize that simultaneous analysis of multiple correlated annotation data may be more powerful and will investigate this issue in our future work. The parameters in the GPA model should be interpreted with caution because parameter estimation is based on the model assumption as discussed above. For example, as we showed in simulation study, the estimated can be biased due to the mismatch of GPA model and the random-effects model. There are also some limitations of the current GPA model. Although extensions to three or more GWAS are straightforward in principle (from four-groups model to -groups model, ), the number of groups will increase exponentially as the number of GWAS increases. This makes many close to zero, resulting in poor parameter estimation (large variance) and thus reduced power. Currently, we suggest doing pairwise-GWAS analysis to explore the genetic relationship of different phenotypes. It is of great interest to develop statistical models to handle the multiple-GWAS case more effectively. Another limitation is that current GPA version can only handle binary annotation. Allowing more annotation structures (e.g., continuous annotation) in GPA is an important extension of current GPA model. We will investigate this issue in the future. In summary, we have presented a statistical approach, named GPA, that can integrate information from multiple GWAS data sets and functional annotation data. Not only does GPA have better statistical power than related methods, it also provides interpretable model parameters offering insights to our understanding of the genetic architecture of complex traits. We have successfully applied GPA to analyze GWAS data of five psychiatric disorders from PGC, and showed that GPA is able to identify pleiotropic effects among psychiatric disorders and detect enrichment of the CNS gene set. We have also applied GPA to analyze a bladder cancer GWAS dataset with ENCODE data as annotation, where significant enrichments of immune system and carcinoma pathways were observed. Compared to LMM that requires individual genotype and phenotype data as input, GPA has similar results of enrichment analysis without requirements of the genotype data. This makes GPA an attractive and effective tool for the integrative analysis of multiple GWAS data with functional annotation data, when genotype data are not available. Materials and Methods GPA probabilistic model Throughout this paper, we shall use to index SNPs, to index GWAS data sets, and to index the annotation data sets. We first consider the simplest case where we only have summary statistics (p-values) from just one GWAS data set, and then extend our model to handle multiple GWAS data sets and annotation data. Suppose we have performed hypothesis testing of genome-wide SNPs and obtained their p-values: (3) where is the number of SNPs. Consider the “two-groups model” [46], i.e., the obtained p-values are assumed to come from the mixture of null and non-null, with probability and , respectively. Let be the latent variables indicating whether the j-th SNP is null or non-null, where , , and , because a SNP can only be either null or non-null. Here means un-associated (null) and means associated (non-null). Then we have the following two-groups model: (4) where the p-values from the null group are from the Uniform distribution on [0,1], denoted as , and the p-values from the non-null group are from the Beta distribution with parameters ( ), where . We put the constraint to model that a smaller p-value is more likely than a larger p-value when it is from the non-null group [47]. To incorporate information from functional annotation data, we extend the basic model as follows. Suppose we have collected information from functional annotation sources in the annotation matrix: , where indicates whether the j-th SNP is annotated in the d-th functional annotation source. For example, when there are two annotation sources – eQTL data and DNase I hypersensitivity sites (DHS) data – then is an matrix. If the j-th SNP is an eQTL, then , otherwise ; if it is located in a DHS, then , otherwise . Now we model the relationship between and as (5) Clearly, can be interpreted as the proportion of null SNPs being annotated in the d-th annotation, and corresponds to the proportion of non-null SNPs being annotated in the d-th annotation. Therefore, implies that there exists enrichment for the d-th annotation. The statistical inference about enrichment of annotation data will be discussed in details in Section “Hypothesis testing of annotation enrichment and pleiotropy”. Now we extend the above model to handle multiple GWAS data sets. To keep the notation uncluttered, we present the model for the case of two GWAS data sets. Suppose we have p-values from two GWAS: (6) Let be the matrix collecting all the p-values, where denotes the p-value of the j-th SNP in the k-th GWAS. Similarly, we introduce latent variables indicating the association between the j-th SNP and the two phenotypes: means the j-th SNP is associated with neither of them, means it is only associated with the first one, means it is only associated with the second one, and means it is associated with both. The two-groups model (4) is extended to the following “four-groups model”: (7) where . When the genetic bases of the two phenotypes are independent of each other (i.e., no pleiotropy), then we have by expectation. Therefore, the difference between and can be used to characterize pleiotropy. Statistical inference on pleiotropy is given in Section “Hypothesis testing of annotation enrichment and pleiotropy”. To incorporate annotation information into the multiple GWAS model (7), similarly, we model the relationship between and as (8) where is the probability of a null SNP being annotated, is the probability of the first phenotype associated-SNP being annotated, is the probability of the second phenotype associated-SNP being annotated, and is the probability of jointly associated-SNP being annotated. Assuming the independence of SNP markers, the joint distribution can be written as (9) where and are the j-th row of and ; the second equation holds by assuming the independence between and , conditional on ; and the third equation holds by further assuming the independence between and for , conditional on . Parameters in the GPA model can be estimated using the Expectation-Maximization (EM) algorithm [48], which is computationally efficient because we have explicit solutions for estimation of all the parameters in the M-step. Standard errors for parameter estimates can be approximated using the empirical observed information matrix [49]. Note that in the GPA model, the sample size for estimating the empirical observed information matrix corresponds to the number of SNPs and as a result, we have a very large sample size ( ) to estimate standard errors accurately. More details of the EM algorithm and the estimation of standard errors are provided in Sections 1 and 3 in Text S1. Statistical inference False discovery rate After we estimate the parameters in the GPA model, SNPs can be prioritized based on their local false discovery rates [50]. Note that separate analysis of single GWAS data based on their summary statistics is equivalent to the analysis of single GWAS data using GPA without any annotation data. Hence, separate analysis of single GWAS data can be considered as a special case of our GPA approach. To present the local false discovery rate based on our GPA model, we begin with the simplest case: single GWAS without annotation data. In this case, there are only two groups: null and non-null. The false discovery rate is defined as the probability that the j-th SNP belongs to the null group given its p-value, i.e., (10) For joint analysis of two GWAS data sets, we are interested in the local false discovery rate of the j-th SNP, if it is claimed to be associated with the first phenotype, the second one, and both, i.e., (11) Similarly, when annotation data are available, the false discovery rates can be calculated as (12) Then, we use the direct posterior probability approach [51] to control global false discovery rates. More details for the estimation of false discovery rates are provided in Section 2 in Text S1. Hypothesis testing of annotation enrichment and pleiotropy Given an annotation file, we may be interested in whether there is statistical evidence for enrichment of GWAS signals in this annotation file. We propose to use the likelihood ratio test (LRT) to assess the statistical significance. To keep the notation simple, we drop the index of annotation files. Specifically, the significance of enrichment of an annotation for single-GWAS data analysis can be assessed by the following test: (13) The LRT statistic is: where collects the parameter estimates obtained under , and the superscript A indicates the Annotation enrichment test. Note that can be easily obtained by running the GPA algorithm without incorporating the annotation data. Under the null, the test statistic asymptotically follows the distribution with 1 degree of freedom [52]. We reject if , where is the -th quantile of distribution with . For joint analysis of two GWAS with annotation data, test (13) becomes (14) Under the null, the test statistic asymptotically follows a distribution with . Similarly, the test of annotation enrichment can be extended to handle GWAS. In this case, the test statistic asymptotically follows distribution with under the null. Now we consider testing pleiotropy between two GWAS. When there is no pleiotropy, i.e., the signals from the two GWAS are independent of each other, testing pleiotropy can be formulated by testing the following hypothesis: (15) where and . The LRT statistic is constructed as follows: where represents the parameter estimates obtained under , and the superscript P indicates the Pleiotropy test. The test statistic ( ) asymptotically follows distribution with under the null. Supporting Information Text S1 Supporting information for GPA. (PDF) Click here for additional data file.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              A new measure of the effective number of tests, a practical tool for comparing families of non-independent significance tests.

              p-Values from tests of significance can be combined using the Sidák correction (or the closely related Bonferroni correction) or Fisher's method, but both these methods require that the p-values combined be independent when all null hypotheses tested are true. In this paper adjustments to these methods are proposed, using a new eigenvalue-based measure of the effective number of independent tests to which the actual tests performed are equivalent, and are compared with adjustments proposed by previous authors. The adjusted methods are evaluated using a sample of 726 Alzheimer's disease (AD) cases and 707 group-matched controls, genotyped at 84,975 single-nucleotide polymorphism loci in 2,000 randomly chosen genes. The tests for genetic association with AD at loci within each gene are combined. The number of loci tested per gene varies from 2 to 994. The adjusted combined p-values agree well with the significance of the combined p-values determined empirically by random permutation of the data (Sidák correction: r=0.990; Fisher's method: r=0.994). This indicates that the combined p-values can be used to assess the relative strength of evidence for association of these genes with AD. The adjustment proposed here is a refinement of that of Nyholt ([2004] Am. J. Hum. Genet. 74:765-769), giving improved agreement with the results of random permutation. The improvement obtained is similar to that given by the refinement proposed by Li and Ji ([2005] Heredity 95:221-227). It is concluded that the concept of an effective number of tests is a valid approximation that allows p-values to be combined in a highly informative way.
                Bookmark

                Author and article information

                Journal
                Nature Genetics
                Nat Genet
                Springer Nature
                1061-4036
                1546-1718
                January 28 2019
                Article
                10.1038/s41588-018-0322-6
                30692680
                0ddaaa9c-be06-4df8-bd9b-cc2fd634da12
                © 2019

                http://www.springer.com/tdm

                History

                Comments

                Comment on this article