      Is Open Access

      Identification of Novel Genetic Loci Associated with Thyroid Peroxidase Antibodies and Clinical Thyroid Disease

      research-article
      PLoS Genetics
      Public Library of Science


          Abstract

          Autoimmune thyroid diseases (AITD) are common, affecting 2–5% of the general population. Individuals with positive thyroid peroxidase antibodies (TPOAbs) have an increased risk of autoimmune hypothyroidism (Hashimoto's thyroiditis), as well as autoimmune hyperthyroidism (Graves' disease). As the possible causative genes of TPOAbs and AITD remain largely unknown, we performed GWAS meta-analyses in 18,297 individuals for TPOAb-positivity (1,769 TPOAb-positives and 16,528 TPOAb-negatives) and in 12,353 individuals for TPOAb serum levels, with replication in 8,990 individuals. Significant associations (P < 5×10⁻⁸) were detected at TPO-rs11675434, ATXN2-rs653178, and BACH2-rs10944479 for TPOAb-positivity, and at TPO-rs11675434, MAGI3-rs1230666, and KALRN-rs2010099 for TPOAb levels. Individual and combined effects (genetic risk scores) of these variants on (subclinical) hypo- and hyperthyroidism, goiter and thyroid cancer were studied. Individuals with a high genetic risk score had, besides an increased risk of TPOAb-positivity (OR: 2.18, 95% CI 1.68–2.81, P = 8.1×10⁻⁸), a higher risk of increased thyroid-stimulating hormone levels (OR: 1.51, 95% CI 1.26–1.82, P = 2.9×10⁻⁶), as well as a decreased risk of goiter (OR: 0.77, 95% CI 0.66–0.89, P = 6.5×10⁻⁴). The MAGI3 and BACH2 variants were associated with an increased risk of hyperthyroidism, which was replicated in an independent cohort of patients with Graves' disease (OR: 1.37, 95% CI 1.22–1.54, P = 1.2×10⁻⁷ and OR: 1.25, 95% CI 1.12–1.39, P = 6.2×10⁻⁵). The MAGI3 variant was also associated with an increased risk of hypothyroidism (OR: 1.57, 95% CI 1.18–2.10, P = 1.9×10⁻³). This first GWAS meta-analysis for TPOAbs identified five newly associated loci, three of which were also associated with clinical thyroid disease. With these markers we identified a large subgroup in the general population with a substantially increased risk of TPOAbs. 
The results provide insight into why individuals with thyroid autoimmunity do or do not eventually develop thyroid disease, and these markers may therefore predict which TPOAb-positives are particularly at risk of developing clinical thyroid dysfunction.

          Author Summary

          Individuals with thyroid peroxidase antibodies (TPOAbs) have an increased risk of autoimmune thyroid diseases (AITD), which are common in the general population and associated with increased cardiovascular, metabolic and psychiatric morbidity and mortality. As the causative genes of TPOAbs and AITD remain largely unknown, we performed a genome-wide scan for TPOAbs in 18,297 individuals, with replication in 8,990 individuals. Significant associations were detected with variants at TPO, ATXN2, BACH2, MAGI3, and KALRN. Individuals carrying multiple risk variants also had a higher risk of increased thyroid-stimulating hormone levels (including subclinical and overt hypothyroidism), and a decreased risk of goiter. The MAGI3 and BACH2 variants were associated with an increased risk of hyperthyroidism, and the MAGI3 variant was also associated with an increased risk of hypothyroidism. This first genome-wide scan for TPOAbs identified five newly associated loci, three of which were also associated with clinical thyroid disease. With these markers we identified a large subgroup in the general population with a substantially increased risk of TPOAbs. These results provide insight into why individuals with thyroid autoimmunity do or do not eventually develop thyroid disease, and these markers may therefore predict which individuals are particularly at risk of developing clinical thyroid dysfunction.

          Most cited references (55)


          Thyroiditis.


            Heritability of Cardiovascular and Personality Traits in 6,148 Sardinians

            Introduction Complex traits, including aging-associated conditions, can be influenced by a multiplicity of genetic and environmental factors. Because each factor is expected to make only a small contribution to trait variability, and this contribution may itself be influenced by interactions with other susceptibility factors, identifying the genetic basis of complex traits is challenging and requires large sample sizes [1]. Isolated founder populations, which have already proven useful in the study of many Mendelian disorders [2], provide an attractive setting for the study of complex traits [3,4] because they typically exhibit greater genetic and environmental homogeneity than more cosmopolitan populations. Sardinia is the second largest island in the Mediterranean. Its modern population numbers approximately 1.65 million and constitutes a genetically isolated founder population [5–7], which has already aided in the identification of genes involved in several Mendelian disorders [8–12]. In addition to its status as an isolated founder population and its relatively large size, the Sardinian population is attractive for genetic studies due to its organization into long-established settlements [13]. Here, we use a large cohort of 6,148 Sardinians to study the heritability of a spectrum of 98 quantitative traits. Studying broad groups of traits, we could assess the generality of any trends, such as changes in heritability with aging. To increase the potential clinical utility of the results, we focused on traits that affect major domains of clinical interest. For example, in addition to anthropometric features, we quantified levels of plasma and serum markers, including total cholesterol, high-density lipoprotein (HDL), and low-density lipoprotein (LDL) levels, and measured subclinical vascular alterations [14–18] that are of intrinsic interest and are also useful predictors of cardiovascular disease [19]. 
Similarly, we assessed individual differences in personality using the five-factor model [20,21], which quantifies recurring dimensions of personality. Again, in addition to their intrinsic interest, these personality traits are important in understanding a variety of important life outcomes, including mental disorders. Our study uses the full range of phenotypic variation in the population to dissect the genetic contribution and provide a quantitative assessment of the impact of inherited variation on each trait. In addition, we report evidence for heterogeneity in the genetic and environmental contributions to variation, by comparing variances and covariances between males and females and between the younger and older individuals in our cohort. Finally, we examine evidence for an overlap in the genetic determinants of multiple traits, identifying clusters of traits that appear to be influenced by the same genes. The joint study of cardiovascular and personality traits afforded us an opportunity to look for a genetic factor that might contribute to the association of certain personality traits and cardiovascular problems [22]. Overall, our results should be useful to investigators interested in identifying the genetic determinants of quantitative trait variation, especially for clinically relevant quantitative traits affecting cardiovascular function and personality. Results Cohort Recruitment We recruited and phenotyped 6,148 individuals, male and female, age 14 y and above (Figure 1A) from a cluster of four towns in the Lanusei Valley in the Ogliastra region of the Sardinian province of Nuoro. This corresponds to approximately 62% of the population eligible for recruitment in the area, which totaled 9,841 individuals in the 2001 census. Compared to the census population, our sample is enriched for females at all ages (3,523 individuals, or 57%, of our sample, compared to 5,089, or 52%, of the census population). 
Ascertainment was less complete for individuals more than 74 y of age, among whom only approximately 29% of the population was recruited (238 individuals more than 74 y recruited, but 813 were reported in the 2001 census). Figure 1 Age, Sex, and Birthplace Distribution for Participants (A) Shows the number of recruited females (black bars) and males (white bars) from the four clustered towns. (B) Shows the birthplace distribution of participants, in progressively larger geographic units: Lanusei, L.I.E.A. (Lanusei and the three surrounding towns of Ilbono, Elini, and Arzana), the Lanusei valley, the region of Ogliastra, the province of Nuoro, and all of Sardinia. (C) Shows the birthplace distribution for grandparents of participants in the same progressively larger geographic units. Nearly all subjects were born in Sardinia (5,857 [95%]) and, specifically, in the Ogliastra region (5,442 [89%]; Figure 1B shows the birth places of participants in the restricted geographical region). Emphasizing the stability of the population, all grandparents were born in Sardinia for 95% of participants (Figure 1C). The cohort is organized into multiple complex pedigrees. Information collected at recruitment allowed us to organize 5,610 individuals into 711 connected pedigrees, each up to five generations deep. The largest pedigree connects 625 phenotyped individuals. In total the sample includes 34,469 relative pairs, with an average kinship coefficient of 0.1628. These relative pairs include 4,933 sibling pairs, 180 half-sibling pairs, 4,014 first cousins, 4,256 parent–child pairs, 675 grandparent–grandchild pairs, and 6,400 avuncular pairs in addition to other more distant relatives. Our sample also includes 11 monozygotic twins (identified by genotyping approximately 10,000 single nucleotide polymorphisms in all individuals). 
Because monozygotic twins are often more similar to each other than predicted by a simple genetic model (even with genetic dominance included), we included only one individual from each of these twin pairs in the analysis reported below. Summary of Quantitative Trait Variation To examine the effect of age and sex on each trait, we first generated and reviewed summary plots for each trait. The complete set of plots is available online (http://www.sph.umich.edu/csg/chen/public/sardinia) together with detailed results for all our analysis. Figure 2 displays the distribution of six illustrative traits for males and females. It is clear that for many traits there are marked differences between the sexes, affecting not only trait means, but also the overall pattern of variability around these means. Figure 3 illustrates the effect of age on the same six traits. For each trait, observed measurements are plotted against age at enrollment, and two quadratic regression lines (blue for females and red for males) are presented to summarize the impact of age on the traits. These plots allowed us to identify outliers in each trait and to compare trait distributions with other studies. Figure 2 Distribution of Six Illustrative Traits in Male and Female Participants Relative densities are plotted for males (solid lines) and females (dashed lines) for two serum values (cholesterol levels [A] and HDL [B]), two measures of cardiovascular function (IMT of the carotid artery [C] and PWV [D]), and two personality facets (NEO_N3 [E] and NEO_O5 [F]). A complete set of plots, including all traits, is available online (http://www.sph.umich.edu/csg/chen/public/sardinia). Figure 3 Illustrative Quantitative Traits Plotted as a Function of Age These are the same traits as in Figure 2. All values are plotted, and polynomial regression curves fitted to the data show inferred trends for males (solid red lines) and females (dashed blue lines) with increasing age. 
A complete set of plots, allowing for all traits, is available online (http://www.sph.umich.edu/csg/chen/public/sardinia). We next calculated the mean and standard deviation for all traits, both in the entire cohort and after stratifying the sample by sex and age. When stratifying the sample by age, we considered four age bands (14–29, 30–44, 45–59, and 60–102 y of age), each including approximately 25% of sampled individuals. The results are summarized in Table S1, with traits organized as blood test results (38 traits), anthropometric measures (five traits), cardiovascular measures (20 traits), and personality traits (five factors and 30 facets of personality). Nearly all traits showed highly significant evidence (analysis of variance p 0.05, indicating no significant degradation in fit when using the parsimonious models). Thus, there was clear evidence for heterogeneity in variance components by sex, but it was difficult to decide whether the heterogeneity was due to genes, environment, or both. Heterogeneity in Variance Components, by Age To look for heterogeneity in variance components by age, we divided individuals into two groups. The “younger” group included individuals less than 42 y of age (the median age in our sample), whereas the “older” group included individuals 42 y of age and older. We found significant evidence for heterogeneity in variance components by age in 62 of the 98 traits examined (the results are summarized in Table 4). This included a majority of traits in all categories, including anthropometric traits (three of five), blood test results (24 of 38), cardiovascular traits (13 of 20), and personality factors and facets (22 of 35). Again, we considered a series of intermediate models, including only heterogeneity in environmental or genetic variance components, or in which variance components differed by a constant factor between the young and old, and used the BIC to select the best-fitting model. 
For 26 traits, a model in which only the environmental variance differed between young and old was selected, and for 20 of these traits, environmental variance was greater among older individuals (so that heritability was lower). Heritability was higher in older individuals for IMT and five personality traits. Table 4 Model Comparisons between Young and Old For 21 traits, a model in which only genetic variance differed between the young and old was selected, and heritability was higher in the young for 15 traits (12 personality traits and three blood test results). It is noteworthy that the six traits more heritable in the old included several blood pressure–related traits (SBP, DBP, mean blood pressure, and pulse pressure). For these cardiovascular traits, heritability increased by an average of 18 percentage points among older individuals, from approximately 8% for younger individuals to approximately 26% in older individuals. For 15 other traits, a model in which heritabilities between the young and old differed by a constant factor provided the best fit to the data, whereas for one trait (fractionated bilirubin), both environmental and genetic variance components appeared to differ between the young and old. Bivariate Analysis We calculated genetic correlation coefficients for all pairings of 93 traits (including the 38 blood phenotypes, five anthropometric measures, 20 cardiovascular traits, and 30 facets of personality, but excluding the five factors of personality, which are derived from the 30 facets). This corresponds to a total of 8,556 genetic correlation coefficients, of which 118 coefficients were greater than 0.50. In contrast, only 36 of the overall correlation coefficients were greater than 0.50. A full matrix of pairwise correlation coefficients is available online (http://www.sph.umich.edu/csg/chen/public/sardinia). We identified 18 clusters of traits with a genetic correlation greater than 0.50 (Table S2). 
To summarize the full pairwise correlation matrix, we used a hierarchical clustering approach that successively groups traits with large genetic correlations (see Figure 4). In the figure, traits connected by short branches share more of their genetic correlation, whereas traits that join up only near the root of the tree have only a small genetic correlation. Some of the clusters occur because traits are related by definition (for example, pulse pressure and SBP), or by physiology (for example, diastolic diameter [diam_D] and systolic diameter [diam_S], and IMT and wall lumen). Other clusters are quite interesting. For example, hip circumference, waist circumference, body mass index (BMI), and weight all cluster close together and near insulin levels. These traits are all related to the metabolic syndrome [27], and the result supports a genetic underpinning for the syndrome. As another example, the clustering of facets for the NEO O, NEO N, NEO C, and NEO A factors reinforces the structure of the five-factor personality model. Other results are more unexpected. For example, the personality facet NEO E4 (activity) clusters closer to components of NEO C (conscientiousness) than it does to other facets of NEO E. To further investigate the genetic relationship between different personality facets, we also carried out a factor analysis of genetic correlations (Table S3). This factor analysis confirms that the genetic structure of personality replicates its phenotypic structure quite well, but again places NEO E4 closer to components of NEO C. Figure 4 Clustering of Genetic Correlations The 98 quantitative traits are classified into clusters inferred from genetic correlations between any two traits, with an “average” distance measure used in the clustering algorithm. Classes of traits are color-coded as personality (red), serum composition (blue), cardiovascular (black), and anthropometric (green). 
Overlap of the apparent genetic contribution to variance is indicated on the ordinate, with larger overlaps towards the bottom. Eighteen values exceed 50% overlap (see text). We looked specifically for a genetic link between personality traits and cardiovascular disease [22]. Hostility, depression, anger, and anxiety have been associated with cardiovascular risk factors, including arterial stiffness and thickness (see [28] and references therein), and are independent predictors of incident cardiovascular disease and mortality [29]. Several mechanistic links have been proposed to explain the relationship between personality traits and cardiovascular diseases and outcomes [30]. However, the basis for the association has been conjectural. We find no substantive sharing of a genetic basis for cardiovascular traits and any psychological traits. For example, genetic correlation between N2 (hostility and anger) or A4 (low compliance/aggression) and IMT, PWV, SBP, DBP, or heart rate was not significantly different from zero. Discussion The cohort of Sardinians described here provided us with a valuable opportunity to investigate the heritability of multiple traits simultaneously. For some traits, the size of our cohort exceeds the total number of individuals examined in all previously published studies of their heritability. The large size of the cohort and the diversity of the relationships sampled enabled us not only to consider the overall heritability of each trait, but also to investigate the possibility of heterogeneity in genetic effects by age or sex, as well as the evidence for shared genetic determinants between different traits. To facilitate downstream studies, complete results of all our analyses (including likelihoods and parameter estimates for each model fitted) are available online (http://www.sph.umich.edu/csg/chen/public/sardinia). 
Overall, we estimated heritabilities of approximately 0.40 on average for individual blood test results, approximately 0.51 for anthropometric measures, approximately 0.25 for measures of cardiovascular function, and approximately 0.19 for personality factors and facets. In general, our results appear to be consistent with previous studies (see, for example, [31–34]), and particularly with previous studies based on extended pedigrees, (e.g., in the Hutterites [35] and another Sardinian village [36]). Our estimates of heritability are smaller than in previous studies of twins and siblings, both for cardiovascular traits [37,38] and for personality traits [39–43]. Extended pedigree samples such as ours allow specific assessment of narrow heritability potentially, and it is possible that non-additive effects inflated estimates of heritability in studies of twins and small families [44,45]. In our cohort, four of five components of the five-factor model (NEO N, E, O, and C) and most cardiovascular traits showed evidence for genetic dominance. Our broad estimates of heritability, which allow for genetic dominance, are more similar to results in studies of twins and siblings. Nearly all traits showed highly significant evidence (p 0, $\sigma_s^2 = 0$), models with only shared environment ($\sigma_d^2 = 0$, $\sigma_s^2 > 0$), and other intermediate models ($\sigma_d^2 > 0$, $\sigma_s^2 > 0$), comparisons of parameter estimates from these models are informative. In the model with genetic dominance, the quantity $H^2 = (\sigma_d^2 + \sigma_g^2)/(\sigma_d^2 + \sigma_g^2 + \sigma_e^2)$ provides a liberal estimate of the overall impact of genes on the phenotype at hand, whereas in the model attributing any excess similarity between siblings to shared environment, the quantity $h^2 = \sigma_g^2/(\sigma_s^2 + \sigma_g^2 + \sigma_e^2)$ provides a very conservative estimate of the overall impact of genes. Whenever there was significant evidence (p j implies that i is not an ancestor of j (any ordering where ancestors precede their descendants is suitable). 
Then, we defined the kinship coefficient for X-linked genes, $\phi_{ij}^{(X)}$, as follows: Although this definition only covers the situation in which i ≥ j, it can be used to estimate any kinship coefficient because $\phi_{ij}^{(X)} = \phi_{ji}^{(X)}$. The definition reflects the fact that males carry only one allele for X-linked genes, inherited from their mother. Females carry two alleles, one inherited from each parent. The functions mother(i) and father(i) return indexes for the parents of i. Supporting Information Protocol S1 Supplementary Methodology: Protocol Details for Measuring Cardiovascular Traits This section provides a detailed protocol for the assessment of cardiovascular traits. (18 KB PDF) Click here for additional data file. Table S1 Detailed Descriptive Statistics for 98 Traits This table includes trait means and variances. Trait means are stratified by sex and into four age bands. (37 KB PDF) Click here for additional data file. Table S2 Clusters of Traits for Which Genetic Correlation Is More Than 0.5 Highlights subsets of traits identified in the clustering analysis, for which the genetic correlation exceeds 0.5. (7 KB PDF) Click here for additional data file. Table S3 Genetic Factor Structure of Personality Traits The table presents Procrustes-rotated principal components from the genetic correlations among the 30 facets of the NEO-PI-R, targeted to the American normative factor structure. (11 KB PDF) Click here for additional data file.
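
The two heritability quantities given in the Discussion of this excerpt (the liberal estimate H2 from the model with genetic dominance, and the conservative estimate h2 from the model with a shared sibling environment) are plain ratios of variance components. The following is a minimal Python sketch, assuming the variance components have already been estimated elsewhere. The class name, function names, and numeric values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class VarianceComponents:
    """Hypothetical container for fitted variance components."""
    additive: float               # sigma_g^2, additive genetic variance
    environment: float            # sigma_e^2, individual environmental variance
    dominance: float = 0.0        # sigma_d^2, genetic dominance variance
    shared_sibling: float = 0.0   # sigma_s^2, shared sibling environment


def broad_heritability(vc: VarianceComponents) -> float:
    """Liberal estimate: H2 = (sd2 + sg2) / (sd2 + sg2 + se2)."""
    genetic = vc.dominance + vc.additive
    return genetic / (genetic + vc.environment)


def conservative_heritability(vc: VarianceComponents) -> float:
    """Conservative estimate: h2 = sg2 / (ss2 + sg2 + se2)."""
    return vc.additive / (vc.shared_sibling + vc.additive + vc.environment)


# Made-up components: the same excess sibling similarity (0.2) is attributed
# either to dominance or to a shared sibling environment.
dominance_model = VarianceComponents(additive=0.3, environment=0.5, dominance=0.2)
shared_env_model = VarianceComponents(additive=0.3, environment=0.5, shared_sibling=0.2)

print(broad_heritability(dominance_model))
print(conservative_heritability(shared_env_model))
```

In this toy case the excess sibling similarity counts toward the numerator in one model and only toward the denominator in the other, which is why the excerpt treats the two quantities as an upper and a lower estimate of the genetic contribution.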

              Identifying Relationships among Genomic Disease Regions: Predicting Genes at Pathogenic SNP Associations and Rare Deletions

              Introduction An emerging challenge in genomics is the ability to examine multiple disease regions within the human genome, and to recognize a subset of key genes that are involved in a common cellular process or pathway. This is a key task to translate experimentally ascertained disease regions into meaningful understanding about pathogenesis. The importance of this challenge has been highlighted by advances in human genetics that are facilitating the rapid discovery of disease regions in the form of genomic regions around associated SNPs (single nucleotide polymorphisms) [1]–[6] or CNVs (copy number variants) [7]–[10]. These disease regions often overlap multiple genes – though only one is typically relevant to pathogenesis and the remaining are spuriously implicated by proximity. The difficulty of this task is heightened by the limited state of cataloged interactions, pathways, and functions for the vast majority of genes. However, undefined gene relationships might often be conjectured from the literature, even if they are not explicitly described yet. The general strategy of using function to prioritize genes in disease regions has been substantially explored [11]–[18]. However, predicted disease genes have not, in general, been easily validated. Thus far, published approaches have utilized a range of codified gene information including protein-interaction maps, gene expression data, carefully constructed gene networks based on multiple information sources, predefined gene sets and pathways, and disease-related keywords. We propose, instead, to use a flexible metric of gene relatedness that not only captures clearly established close gene relationships, but also has the ability to capture potential undocumented or distant ones. Such a metric may be a more powerful tool to approach this problem rather than relying on incomplete databases of gene functions, interactions, or relationships. 
To this end, we use established statistical text mining approaches to quantify relatedness between two genes – specifically, gene relatedness is the degree of similarity in the text describing them within article abstracts. The published literature represented in online PubMed abstracts encapsulates years of research on biological mechanisms. We and others have shown the great utility of statistical text mining to rapidly obtain functional information about genes, including protein-protein interactions, gene function annotation, and measuring gene-gene similarity [19]–[22]. Text is an abundant and underutilized resource in human genetics, and currently a total of 140,000 abstracts from articles that reference human genes are available through PubMed [23]. Additional valuable information can be seamlessly gained by including more than 100,000 references from orthologous genes; many important pathways have been more thoroughly explored in model systems than in humans. We have developed a novel statistical method to evaluate the degree of relatedness among genes within disease regions: Gene Relationships Among Implicated Loci (GRAIL). Given only a collection of disease regions, GRAIL uses our text-based definition of relatedness (or alternative metrics of relatedness) to identify a subset of genes, more highly related than by chance; it also assigns a select set of keywords that suggest putative biological pathways. It uses no information about the phenotype, such as known pathways or genes, and is therefore not tethered to potentially biased pre-existing concepts about the disease. In addition to a flexible text-based metric of relatedness, GRAIL's ability to successfully connect genes also leverages a statistical framework that carefully accounts for differential gene content across regions. 
We assume that each region contains a single pathogenic gene; therefore narrow regions with one or just a few genes are more informative than expansive regions with many genes, since they are likely to have many irrelevant genes. To take advantage of this, we have designed GRAIL to set a lower threshold in considering relatedness for those genes in narrow regions, allowing for more distant relationships to be considered; on the other hand it sets a more stringent threshold for genes located in expansive multigenic regions and considers only the very closest of relationships. This strategy prevents large regions with many genes from dominating the analysis. In this paper we apply GRAIL to four phenotypes. In each case GRAIL is able to identify a subset of genes enriched for relatedness – more than expected by random chance. We demonstrate enrichment for relatedness among true disease regions rigorously based on both GRAIL's theoretically derived p-value and also based on parallel analysis of either (1) carefully selected random regions matched for gene content and size or (2) experimentally derived false positive disease regions. GRAIL is able to identify subsets of highly related genes among validated SNP associations. First we use GRAIL to identify related genes from SNPs associated with serum lipid levels; GRAIL correctly identifies genes already known to influence lipid levels within the cholesterol biosynthesis pathway. In comparison to randomly selected matched SNP sets, the set of lipid SNPs demonstrate significantly more relatedness. Second, we use GRAIL to identify significantly related genes near height-associated SNPs; these genes highlight plausible pathways involved in height. In comparison to randomly selected matched SNP sets, the set of height SNPs also demonstrate significantly more relatedness. 
Encouraged by GRAIL's ability to recognize biologically meaningful connections, we tested its ability to distinguish true disease regions from false positive regions in two practical applications in human genetics. First, in Crohn's disease, we start with a long list of putative SNP associations from a recent GWA (genome-wide association) meta-analysis [24]. We demonstrate that a substantial fraction of these SNPs contain highly related genes, far beyond what can be expected by chance. We demonstrate that many of these SNPs subsequently validate in an independent replication genotyping experiment. Second, in schizophrenia, we previously identified an over-representation of rare deletions in schizophrenia cases compared to controls [8]. Despite the statistical excess, it is challenging to identify exactly which case deletions are causal, given the relatively high background rate of rare deletions in controls. Using GRAIL however, we are able to demonstrate that a subset of case deletions contain related genes. We further demonstrate that these genes are highly and significantly enriched for central nervous system (CNS) expressed genes. In stark contrast, GRAIL finds no excess relatedness among genes implicated by control deletions. Results Summary of statistical approach GRAIL relies on two key methods: (1) a novel statistical framework that assesses the significance of relatedness between genes in disease regions and (2) a text-based similarity measure that scores two genes for relatedness to each other based on text in PubMed abstracts. Details for both are presented in the Methods. The GRAIL statistical framework consists of four steps (see Figure 1). First, given a set of disease regions we identify the genes overlapping them (Figure 1A); for SNPs we use LD (linkage disequilibrium) characteristics to define the region. Second, for each overlapping gene we score all other human genes by their relatedness to it (Figure 1B). 
In this paper we use a text-based similarity measure; alternative measures of relatedness, for example similarity in gene annotations or expression data, could be easily applied instead [25],[26]. Third, for each gene we count the number of independent regions with at least one highly related gene (Figure 1C); here the threshold for relatedness varies between regions depending on the number of genes within them. We assign a p-value to that count. Fourth, for each disease region we select the single most connected gene as the key gene. We assign the disease region that key gene's p-value after adjusting for multiple hypothesis testing (if there are multiple genes within the region) (Figure 1D). This final score is listed in this paper as pmetric where the metric is text, expression, or annotation based. Very low ptext scores for one region indicate that a gene within it is more related to genes in other disease regions through PubMed abstracts than expected by chance. Simulations on random groups of SNPs demonstrate that the ptext values approximately estimate Type I error rates, being approximately uniformly distributed under the null hypothesis (see Figure S1). However, we recommend the use of careful simulations or controls rather than actual theoretical p-values to reinforce the significance of GRAIL's findings – as we do in the examples below. 10.1371/journal.pgen.1000534.g001 Figure 1 Gene Relationships Among Implicated Loci (GRAIL) method consists of four steps. (A) Identifying genes in disease regions. For each independent associated SNP or CNV from a GWA study, GRAIL defines a disease region; then GRAIL identifies genes overlapping the region. In this region there are three genes. We use gene 1 (pink arrow) as an example. (B) Assess relatedness to other human genes. GRAIL scores each gene contained in a disease region for relatedness to all other human genes. 
GRAIL determines gene relatedness by looking at words in gene references; related genes are defined as those whose abstract references use similar words. Here gene 1 has word counts that are highly similar to gene A but not to gene B. All human genes are ranked according to text-based similarity (green bar), and the most similar genes are considered related. (C) Counting regions with similar genes. For each gene in a disease region, GRAIL assesses whether other independent disease regions contain highly significant genes. GRAIL assigns a significance score to the count. In this illustration gene 1 is similar to genes in three of the regions (green arrows), including gene A. (D) Assigning a significance score to a disease region. After all of the genes within a region are scored, GRAIL identifies the most significant gene as the likely candidate. GRAIL corrects its significance score for multiple hypothesis testing (by adjusting for the number of genes in the region), to assign a significance score to the region. The text-based similarity metric is based on standard approaches used in statistical text mining. To avoid publications that report on or are influenced by disease regions discovered in the recent scans, we use only those PubMed abstracts published prior to December 2006, before the recent onslaught of GWA papers identifying novel associations. This approach effectively avoids the evaluation of gene relationships being confounded by papers listing genes in regions discovered as associated to these phenotypes. In addition to including primary abstract references about genes listed in Entrez Gene, we augment our text compendium with references to orthologous genes listed in Homologene [23]; this increases the number of articles available per gene from 6 to 12 (see Table 1). 
We note that the distribution of articles per gene is skewed toward a small number of genes with many references; 0.4% of genes are referenced by >500 articles, while 26% of genes are referenced by 0.1. The scatter plot on the right illustrates ptext values for actual serum cholesterol associated SNPs (blue dots). Black horizontal line marks the median ptext value. We assessed the same SNP with similarity metrics based on gene annotation (green dots) and gene expression correlation (purple dots). (B) 42 SNPs associated with height. Similar plot for 42 height associated SNPs. The histogram on the left of the graph illustrates ptext values for random SNP sets carefully matched to height-associated SNP set. 86.5% of those SNPs have ptext values that are >0.1. The scatter plot on the right illustrates ptext values for actual SNPs associated with height (blue dots). Black horizontal line marks the median ptext value. We assessed the same SNP with similarity metrics based on gene annotation (green dots) and gene expression correlation (purple dots). On the right we list for each ptext threshold the number of expected SNPs less than the threshold based on matched sets, and the number of observed SNPs less than the threshold among height associated SNPs. Despite relatively comprehensive lipid biology annotation, GO does not identify relationships between regions as effectively as published text (Figure 2A). A total of 12 out of the 19 associated SNPs obtained pannotation 10−4); the remaining 22 regions had intermediate levels of significance following replication (and can be considered as yet unresolved associations) [24]. We applied GRAIL prospectively to these 74 nominally associated SNPs. GRAIL was initially operated independent of any knowledge of the contemporaneous replication genotyping experiment. Each region contained between 1 and 34 genes, except for two regions that contained no genes and were not scored. 
GRAIL identified 13 regions as significant (achieving ptext scores 0.1. 10.1371/journal.pgen.1000534.t002 Table 2 High scoring regions from a Crohn's disease GWA meta-analysis. SNP Chr Position (HG17) passociation Replication Study Result N (genes) Implicated Gene p text rs2066845 16 49314041 1.5E-24 VALIDATED 3 NOD2 0.00010 rs10863202 16 84545499 1.4E-05 INDETERMINATE 4 IRF8 0.00058 rs10045431 5 158747111 1.9E-13 VALIDATED-NOVEL 1 IL12B 0.00066 rs11465804 1 67414547 3.3E-63 VALIDATED 1 IL23R 0.00094 rs2476601 1 114089610 7.3E-09 VALIDATED-NOVEL 8 PTPN22 0.0014 rs762421 21 44439989 7.0E-10 VALIDATED-NOVEL 1 ICOSLG 0.0023 rs2188962 5 131798704 1.2E-18 VALIDATED 9 IRF1 0.0026 rs917997 2 102529086 1.1E-05 INDETERMINATE 5 IL18RAP 0.0027 rs11747270 5 150239060 1.7E-16 VALIDATED 3 IRGM 0.0032 rs2738758 20 61820069 2.7E-06 INDETERMINATE 10 TNFRSF6B 0.0038 rs9286879 1 169593891 7.7E-10 VALIDATED-NOVEL 4 TNFSF18 0.0042 rs2301436 6 167408399 5.2E-13 VALIDATED-NOVEL 3 CCR6 0.0052 rs4263839 9 114645994 1.3E-10 VALIDATED 2 TNFSF8 0.008 rs3828309 2 233962410 1.2E-32 VALIDATED 4 USP40 0.019 rs744166 17 37767727 3.4E-12 VALIDATED-NOVEL 2 STAT3 0.023 rs7758080 6 149618772 4.4E-06 INDETERMINATE 4 SUMO4 0.033 rs7161377 14 75071147 2.3E-05 INDETERMINATE 1 BATF 0.09 Here we list a subset of the 74 regions that emerged from a Crohn's disease GWA meta-analysis that GRAIL assigned the most compelling ptext scores to. The first three columns list information about the associated SNP. The fourth column lists the combined p-value of association from a GWA meta-analysis and subsequent replication. The fifth column indicates whether the region was validated, indeterminate, or failed in replication. Those regions that represent novel findings, not previously published are also indicated. The sixth column lists the number of genes in the disease region, and the seventh column lists the candidate gene identified by GRAIL. The eighth column lists the regions ptext score. 
Using these Crohn's results, we have compared GRAIL's performance to four other competing algorithms that also use functional information to prioritize genes, and GRAIL's performance is superior at predicting true positive associations (see Text S1, Figure S2, Table S5, Table S6). As a further test of GRAIL, we then evaluated the next most significant 75 associated SNPs that emerged from the Crohn's disease GWA meta-analysis (association p-values ranging from 5×10^−5 to 2×10^−4). Out of the 75 regions, 8 are not near any gene, and we did not score them. The remaining 67 regions were tested with GRAIL for relationships to the 52 replicated and indeterminate regions that emerged following replication. Two emerge with highly significant GRAIL scores: rs8178556 on chromosome 21 (IFNAR1, ptext = 1.7×10^−4) and rs12928822 on chromosome 16 (SOCS1, ptext = 8.2×10^−4), suggesting these independent regions may lead to novel associated SNPs for Crohn's disease (see Table S7). We next applied GRAIL to recently published sets of rare deletions seen in schizophrenia cases and matched controls. Multiple groups have recently demonstrated that extremely rare deletions, many of which are likely de novo, are notably enriched in schizophrenia [8]–[10],[29]. However, since rare deletions occur frequently in healthy individuals as well, many of these case deletions will also be non-pathogenic. In fact, we previously found that large (>100 kb), gene-overlapping, singleton deletions were present in 4.9% of cases but also in 3.8% of controls, suggesting that over two-thirds of these deletions are not relevant to disease [8]. We identified 165 published de novo or case-only deletions of >100 kb overlapping at least one gene; a total of 511 genes are deleted or disrupted by these deletions [8],[9],[10]. Additionally, we identified 122 similar control-only deletions; a total of 252 genes are deleted or disrupted by these deletions.
We applied GRAIL separately to both the case and control sets of deletions. In the case deletions, we identified a subset containing highly connected genes (Figure 4A). Specifically, 12 of the 165 regions obtain ptext scores 0.5 (Figure 3B). These regions might have been missed since the relevant gene is either poorly studied, or even if the gene is well studied, the relevant function of that gene is not well documented in the text. An alternative possibility is that the SNP is tagging non-genic regulatory elements. Additionally, the SNP may be the first discovered representative association for a critical pathway, not represented by other SNP associations – and therefore cannot be connected to them. In this case future discoveries will clarify the significance of that association. In cases where there is no apparent published connection between associated genes, other similarity metrics based on experimentally derived data, such as gene expression, protein-protein interactions and transcription factor binding sites could also complement the text-based approaches presented here. In fact, we demonstrate how annotation-based metrics or gene expression-based metrics are able to identify a subset of the associated SNPs in lipid metabolism. As these and other metrics are optimized, they could be used in conjunction with the novel GRAIL statistical framework that we present here to help understand gene relationships. Methods Scoring regions for functional relatedness The Gene Relationships Among Implicated Loci (GRAIL) has four basic steps that are outlined below. It has two input sets of disease regions: (1) a collection of NSEED seed regions (SNPs or CNVs) and (2) a collection of NQUERY query regions. Genes in query regions are evaluated for relationships to genes in seed regions, and query regions are then assigned a significance score. 
In most applications, where we are examining a set of regions for relationships between implicated genes, the query regions and the seed regions are identical. In other circumstances, where we have a set of putative regions that are being tested against validated ones, the putative regions are defined as query regions, and the validated ones are defined as seed regions.
Step 1. Defining disease regions and identifying overlapping genes
For each query and seed SNP we find the furthest neighboring SNPs in the 3′ and 5′ direction in LD (r2>0.5, CEU HapMap [50]). We then proceed outwards in each direction to the nearest recombination hotspot [51]. The interval between those two hotspots, which would include the SNP of interest and all SNPs in LD, is defined as the disease region. The associated SNP could feasibly be tagging a stronger signal from another SNP in that region. All genes that overlap that interval are considered implicated by the SNP. If there are no genes in that region, the interval is extended an additional 250 kb in either direction; we chose 250 kb since that is a range over which non-coding variants might affect gene regulation [52]. For each query and seed CNV we define an interval that represents the deleted or duplicated region; all genes that overlap that interval are associated with the CNV for testing.
Step 2. Ranking gene relatedness
For each gene near a query region, we rank all human genes for relatedness. Ranking may be based on text similarity, or other metrics (see below for examples). Rank values range from 1 (most related) to NG (least related), where NG is the number of available human genes, which in our application is 18,875 (see Table 1).
Step 3. Scoring candidate genes against regions
To avoid double counting nearby regions, we first combine any seed regions sharing one or more genes. For a given gene g in a query region, we examine the degree of similarity to any of the ns genes in a given seed region s.
To ensure independence, we only look at a seed region s if it does not share a single gene with the query region that contains gene g. We identify in each region s the rank of the gene most similar to gene g (that is, the lowest-ranking gene), Rg,s. We convert the rank to a proportion: $R_{g,s}/N_G$. To transform this proportion to a uniformly distributed entity under the null, we recognize that Rg,s was the lowest rank selected from ns genes, and we correct accordingly for multiple hypothesis testing: $p_{g,s} = 1 - (1 - R_{g,s}/N_G)^{n_s}$. Now we identify those seed regions where pg,s is less than a pre-specified threshold pf as regions connected to gene g. For all applications presented here pf is arbitrarily set to 0.1. The number of seed regions containing at least one gene exceeding this threshold, nhit, can be approximated under a random model with a Poisson distribution. We assign a greater weight to those cases where there is greater similarity; that is in the cases where pg,s is particularly small: Under a random model, if pg,s 0.2. We restrict keywords to those that appear in >500 documents, contain >3 letters, and have no numbers. For each term, i, we calculate a score which is the difference between averaged term frequencies among candidate genes and all genes: The top twenty highest scoring terms are selected as keywords.
Annotation based relatedness
We defined a relatedness metric between genes based on similarity in Gene Ontology annotation terms [27]. We downloaded Gene Ontology structure and annotations on December 19, 2006. In addition to human gene GO annotations, we added orthologous gene annotations. Since GO is a hierarchically structured vocabulary, for each gene annotation we also added all of the more general ancestral terms. This resulted in a total of 843,898 annotations for 18,050 genes with 10,803 unique GO terms; this corresponds to a median of 40 terms per gene. We weighted annotations proportionally to the inverse of their frequency, so common annotations received less emphasis.
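As an illustration of the Step 3 rank correction described earlier in the Methods, the following is a minimal Python sketch, not the GRAIL implementation. It assumes the correction takes the standard "best of n" form, p = 1 − (1 − R/NG)^ns, which follows from the statement that the best rank was selected from ns genes. The function names `rank_to_region_p` and `connected_regions` are hypothetical, and the weighting of hits and the Poisson-based significance of the hit count are deliberately omitted.

```python
def rank_to_region_p(best_rank: int, n_genes_total: int, n_genes_in_region: int) -> float:
    """Chance that the best of n_s randomly ranked genes ranks at least this well.

    best_rank: rank of the most related gene in the seed region (1 = most related).
    n_genes_total: number of ranked human genes (N_G in the text).
    n_genes_in_region: number of genes in the seed region (n_s in the text).
    """
    proportion = best_rank / n_genes_total
    # Assumed form of the multiple-testing correction for picking the best of n_s genes.
    return 1.0 - (1.0 - proportion) ** n_genes_in_region


def connected_regions(best_ranks, region_sizes, n_genes_total=18875, p_f=0.1):
    """Return (region index, p) for seed regions whose corrected p falls below p_f."""
    hits = []
    for idx, (rank, n_s) in enumerate(zip(best_ranks, region_sizes)):
        p = rank_to_region_p(rank, n_genes_total, n_s)
        if p < p_f:
            hits.append((idx, p))
    return hits


if __name__ == "__main__":
    # Two hypothetical seed regions: best ranks 10 and 5000, containing 3 and 2 genes.
    print(connected_regions([10, 5000], [3, 2]))
```

The defaults mirror the values quoted in the text (18,875 ranked genes and pf = 0.1); the ranks and region sizes in the usage example are made up.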
We used a weighting scheme analogous to the one we used for word weighting: where gij represented the weighted code i for gene j, NG is the total number of genes, and gfi (or GO frequency) is the number of genes annotated with the term i. Gene relatedness was the correlation between these weighted annotation vectors.
Gene expression based relatedness
To calculate gene relatedness based on expression we downloaded the Novartis Gene Expression Atlas [28]. The data set consists of measurements for 33,689 probes across 158 conditions. Probes were averaged into 17,581 gene profiles. Gene relatedness was calculated as the correlation between expression vectors.
Lipid and height applications
We applied GRAIL to score 19 lipid-associated SNPs and separately to score 42 height-associated SNPs. Specific SNPs are listed in Table S1 and Table S2. We used the SNP sets as both the seed and the query set to look for relatedness between genes across regions. We scored SNPs separately using text, annotation, and expression similarity metrics. We compiled the best candidate genes and scores for the SNP regions.
Crohn's disease application
Prior to replication, we had access to 74 independent SNP regions that had emerged from a meta-analysis of Crohn's disease. All 74 SNPs were used as both the query set and the seed set in GRAIL. We assessed whether those SNPs that replicated had different text-based significance values than those that failed to replicate. To identify additional regions of interest, we identified the next 75 most significant regions in the Crohn's disease meta-analysis and used them in GRAIL as a query set; the seed set included all SNPs that did not fail in replication.
Schizophrenia application
We identified singleton deletions or confirmed de novo deletions reported by one of three groups. We selected those deletions that were in cases only or in controls only, were at least 100 kb large, and included at least one gene.
We obtained singleton deletions published online by the International Schizophrenia Consortium (2008) [8]. We obtained de novo deletions published by Xu et al (2008) from Table 1 [10]. We obtained singleton deletions published in Walsh et al (2008) from Table 2 [9]. We identified a total of 165 case-only deletions and 122 control-only deletions. We applied the GRAIL algorithm separately to cases and controls. We speculated that the case deletions might hit genes from a common pathway and that GRAIL p-values may therefore be enriched for significant scores. On the other hand, we hypothesized that control deletions might be located effectively at random, and so no particular pathway or common function should necessarily be enriched in this collection. To examine genes for tissue-specific expression in the CNS, we obtained a large publicly available human tissue expression microarray panel (GEO accession: GSE7307) [30]. We analyzed the data using the robust multi-array average (RMA) method for background correction, normalization and polishing [55]. We filtered the data, excluding probes with either 100% ‘absent’ calls (MAS5.0 algorithm) across tissues, expression values <20 in all samples, or an expression range <100 across all tissues. To represent each gene, we selected the corresponding probe with the greatest intensity across all samples. The data contained expression profiles for 19,088 genes. We included expression profiles from 96 normal tissues and excluded disease tissues and treated cell lines. We averaged expression values from replicated tissues into a single value. To assess whether genes had differential expression for CNS tissues, we compared the 27 tissue profiles that represented brain or spinal cord to the remaining 69 tissue profiles with a one-tailed Mann-Whitney rank-sum test. Genes obtaining p<0.01 were identified as preferentially expressed.
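The CNS expression test described above can be sketched as follows. This is a minimal, standard-library Python illustration of a one-tailed Mann-Whitney rank-sum test using the normal approximation with tie and continuity corrections; it is not the authors' code, and the original analysis may have used an exact test or different software. The function name `mann_whitney_greater` and the expression values are hypothetical.

```python
import math


def mann_whitney_greater(x, y):
    """One-tailed Mann-Whitney test that values in x tend to exceed values in y.

    Returns (U statistic for x, approximate p-value from the normal approximation).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values identical: no evidence either way
    z = (u_x - mean_u - 0.5) / math.sqrt(var_u)  # continuity correction
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return u_x, p_value


if __name__ == "__main__":
    # Hypothetical expression values for one gene: CNS tissues vs. other tissues.
    cns = [310.0, 295.0, 402.0, 350.0, 288.0]
    other = [120.0, 98.0, 143.0, 110.0, 131.0]
    u, p = mann_whitney_greater(cns, other)
    print(u, p, p < 0.01)
```

In the setting described in the text, `x` would hold a gene's expression across the 27 brain or spinal cord profiles and `y` its expression across the remaining 69 profiles, with genes reaching p<0.01 flagged as preferentially expressed.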
Evaluation against other published methods
We compared GRAIL's performance in its ability to prospectively predict Crohn's associations to five other published methods. The selection of these methods and the evaluation are detailed in Text S1.
Software
An online version of this method is available (http://www.broad.mit.edu/mpg/grail/).
Supporting Information
Figure S1 GRAIL p-value scores for random SNPs. We scored 100 random groups of 50 SNPs with GRAIL. The y-axis is the fraction of SNPs in the group with values below the threshold; the x-axis lists the specific threshold. For each threshold, we plot the distribution of the fraction of the 50 SNPs below that threshold as a box plot. The bar is the median; the mean value is explicitly listed below the box plot. The box at each threshold lists the 25%–75% range. The error bars depict the 1.5 inter-quartile range. The black dots illustrate outliers outside the 1.5 inter-quartile range. (0.39 MB PDF)
Figure S2 Sensitivity versus specificity for prioritization algorithms. We used 5 algorithms to score the 74 most promising putative SNP associations from the Crohn's meta-analysis study. We assessed each algorithm's ability to predict those SNP associations that ultimately validated in follow-up genotyping. For each algorithm, we created a receiver operating characteristic (ROC) curve. (0.40 MB PDF)
Table S1 19 Lipid regions scored with Text based GRAIL strategy. Here we scored 19 SNPs associated with lipid metabolism. In the first three columns we list information about the SNP. In the fourth column we list the number of genes in the SNP associated regions. In the fifth column we list the highest scoring gene in the associated region based on GRAIL using a text-based metric. In the sixth column we list the ptext values for the associated regions. We have bolded those candidate genes that are the known likely causative genes. The seventh and eighth columns list similar results for GRAIL with a GO annotation-based metric. The ninth and tenth columns list similar results for GRAIL with an expression-based metric. (0.15 MB DOC)
Table S2 42 Height regions scored with Text based GRAIL strategy. Here we scored 42 SNPs associated with height. In the first three columns we list information about the SNP. In the fourth column we list the number of genes in the SNP associated regions. In the fifth column we list the highest scoring gene in the associated region for the SNP based on GRAIL using a text-based metric. In the sixth column we list the ptext values for the associated regions. The seventh and eighth columns list similar results for GRAIL with an annotation-based metric. The ninth and tenth columns list similar results for GRAIL with an expression-based metric. (0.28 MB DOC)
Table S3 Keywords for Lipid and Height SNPs. We identified keywords associated with lipid and height associated SNPs; here we list the top 20. (0.06 MB DOC)
Table S4 Crohn's Disease SNPs from a meta-analysis of GWA studies. Here we list GRAIL results and summarize genotyping results for Crohn's disease SNPs. These 74 SNPs emerged from a meta-analysis and, as a result of replication genotyping, were either validated (A), indeterminate (B), or failed (C). For each of the regions we list the SNP ID and the chromosome in the second and third column. In the fourth column we list the final combined association significance score of the SNP to Crohn's disease. In the fifth, sixth, and seventh columns we list GRAIL results including the number of genes in the region, the best candidate gene, and the text-based significance score for the region. (0.21 MB DOC)
Table S5 Algorithms to prioritize candidate genes. Our search of the literature identified nine algorithms that could be used to prioritize genes for replication. Four methods require no user-specified disease information (unsupervised), and five require some disease information from the user. We list in each row the name of the method, the website, the necessary genetic data, the functional data used to prioritize genes, the disease-specific information that must be included, and the availability of the method. (0.09 MB DOC)
Table S6 Performance measures for prioritization algorithms. We used five algorithms (column 1) to score putatively associated SNPs from the Crohn's meta-analysis. After calculating an ROC curve for each algorithm, we calculated the AUC (column 2). We also calculated a p-value with a one-tailed rank-sum test comparing the median rank of the validated SNPs to the median rank of the failed SNPs (column 3). (0.04 MB DOC)
Table S7 Other promising regions in Crohn's Disease GWA meta-analysis. Information about the top six regions identified by GRAIL from the next 75 most significant regions from the Crohn's GWA study. All associations are indeterminate, and association p-values are taken from the GWA meta-analysis; these regions have not yet been replicated. (0.05 MB DOC)
Table S8 Rare or de novo schizophrenia control deletions. Here we list all of the deletions that GRAIL identified as most related to other deleted genes (ptext <0.05). For each deletion we list the chromosome, the range of the deletion, the GRAIL p-value for the region, and the best candidate gene in the region identified by GRAIL. Most genomic coordinates are listed in HG17. * HG18 coordinates. (0.06 MB DOC)
Text S1 A. Random SNP groups; B. Comparison of GRAIL to other related algorithms. (0.09 MB DOC)
                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Genet
                PLoS Genetics
                Public Library of Science (San Francisco, USA )
                1553-7390
                1553-7404
                February 2014
                27 February 2014
                : 10
                : 2
                : e1004123
                Affiliations
                [1 ]Department of Internal Medicine, Erasmus Medical Center Rotterdam, Rotterdam, The Netherlands
                [2 ]Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche, c/o Cittadella Universitaria di Monserrato, Monserrato, Cagliari, Italy
                [3 ]Dipartimento di Scienze Biomediche, Universita di Sassari, Sassari, Italy
                [4 ]Division of Genetics and Cell Biology, San Raffaele Scientific Institute, Milan, Italy
                [5 ]Interfaculty Institute for Genetics and Functional Genomics, University Medicine and Ernst-Moritz-Arndt-University Greifswald, Greifswald, Germany
                [6 ]Department of Endocrinology and Diabetes, Sir Charles Gairdner Hospital, Nedlands, Western Australia, Australia
                [7 ]Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology and Health Services, University of Washington, Seattle, Washington, United States of America
                [8 ]Institute for Genetic Epidemiology, Helmholtz Zentrum Munich, Munich/Neuherberg, Germany
                [9 ]Department of Endocrinology and Internal Medicine, University Hospital Ghent and Faculty of Medicine, Ghent University, Ghent, Belgium
                [10 ]Internal Medicine, Division of Endocrinology, Radboud University Nijmegen Medical Center, Nijmegen, The Netherlands
                [11 ]Department for Health Evidence, Radboud University Medical Centre, Nijmegen, The Netherlands
                [12 ]Institute of Behavioural Sciences, University of Helsinki, Helsinki, Finland
                [13 ]Oxford Centre for Diabetes, Endocrinology and Metabolism and NIHR Oxford Biomedical Research Centre, Oxford, UK Churchill Hospital, Headington, Oxford, United Kingdom
                [14 ]Research Centre for Prevention and Health, Glostrup University Hospital, the Capital Region of Denmark, Glostrup, Denmark
                [15 ]Genetics of Complex Traits, University of Exeter Medical School, University of Exeter, Exeter, United Kingdom
                [16 ]Peninsula NIHR Clinical Research Facility, University of Exeter Medical School, University of Exeter, Exeter, United Kingdom
                [17 ]Institute of Medical Epidemiology, Biostatistics, and Informatics, Martin-Luther-University Halle-Wittenberg, Halle, Germany
                [18 ]Comprehensive Cancer Center, Ohio State University, Columbus, Ohio, United States of America
                [19 ]Department of Epidemiology, Erasmus Medical Center Rotterdam, Rotterdam, The Netherlands
                [20 ]Departments of Medicine, Human Genetics, Epidemiology and Biostatistics, Lady Davis Institute, McGill University, Montreal, Canada
                [21 ]Department of Twin Research and Genetic Epidemiology, King's College London, London, United Kingdom
                [22 ]National Institute for Health and Welfare, Helsinki, Finland
                [23 ]Hospital for Children and Adolescents, Helsinki University Central Hospital and University of Helsinki, Helsinki, Finland
                [24 ]Institute of Laboratory Medicine, Clinical Chemistry and Molecular Diagnostics, University Hospital Leipzig, Leipzig, Germany
                [25 ]Wellcome Trust Sanger Institute, Hinxton, United Kingdom
                [26 ]Institute of Clinical Chemistry and Laboratory Medicine, University Medicine Greifswald, Greifswald, Germany
                [27 ]Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
                [28 ]Helmholtz Zentrum Muenchen, German Research Center for Environmental Health, Institute of Epidemiology II, Neuherberg, Germany
                [29 ]Department of Psychiatry and Psychotherapy, University Medicine Greifswald, HELIOS Hospital Stralsund, Greifswald, Germany
                [30 ]Pathwest Laboratory Medicine WA, Nedlands, Western Australia, Australia
                [31 ]Research Unit of Molecular Epidemiology Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
                [32 ]School of Medicine and Pharmacology, the University of Western Australia, Crawley, Western Australia, Australia
                [33 ]UWA Centre for Medical Research, Western Australian Institute for Medical Research, Perth, Western Australia, Australia
                [34 ]School of Population Health, University of Western Australia, Nedlands, Western Australia, Australia
                [35 ]MRC Lifecourse Epidemiology Unit, Southampton General Hospital, Southampton, United Kingdom
                [36 ]School of Pathology and Laboratory Medicine, University of Western Australia, Crawley, Western Australia, Australia
                [37 ]High Performance Computing and Network, CRS4, Parco Tecnologico della Sardegna, Pula, Italy
                [38 ]Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
                [39 ]Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Finland
                [40 ]Vaasa Health Care Centre, Diabetes Unit, Vaasa, Finland
                [41 ]Department of Respiratory Medicine, Sir Charles Gairdner Hospital, Nedlands, Western Australia, Australia
                [42 ]Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
                [43 ]Institute of Human Genetics, Helmholtz Zentrum Munich, Munich, Germany
                [44 ]Institute of Human Genetics, Technische Universität München, Munich, Germany
                [45 ]Department of Cardiology and Internal Medicine, University Hospital Ghent and Faculty of Medicine, Ghent University, Ghent, Belgium
                [46 ]Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
                [47 ]Department of Medical Genetics, University of Helsinki and University Central Hospital, Helsinki, Finland
                [48 ]Diagnostica Stago, Doncaster, Victoria, Australia
                [49 ]Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute, Torrance, California, United States of America
                [50 ]Department of Pediatrics, Harbor-UCLA Medical Center, Torrance, California, United States of America
                [51 ]Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
                [52 ]Department of Twin Research and Genetic Epidemiology, King's College London, London, United Kingdom
                [53 ]School of Clinical and Experimental Medicine, College of Medical and Dental Sciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom
                [54 ]Diabetes, Endocrinology and Vascular Health Centre, Royal Devon and Exeter NHS Foundation Trust, Exeter, United Kingdom
                [55 ]BIOBIX Lab. for Bioinformatics and Computational Genomics, Dept. of Mathematical Modelling, Statistics and Bioinformatics. Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
                [56 ]Faculty of Health Science, University of Copenhagen, Copenhagen, Denmark
                [57 ]Department of General Practice and Primary Health Care, University of Helsinki, Helsinki, Finland
                [58 ]Helsinki University Central Hospital, Unit of General Practice, Helsinki, Finland
                [59 ]Folkhalsan Research Centre, Helsinki, Finland
                [60 ]Vasa Central Hospital, Vasa, Finland
                [61 ]Curtin Health Innovation Research Institute, Curtin University of Technology, Bentley, Western Australia, Australia
                [62 ]Institute of Epidemiology I, Helmholtz Zentrum Munich, Munich, Germany
                [63 ]Group Health Research Institute, Group Health Cooperative, Seattle, Washington, United States of America
                [64 ]Department of Internal Medicine, Diabetes & Endocrinology Unit, San Raffaele Scientific Institute and Vita-Salute San Raffaele University, Milan, Italy
                [65 ]Laboratory of Genetics, National Institute on Aging, Baltimore, Maryland, United States of America
                [66 ]Institute for Maternal and Child Health - IRCCS “Burlo Garofolo”, Trieste, Italy
                [67 ]University of Trieste, Trieste, Italy
                [68 ]Biopharmacy, Department of Pharmaceutical Sciences, University Basel, Basel, Switzerland
                [69 ]Netherlands Consortium for Healthy Aging, Netherlands Genomics Initiative, Leiden, The Netherlands
                [70 ]Department of Internal Medicine, VU Medical Center, Amsterdam, The Netherlands
                [71 ]Division of Endocrinology, Diabetes, and Metabolism, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
                [72 ]Institute of Molecular Genetics-CNR, Pavia, Italy
                Yale School of Medicine, United States of America
                Author notes

                ¶ SS, SN and RPP also contributed equally to this work.

                Competing interests: Dr. Bruce M. Psaty reported serving on a DSMB for a clinical trial of a device funded by the manufacturer (Zoll LifeCor) and on the Yale Open Data Access Project funded by Medtronic. All other authors have declared that no competing interests exist.

                Conceived and designed the experiments: MM SJB RAJ RR AA HJG ER JIR HH LC DTi BV TdM TJ JGE BMP AHo DS HW AdlC TMF AL KR LAK AGU JPW KS EWic CMe MdH TJV TDS SGW HV AC DTo SS SN RPP. Performed the experiments: MM EP GP AT LC SJB RAJ RR GLR TSP SHV JL MJS LLNH RMF BMS CG YSA AL TJV SS SN RPP. Analyzed the data: MM EP GP AT SJB RAJ RR GLR TSP SHV JL MJS LLNH RMF SLi BMS DP LC LB CG TC EK BT YET AA MvdB CMa TEG MT NP YSA AdlC RTNM SCLG JMK AL JWAS FR MdH SS RPP. Contributed reagents/materials/analysis tools: MM RR GLR TSP JL MJS LLNH BMS RN MGP CSa UV JBR FCS TIMK WEV ATH JK LC AHa WL GH ML SM NS MC MN CSp AR MH EML ER PJL SLa MV GA EWid AP AD APB DIWP JPB AM TF AJ JH HP EER PF SJF JIR AK DR GLS EB HH JAF BV TdM TJ JGE PCO ARH BMP TI AHo HW AdlC RTNM SCLG HMzS TMF AL FR AGU JPW CMe TJV TDS SGW HV AC DTo RPP. Wrote the paper: MM AT LC TJV SGW AC SS SN RPP.

                Article
                PGENETICS-D-13-02310
                10.1371/journal.pgen.1004123
                3937134
                24586183
                f988a05d-6130-4d61-9754-78b1c9030fe8
                Copyright © 2014

                This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

                History
                Received: 22 August 2013
                Accepted: 3 December 2013
                Page count
                Pages: 13
                Funding
                The Asklepios Study was supported by Fonds voor Wetenschappelijk Onderzoek–Vlaanderen (FWO) research grants G.0427.03 and G.0838.10N (Asklepios Study). The 1994-5 Busselton Health Survey was funded by Healthway, Western Australia. The Busselton Health Studies are supported by the National Health and Medical Research Council of Australia and the Great Wine Estates Auctions. The CHS research reported in this article was supported by NHLBI contracts HHSN268201200036C, N01HC85239, N01HC55222, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, N01HC85086; and NHLBI grants HL080295, HL087652, HL105756 with additional contribution from the National Institute of Neurological Disorders and Stroke (NINDS). Additional support was provided through AG023629 from the National Institute on Aging (NIA). DNA handling and genotyping at Cedars-Sinai Medical Center was supported in part by the National Center for Research Resources, grant UL1RR033176, and is now at the National Center for Advancing Translational Sciences, CTSI grant UL1TR000124; in addition to the National Institute of Diabetes and Digestive and Kidney Diseases grant DK063491 to the Southern California Diabetes Endocrinology Research Center. Additional funding was provided by the Cedars-Sinai Board of Governors' Chair in Medical Genetics (JIR). The CARLA Study was funded by a grant from the Deutsche Forschungsgemeinschaft as part of the Collaborative Research Center 598 “Heart failure in the elderly - cellular mechanisms and therapy” at the Medical Faculty of the Martin-Luther-University Halle-Wittenberg, by a grant of the Wilhelm-Roux Programme of the Martin-Luther-University Halle-Wittenberg; by the Ministry of Education and Cultural Affairs of Saxony-Anhalt, and by the Federal Employment Office. 
The Exeter Family Study of Childhood Health (EFSOCH) was supported by South West NHS Research and Development, Exeter NHS Research and Development, the Darlington Trust, and the Peninsula NIHR Clinical Research Facility at the University of Exeter. Genotyping of EFSOCH DNA samples was supported by the Endocrine Research Fund. ATH and BMS are employed as core members of the Peninsula NIHR Clinical Research Facility. RMF is funded by a Sir Henry Wellcome Postdoctoral Fellowship (Wellcome Trust grant: 085541/Z/08/Z). The Health2006 Study is funded by grants from The Velux Foundation; The Danish Medical Research Council, Danish Agency for Science, Technology and Innovation; The Aase and Ejner Danielsens Foundation; ALK-Abelló A/S (Hørsholm, Denmark), Timber Merchant Vilhelm Bangs Foundation, MEKOS Laboratories (Denmark), the Health Insurance Foundation, and Research Centre for Prevention and Health, the Capital Region of Denmark. Helsinki Birth Cohort Study has been supported by grants from the Academy of Finland, the Finnish Diabetes Research Society, Finnish Society for Cardiovascular Research, Folkhälsan Research Foundation, Novo Nordisk Foundation, Finska Läkaresällskapet, Signe and Ane Gyllenberg Foundation, University of Helsinki, European Science Foundation (EUROSTRESS), Ministry of Education, Ahokas Foundation, Emil Aaltonen Foundation, Juho Vainio Foundation, and Wellcome Trust (grant number WT089062). This work was supported by KORA, which is a research platform initiated and financed by the Helmholtz Center Munich, German Research Center for Environmental Health, by the German Federal Ministry of Education and Research and by the State of Bavaria. The work of KORA is supported by the German Federal Ministry of Education and Research (BMBF) in the context of the German National Genome Research Network (NGFN-2 and NGFN-plus). The present research was supported within the Munich Center of Health Sciences (MC Health) as part of LMUinnovativ. 
Thyroid examinations in KORA-F4 were supported by Sanofi-Aventis in the framework of the Papillon Initiative. Collection and genotyping of the NBS samples were funded in part by the European Commission (POLYGENE: LSHC-CT-2005-018827) and a research investment grant of the Radboud University Nijmegen Medical Centre. This work was sponsored by the National Computing Facilities Foundation (NCF) for the use of supercomputer facilities, with financial support from the NWO. The Thyroid Cancer Program (P.I. Matthew Ringel) at the Ohio State University is supported by grants P30 CA16058 and P01 CA124570 from the National Cancer Institute, USA. The generation and management of GWAS genotype data for the Rotterdam Study is supported by the Netherlands Organisation of Scientific Research NWO Investments (no. 175.010.2005.011, 911-03-012). This study is funded by the Research Institute for Diseases in the Elderly (014-93-015; RIDE2), the Netherlands Genomics Initiative (NGI)/Netherlands Organisation for Scientific Research (NWO) project no. 050-060-810. The Rotterdam Study is funded by Erasmus Medical Center and Erasmus University, Rotterdam, Netherlands Organization for the Health Research and Development (ZonMw), the Research Institute for Diseases in the Elderly (RIDE), the Ministry of Education, Culture and Science, the Ministry for Health, Welfare and Sports, the European Commission (DG XII), and the Municipality of Rotterdam. The SardiNIA study is supported by the Intramural Research Program of the National Institute on Aging (NIA), National Institutes of Health (NIH). The SardiNIA (“Progenia”) team was supported by Contract NO1-AG-1–2109 from the NIA; the efforts of SS were supported in part by contract 263-MA-410953 from the NIA to the University of Michigan and by research grant HG002651. 
SHIP is part of the Community Medicine Research net of the University of Greifswald, Germany, which is funded by the Federal Ministry of Education and Research (grants no. 01ZZ9603, 01ZZ0103, and 01ZZ0403), the Ministry of Cultural Affairs as well as the Social Ministry of the Federal State of Mecklenburg-West Pomerania, and the network ‘Greifswald Approach to Individualized Medicine (GANI_MED)’ funded by the Federal Ministry of Education and Research (grant 03IS2061A). Genome-wide data have been supported by the Federal Ministry of Education and Research (grant no. 03ZIK012) and a joint grant from Siemens Healthcare, Erlangen, Germany and the Federal State of Mecklenburg-West Pomerania. Data analyses were further supported by the German Research Foundation (DFG Vo 955/10-2; SPP 1629: THYROID TRANS ACT WA 1328/5-1) and the Federal Ministry of Nutrition, Agriculture and Consumer's Safety (BMELV 07 HS 003). SHIP-Trend is part of the Community Medicine Research net of the University of Greifswald, Germany, which is funded by the Federal Ministry of Education and Research (grants no. 01ZZ9603, 01ZZ0103, and 01ZZ0403), the Ministry of Cultural Affairs as well as the Social Ministry of the Federal State of Mecklenburg-West Pomerania. Thyroid-related examinations have been funded by the Federal Ministry of Nutrition, Agriculture and Consumer's Safety (BMELV 07 HS 003) and the German Research Foundation (DFG Vo 955/10-1; SPP 1629: THYROID TRANS ACT WA 1328/5-1). Genome-wide data have been supported by the Federal Ministry of Education and Research (grant no. 03ZIK012) and a joint grant from Siemens Healthcare, Erlangen, Germany and the Federal State of Mecklenburg-West Pomerania. Whole-body MR imaging was supported by a joint grant from Siemens Healthcare, Erlangen, Germany and the Federal State of Mecklenburg-West Pomerania. 
TwinsUK received funding from the Wellcome Trust; the Chronic Disease Research Foundation; the European Community's Seventh Framework Program grant agreement (FP7/2007-2013); ENGAGE project grant agreement (HEALTH-F4-2007-201413); the Department of Health via the National Institute for Health Research (NIHR) Comprehensive Biomedical Research Centre award to Guy's & St Thomas' NHS Foundation Trust in partnership with King's College London; the Canadian Institutes of Health Research, Canadian Foundation for Innovation, Fonds de la Recherche en Santé Québec, Ministère du Développement Économique, de l′Innovation et de l′Exportation Québec and the Lady Davis Institute of the Jewish General Hospital; the Australian National Health and Medical Research Council (Project Grants 1010494, 1031422) and the Sir Charles Gairdner Hospital Research Fund. Val Borbera was supported by funds from Compagnia di San Paolo, Torino, Italy; Fondazione Cariplo, Italy and Ministry of Health, Ricerca Finalizzata 2008. The UK Graves' disease cohort was funded by the Wellcome Trust grant 068181. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Medicine
                Endocrinology
                Thyroid
                Graves' disease
                Hashimoto disease
                Hypothyroidism

                Genetics
