
      Convergent downstream candidate mechanisms of independent intergenic polymorphisms between co-classified diseases implicate epistasis among noncoding elements

      research-article


          Abstract

          Eighty percent of DNA outside protein-coding regions was shown to be biochemically functional by the ENCODE project, enabling studies of the interactions among these regions. Studies have since explored how convergent downstream mechanisms arise from independent genetic risks of one complex disease. However, the cross-talk and epistasis between intergenic risks associated with distinct complex diseases have not been comprehensively characterized. Our recent integrative genomic analysis unveiled downstream biological effectors of disease-specific polymorphisms buried in intergenic regions, and we then validated their genetic synergy and antagonism in distinct GWAS. We extend this approach to characterize convergent downstream candidate mechanisms of distinct intergenic SNPs across distinct diseases within the same clinical classification. We construct a multipartite network consisting of 467 diseases organized in 15 classes, 2,358 disease-associated SNPs, 6,301 SNP-associated mRNAs by eQTL, and mRNA annotations to 4,538 Gene Ontology mechanisms. Functional similarity between two SNPs (similar SNP pairs) is imputed using a nested information theoretic distance model, with p-values assigned by conservative scale-free permutation of network edges without replacement (node degrees held constant). At FDR≤5%, we prioritized 3,870 intergenic SNP pairs, among which 755 are associated with distinct diseases sharing the same disease class, implicating 167 intergenic SNPs, 14 classes, 230 mRNAs, and 134 GO terms. Co-classified SNP pairs were more likely to be prioritized than those of distinct classes, confirming a noncoding genetic underpinning to clinical classification (odds ratio ~3.8; p≤10^−25). The prioritized pairs were also enriched in regions bound by the same or interacting transcription factors and/or interacting in long-range chromatin interactions, suggestive of epistasis (odds ratio ~2,500; p≤10^−25). This prioritized network implicates complex epistasis between intergenic polymorphisms of co-classified diseases and offers a roadmap for a novel therapeutic paradigm: repositioning medications that target proteins within downstream mechanisms of intergenic disease-associated SNPs.

          Supplementary information and software: http://lussiergroup.org/publications/disease_class
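
          The abstract states that p-values come from permuting network edges while holding node degrees constant. As an illustration only, the sketch below uses a generic edge-swap randomization on a toy bipartite SNP-to-mRNA edge list; it is a swapped-in, textbook technique, not the authors' implementation, and the function names and toy identifiers (`rs1`, `A`, ...) are hypothetical.

```python
import random
from collections import Counter


def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomize bipartite (snp, mrna) edges while keeping every node's degree fixed.

    Hypothetical helper: repeatedly picks two edges (s1, m1), (s2, m2) and
    rewires them to (s1, m2), (s2, m1) unless that would duplicate an edge.
    """
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # guard against networks with few valid swaps
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (s1, m1), (s2, m2) = edge_list[i], edge_list[j]
        new1, new2 = (s1, m2), (s2, m1)
        # Skip swaps that change nothing or would create a duplicate edge.
        if s1 == s2 or m1 == m2 or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list


def degrees(edges):
    """Return (SNP degree counts, mRNA degree counts) for an edge list."""
    return Counter(s for s, _ in edges), Counter(m for _, m in edges)


# Toy network (made-up identifiers, not data from the paper).
toy_edges = [("rs1", "A"), ("rs1", "B"), ("rs2", "B"),
             ("rs2", "C"), ("rs3", "C"), ("rs3", "D")]
shuffled = degree_preserving_shuffle(toy_edges, n_swaps=20, seed=1)
```

          Recomputing a similarity statistic on many such shuffled networks would give an empirical null distribution in which each SNP keeps its number of eQTL targets and each mRNA keeps its number of associated SNPs.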

          Most cited references (28)


          Gene Ontology: tool for the unification of biology

          Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

            An Integrated Encyclopedia of DNA Elements in the Human Genome

            Summary The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure, and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall the project provides new insights into the organization and regulation of our genes and genome, and an expansive resource of functional annotations for biomedical research.

              Trans-eQTLs Reveal That Independent Genetic Variants Associated with a Complex Phenotype Converge on Intermediate Genes, with a Major Role for the HLA

              Introduction. For many complex traits and diseases, numerous associated single nucleotide polymorphisms (SNPs) have been identified through genome-wide association studies (GWAS) [1]. For many of these identified variants it is still unclear through which mechanism the association between the SNP and the trait or disease phenotype is mediated. A complicating factor is that disease-associated variants might not be the real causal variants, but are in linkage disequilibrium (LD) with the true disease-causing variant, making it difficult to accurately implicate the correct gene for a locus in disease pathogenesis. Within the major histocompatibility locus (MHC) on 6p, many SNPs have been found to be associated with complex diseases such as celiac disease, inflammatory bowel disease, psoriasis, rheumatoid arthritis, diabetes mellitus, schizophrenia, lung cancer and follicular lymphoma [2]–[10]. An analysis of the Catalog of Published Genome-Wide Association Studies [1] revealed that out of 1,167 unique SNP associations with a reported p [...] For each probe it was determined whether it mapped uniquely to one particular genomic locus or, if multiple hits were present, whether all these mappings resided in each other's vicinity [...]. This was different for PCs 26–50 in the HT12 data: 11 PCs were under substantial genetic control (Figure S9a). We therefore assumed that most trans-eQTLs could be detected when removing approximately 25 PCs. We quantified this systematically, by removing increasing numbers of PCs from the expression data and conducting a full genome-wide trans-eQTL mapping. Indeed, in these analyses at most 244 significant trans-eQTLs could be detected (at FDR 0.05, with potential false-positives due to cross-hybridizations removed), when removing 25 PCs (Figure S9b). 
The overlap with the analysis in which no PCs were removed was substantial: 62 of the 82 trans-eQTLs (77%) detected in the original analysis were also detected in the analysis with 25 PCs removed (Figure S9c), all with identical allelic directions (Figure S9d). Identification of false eQTLs due to primer polymorphisms and cross-hybridization. One should be aware that sequence polymorphisms can cause many false cis-eQTLs [53]. Such false cis-eQTLs do not reflect actual expression differences caused by sequence polymorphisms in cis-acting factors that affect mRNA levels. Instead they indicate hybridization differences caused by sequence polymorphisms in the mRNA region that is targeted by the microarray expression probes. Therefore, SNP-probe combinations were excluded from the cis-eQTL analysis when the 50 bp long expression probe mapped to a genomic location that contained a known SNP that was showing at least some LD (r2>0.1) with the cis-SNP. We used SNP data from the 1000 Genomes Project, as it contains LD information for 9,633,115 SNPs (April 2009 release, based on 57 CEU samples of European descent). Detected trans-eQTLs might also reflect false positives. Although we had initially attempted to map the expression probes as accurately as possible, using the aforementioned three different mapping strategies, it is still quite possible that some of the identified, putative trans-eQTLs in fact reflect very subtle cross-hybridization (e.g. pertaining to only a small subsequence of the probe). We therefore tried to falsify each of the putative trans-eQTLs by attempting to map each trans-probe into the vicinity of the SNP probe location, by using a highly relaxed mapping approach. All putative Illumina trans-expression probes were mapped using SHRiMP [54], which uses a global alignment approach, to the human reference genome (NCBI 36.3 build). 
The mapping settings were chosen very loosely to permit the identification of nearly all potential hybridization locations: match score was 10, the mismatch score was 0, the gap open penalty was −250, the gap extension penalty was −100, Smith and Waterman minimum identical alignment threshold was 30.0%, while other SHRiMP parameters were left at default. Using these settings all mappings with a minimum overlap of 15 bases, or with 20 matches with one mismatch, or 30 matches with 2 mismatches, or full-length (50 bp) probe hybridizations with no more than 15 mismatches were accepted. Any trans-eQTL was discarded if the expression probe had a mapping that was within 2 Mb of the SNP that showed the trans-eQTL effect. Once these potential false-positive trans-eQTLs had been removed from the real, non-permuted data, we repeated the multiple testing correction (again controlling the FDR at 0.05). Using this strategy we observed several instances where only 20 out of the 50 bases of a probe sequence mapped in the vicinity of the trans-SNP (data not shown). For these trans-eQTLs the Spearman's rank correlation p was often lower than 10^−100, which would imply these SNPs explain over 25% of the total expression variation of the corresponding trans-genes. Given the small number of trans-eQTLs we detected in total, such effect sizes are quite unlikely and therefore provide circumstantial evidence that these indeed reflect cross-hybridization artifacts. We also assessed whether any of the Illumina SNPs that constitute trans-eQTLs might map to a different position than what is reported in dbSNP. To this end we mapped the 50 bp Illumina SNP probe sequences to the genome assembly, permitting up to four mismatches per 50 bp SNP probe sequence. We did not observe any SNP that could map (with some mismatches) to the same chromosome as the trans-probe. It is still possible that some of the trans-eQTLs for which we did not find any evidence of cross-hybridization are still false positives, e.g. 
by missing some cross-hybridizations due to imperfections in the NCBI v36 assembly we used. Although we have identified numerous occasions where a SNP affects two different probes within the same gene in trans, substantiating the likelihood these trans-eQTLs are real, providing unequivocal evidence that all our reported trans-eQTLs are real is not straightforward. Enrichment analysis of trait-associated SNPs and SNPs located within the HLA region. To assess enrichment of trait-associated SNPs, we used a collection of 1,262 unique SNPs from 'A Catalog of Published Genome-Wide Association Studies' (accessed 09 February 2010, and each having at least one reported association p-value [...] 0.05, an HWE exact p-value >0.0001 and call-rate >95%. To ascertain whether these SNPs constitute an eQTL more often than expected, we used a methodology that is not affected by the following potential confounders: uneven distribution of SNP markers and expression probe markers across the genome, differences in MAF between SNPs and LD structure within the genotype data, and correlation between probes in the expression data. Additionally, this methodology is also not confounded by the fact that for certain traits different SNPs in strong LD may have been reported, due to differences in the platforms that were used to identify these loci. We first determined how many unique eQTL SNPs had been identified in the original eQTL mapping (with an FDR<0.05) and how many of these are trait-associated. Subsequently we permuted the expression phenotypes relative to the genotypes (thus keeping the correlation structure within the genotype data and the correlation structure within the expression data intact, yet assigning the genotypes of a sample to the expression data of a randomly chosen sample) and reran the eQTL mapping, sorting all tested eQTLs on highest significance. 
We then took an equal number of top associated, but permuted, eQTL SNPs and determined how many of these permuted eQTL SNPs are trait-associated. By performing 100 permutations we obtained an empiric distribution of the number of trait-associated SNPs expected by chance. We subsequently fitted a generalized extreme value distribution (EVD, using the EVD add-on package for R), permitting us to estimate realistic enrichment significance estimates (called EVD p throughout the manuscript). For the MHC enrichment analysis the procedure was identical, with the difference that we looked for enrichment for SNPs within the MHC, defined as SNPs physically mapping between 20 Mb and 40 Mb on chromosome 6 (NCBI 36 assembly). Trans-eQTL replication datasets. Replication of the detected eQTLs was performed in monocytes from 1,490 different samples [45] and in an independent population of 86 morbidly obese individuals that underwent elective bariatric surgery (Department of general surgery, Maastricht University Medical Centre, the Netherlands). Both these datasets also used the same Illumina HumanHT-12 expression platform. For the 1,490 monocyte samples, eQTL p-value summary statistics were available for all monocyte trans-eQTLs with a nominal p<1.0×10^−5. We ascertained how many of the trans-eQTLs we had found in our peripheral blood data had a nominal eQTL p<1.0×10^−5 in this monocyte dataset. We also assessed trans-eQTLs in four different tissues from the 86 morbidly obese individuals that underwent bariatric surgery. DNA was extracted from blood samples using the Chemagic Magnetic Separation Module 1 (Chemagen) integrated with a Multiprobe II Pipetting robot (PerkinElmer). All samples were genotyped using both Illumina HumanCytoSNP-12 BeadChips and Illumina HumanOmni1-Quad BeadChips (QC was identical to that applied to the peripheral blood samples). We imputed HapMap 2 genotypes using Impute version 2.0. 
In addition expression profiling was performed for four different tissues for each of these individuals using the Illumina HumanHT-12 arrays. Wedge biopsies of liver, visceral adipose tissue (VAT, omentum majus), subcutaneous adipose tissue (SAT, abdominal), and muscle (musculus rectus abdominis) were taken during surgery. RNA was isolated using the Qiagen Lipid Tissue Mini Kit (Qiagen, UK, 74804). Assessment of RNA quality and concentration was done with an Agilent Bioanalyzer (Agilent Technologies USA). Starting with 200 ng of RNA, the Ambion Illumina TotalPrep Amplification Kit was used for anti-sense RNA synthesis, amplification, and purification according to the protocol provided by the manufacturer (Ambion, USA). 750 ng of complementary RNA was hybridized to Illumina HumanHT12 BeadChips and scanned on the Illumina BeadArray Reader. Expression data preprocessing was as mentioned before. We first attempted to replicate the trait-associated trans-eQTLs per tissue, using an FDR of 0.05 and 100 permutations. Subsequently we conducted a meta-analysis, combining the four tissues. Per trans-eQTL we used a weighted Z-method to combine the four individual p-values. However, these four datasets are not independent, as they reflect the same individuals. We resolved this by conducting the permutations in such a way that in every permutation round the samples were permuted in exactly the same way for each of the four tissues. By doing this we retained the correlations that exist between the different tissues per sample, and were able to get a realistic empiric (null-)distribution of expected test-statistics. Convergence analysis Per trait we assessed all the SNPs that have been reported to be associated with that particular trait. We analyzed per trait all possible SNP-pairs. If a pair of SNPs was not in LD (r2<0.001) we assessed whether they affected the same gene in cis or trans. 
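
The pairwise convergence test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `convergent_pairs`, the `eqtl_genes` mapping, the `r2` lookup, and all toy identifiers are hypothetical, and only the LD cutoff (r2<0.001) comes from the text.

```python
from itertools import combinations


def convergent_pairs(trait_snps, eqtl_genes, r2, max_r2=0.001):
    """Return {(snp_a, snp_b): shared_genes} for independent SNP pairs of one trait.

    trait_snps: iterable of SNP ids reported for the trait.
    eqtl_genes: dict mapping SNP id -> set of genes it affects in cis or trans.
    r2:         callable giving pairwise LD (r squared) between two SNP ids.
    """
    pairs = {}
    for a, b in combinations(sorted(trait_snps), 2):
        if r2(a, b) >= max_r2:
            continue  # SNPs in LD are not treated as independent signals
        shared = eqtl_genes.get(a, set()) & eqtl_genes.get(b, set())
        if shared:
            pairs[(a, b)] = shared
    return pairs


# Toy inputs (made-up identifiers and LD values, not data from the paper).
toy_trait_snps = {"rsA", "rsB", "rsC"}
toy_eqtl_genes = {"rsA": {"G1", "G2"}, "rsB": {"G2"}, "rsC": {"G1"}}
toy_ld = {frozenset(("rsA", "rsC")): 0.8}


def toy_r2(a, b):
    """Look up LD for a pair; unlisted pairs are assumed to be unlinked."""
    return toy_ld.get(frozenset((a, b)), 0.0)


result = convergent_pairs(toy_trait_snps, toy_eqtl_genes, toy_r2)
```

In this toy case `rsA` and `rsC` share gene `G1` but are excluded because they are in LD, so only the independent pair `rsA`/`rsB` (sharing `G2`) is returned.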
When using the trait-associated cis- and trans-eQTLs that had been identified when controlling the FDR at 0.05, we identified 7 unique pairs of SNPs that caused both the same phenotype and also affected the same gene(s). When using a somewhat more relaxed set of trans-eQTLs, identified when controlling the FDR at 0.5, we identified 18 unique pairs of SNPs that affect the same downstream gene. We assessed whether these numbers were significantly higher than expected, by using the same strategy that we had used to assess the enrichment of trait-associated SNPs and the HLA; we ran 100 permutations. We kept per permutation the cis-eQTL list as it was, but generated a permuted set of trans-eQTLs, equal in size to the original set of non-permuted trans-eQTLs. This enabled us to determine per permutation round how many unique pairs of SNPs converge on the same gene(s). We subsequently fitted a generalized extreme value distribution, permitting us to estimate realistic enrichment significance estimates. Co-expression between genes, based on HT12 peripheral blood co-expression If a particular SNP is cis- or trans-acting on multiple genes, it is plausible that those genes are biologically related. Co-expression between these genes provides circumstantial evidence this is the case, strengthening the likelihood such cis- and trans-eQTLs are real. We assessed this in the peripheral blood data, by using the expression data of the 1,240 samples, run on the comprehensive HT12 expression platform. As we had removed 25 PCs (to remove physiological, environmental variation, and systematic experimental variation) for the trans-eQTL analyses, we decided to confine co-expression analyses to this expression dataset. As there are 43,202 HT12 probes that we mapped to a known genomic location, 43,202×43,201/2 = 933,184,801 probe-pairs exist. 
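
The probe-pair count above, and the Bonferroni-corrected correlation cutoff it implies for 1,240 samples, can be sanity-checked with a few lines of arithmetic. The sketch assumes a two-sided test and uses a normal approximation to the t statistic of a Pearson correlation (reasonable at this sample size); the source does not say how its cutoff was computed, and the helper name is hypothetical.

```python
import math

n_probes = 43_202    # HT12 probes mapped to a known genomic location
n_samples = 1_240    # peripheral blood samples on the HT12 platform

# Number of unordered probe pairs: n * (n - 1) / 2
n_pairs = n_probes * (n_probes - 1) // 2

# Bonferroni-corrected per-test significance level for p < 0.05 overall
alpha = 0.05 / n_pairs


def pearson_p_value(r, n):
    """Two-sided p-value for Pearson r (normal approximation to the t statistic)."""
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


# Smallest correlation, at two-decimal resolution, that passes the corrected level
r_min = next(k / 100 for k in range(1, 100)
             if pearson_p_value(k / 100, n_samples) < alpha)

print(n_pairs, alpha, r_min)
```

An exact t distribution would give slightly larger p-values than this approximation, which does not change the two-decimal cutoff here.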
Given 1,240 samples, a Pearson correlation coefficient r≥0.19 corresponds to p<0.05 after stringent Bonferroni correction for this number of probe-pairs. Accession numbers. Expression data for both the peripheral blood and the four non-blood datasets have been deposited in GEO with accession numbers GSE20142 (1,240 peripheral blood samples, hybridized to HT12 arrays), GSE20332 (229 peripheral blood samples, hybridized to H8v2 arrays) and GSE22070 (subcutaneous adipose, visceral adipose, muscle and liver samples).

Supporting Information
Figure S1. Detected cis- and trans-eQTLs in genome-wide analysis. (TIF)
Figure S2. Detected cis- and trans-eQTLs for 1,167 trait-associated SNPs. (TIF)
Figure S3. Detected cis- and trans-eQTLs per complex trait. Immune-related and hematological associated SNPs often affect gene expression in cis or trans. (TIF)
Figure S4. Co-expression distribution between eQTL genes for mean platelet volume and mean corpuscular volume. (PDF)
Figure S5. Replication of trans-eQTLs in four non-blood tissues. (TIF)
Figure S6. Principal components used as covariates in analyses. (PDF)
Figure S7. Effect of removing principal components from expression data on detectability of cis-eQTLs. (TIF)
Figure S8. Significance of detected cis-eQTLs before and after removal of principal components from expression data. (TIF)
Figure S9. Effect of removing principal components from expression data on detectability of trans-eQTLs. (TIF)
Table S1. Detected cis-eQTLs (FDR 0.05) for all common SNPs. (XLSX)
Table S2. Detected trans-eQTLs (FDR 0.05) for all common SNPs. (XLS)
Table S3. Detected cis-eQTLs (FDR 0.05) for 1,167 trait-associated SNPs. (XLS)
Table S4. Detected trans-eQTLs (FDR 0.05) for 1,167 trait-associated SNPs. (XLS)
Table S5. Detected cis- and trans-eQTLs (FDR 0.05) per complex trait. (XLS)
Table S6. Plots of detected trans-eQTLs for 1,167 trait-associated SNPs for each of the seven individual cohorts of samples that make up the total of 1,469 peripheral blood samples. (PDF)
Table S7. Detected trans-eQTLs (FDR 0.50) for 1,167 trait-associated SNPs. (XLS)
Table S8. Replicated trans-eQTLs in monocyte eQTL dataset. (XLS)
Table S9. Characteristics of subcutaneous adipose, visceral adipose, muscle and liver datasets. (XLS)
Table S10. Replicated trans-eQTLs in subcutaneous adipose, visceral adipose, muscle and liver datasets. (XLS)
Table S11. Characteristics of peripheral blood expression data. (XLS)

                Author and article information

                Contributors
                Journal
                9711271
                20660
                Pac Symp Biocomput
                Pacific Symposium on Biocomputing
                2335-6936
                5 December 2017
                2018
                01 January 2018
                Volume: 23
                Pages: 524-535
                Affiliations
                Center for Biomedical Informatics and Biostatistics (CB2) and Departments of Medicine and of Systems and Industrial Engineering, The University of Arizona, Tucson, AZ 85721, USA
                Computation Institute, Argonne National Laboratory and University of Chicago, Chicago, IL 60637, USA
                CB2, BIO5 Institute, UACC, and Dept of Medicine, The University of Arizona, Tucson, AZ 85721, USA
                Author notes

                Jiali Han, Jianrong Li and Ikbel Achour are joint-first-authors.

                Article
                NIHMS921876
                5730078
                29218911
                8652c2b1-424e-4465-82d7-c8c482666218

                Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License.

                Categories
                Article

                Keywords: snp, intergenic, noncoding, disease class, biological similarity, enrichment
