      Genome-wide association studies and gene expression profiles of rheumatoid arthritis: an analysis

      research-article

          Abstract

          Objectives

          The molecular mechanism of rheumatoid arthritis (RA) remains elusive. We conducted a protein-protein interaction network-based integrative analysis of genome-wide association studies (GWAS) and gene expression profiles of RA.

          Methods

          We first performed a dense search of RA-associated gene modules by integrating a large GWAS meta-analysis dataset (containing 5539 RA patients and 20 169 healthy controls), protein interaction network and gene expression profiles of RA synovium and peripheral blood mononuclear cells (PBMCs). Gene ontology (GO) enrichment analysis was conducted by DAVID. The protein association networks of gene modules were generated by STRING.

          Results

          For RA synovium, the top-ranked gene module is HLA-A, containing TAP2, HLA-A, HLA-C, TAPBP and LILRB1 genes. For RA PBMCs, the top-ranked gene module is GRB7, consisting of HLA-DRB5, HLA-DRA, GRB7, CD63 and KIT genes. Functional enrichment analysis identified three significant GO terms for RA synovium, including antigen processing and presentation of peptide antigen via major histocompatibility complex class I (false discovery rate (FDR) = 4.86 × 10⁻⁴), antigen processing and presentation of peptide antigen (FDR = 2.33 × 10⁻³) and eukaryotic translation initiation factor 4F complex (FDR = 2.52 × 10⁻²).

          Conclusion

          This study reported several RA-associated gene modules and their functional association networks.

          Cite this article: X. Xiao, J. Hao, Y. Wen, W. Wang, X. Guo, F. Zhang. Genome-wide association studies and gene expression profiles of rheumatoid arthritis: an analysis. Bone Joint Res 2016;5:314–319. DOI: 10.1302/2046-3758.57.2000502.

          Most cited references (13)

          Molecular signatures and new candidates to target the pathogenesis of rheumatoid arthritis.

          Rheumatoid arthritis (RA) is a chronic, inflammatory joint disease of unknown etiology and pronounced interpatient heterogeneity. To characterize RA at the molecular level and to uncover pathomechanisms, we performed genome-wide gene expression analysis. We identified a set of 1,054 genes significantly deregulated in pair-wise comparisons between RA and osteoarthritis (OA) patients, RA and normal donors (ND), or OA and ND. Correlation analysis revealed gene sets regulated identically in all three groups. As a prominent example secreted phosphoprotein 1 (SPP1) was identified to be significantly upregulated in RA compared with both OA and ND. SPP1 expression was found to correlate with genes expressed during an inflammatory response, T-cell activation and apoptosis, suggesting common underlying regulatory networks. A subclassification of RA patients was achieved on the basis of proteoglycan 4 (PRG4) expression, distinguishing PRG4 high and low expressors and reflecting the heterogeneity of the disease. In addition, we found that low PRG4 expression was associated with a more aggressive disease stage, which is in accordance with PRG4 loss-of-function mutations causing camptodactyly-arthropathy-coxa vara-pericarditis syndrome. Altogether we provide evidence for molecular signatures of RA and RA subclasses, sets of new candidate genes as well as for candidate gene networks, which extend our understanding of disease mechanisms and may lead to an improved diagnosis.

            Network-Assisted Investigation of Combined Causal Signals from Genome-Wide Association Studies in Schizophrenia

            Introduction

            Genome-wide association (GWA) studies have, during the past decade, become a powerful tool to study the genetic components of complex diseases [1]. Although an increasing number of genes/markers have been uncovered in GWA studies, which have provided us important insights into the underlying genetic basis of complex diseases such as schizophrenia [2], [3], [4], it has also become evident that many genes are weakly or moderately associated with the diseases. Most of these variants have been missed in single marker analysis, as investigators typically employ a genome-wide significance cutoff P-value of 5×10⁻⁸. Alternatively, the gene set analysis (GSA) of GWAS datasets provides ways to simultaneously examine groups of functionally related genes for their combined effects and thus has improved power and interpretability [5]. Many GSA methods have been reported to date, such as the gene set enrichment analysis [6], the adaptive rank-truncated product [7], the gene set ridge regression in association studies (GRASS) [8], etc. Most of these methods were designed to use pre-defined gene sets such as the KEGG database [9] or the Gene Ontology (GO) annotations [10]. Alternatively, studies are emerging by incorporating protein-protein interaction (PPI) networks into GWAS analysis. Baranzini et al. [11] first adopted a network-based method that was initially designed for gene expression data [12] to analyze the GWAS data for multiple sclerosis. Recently, Rossin et al. [13] developed the Disease Association Protein-Protein Link Evaluator (DAPPLE); it tests whether genes that are located at association loci in a GWAS dataset are significantly connected via PPIs. We have also developed the dense module search (DMS) method [14], which overlays the gene-wise P values from GWAS onto the PPI network and dynamically searches for subnetworks that are significantly enriched with the association signals.
The advantages of network-based analysis of GWAS data in comparison with the standard GSA methods lie in many aspects. First, most GSA methods test on pre-defined gene sets, which heavily rely on a priori knowledge and are incomplete. For example, the popular KEGG database has pathway annotations covering only ∼5,000–5,500 genes [15], accounting for less than 30% of the genes in GWAS datasets. In contrast, the annotations of PPI data cover a much larger proportion of human proteins. For example, a recent integrative analysis of PPI data from multiple sources has reconstructed the human PPI network by recruiting ∼12,000 proteins and ∼60,000 protein interaction pairs with experimental evidence [16]. There are other assembled PPI datasets that include both experimentally supported and computationally predicted interactions; thus, they could annotate even more proteins and interactions [17], [18]. Second, the standard GSA methods are typically based on canonical definitions of pathways or functional categories, but the association signals from GWAS might converge on only part of the pathway [19]. In such cases, analysis of the whole pathway as a unit would reduce the power. On the other hand, network-assisted methods allow for the definition of de novo gene sets by dynamically searching for interconnected subnetworks in the whole interactome and, thus, can effectively alleviate the limitation of the fixed size in pathway analysis. Despite these advantages, there are challenges in the application of network-based approaches to GWAS data. For example, the methods for defining or searching subnetworks vary greatly. While it is impractical to examine all possible subnetworks due to the intensive computing burden, different methods or algorithms may identify substantially different subnetworks [20], making it difficult to decide in real application. 
Additionally, network-based analysis could be confounded by nodes with high degrees (i.e., the number of interactors of each node in the network), although these nodes constitute only a small proportion according to the framework of power-law distribution [21]. One example is TP53, which interacts with several hundred other proteins in the whole PPI network. The existence of such strongly connected hub nodes may make them more likely to be selected when searching subnetworks and, thus, to overwhelm the resultant subnetworks. Appropriate adjustments are needed.

In this study, we aim to search for modules that are significantly enriched with association signals in the human PPI network weighted by GWAS signals. We take advantage of our recently developed dense module search (DMS) algorithm [14] to conduct module searching and construction. Based on this, we introduced statistical evaluations of the modules identified by DMS, including a significance test based on module scores, a weighted resampling method to adjust for potential biases in GWAS data (e.g., caused by gene length or SNP density), a topologically matched randomization process to adjust for bias in the network (e.g., the high degree nodes), and a permutation test to determine the disease association of the modules. In addition, we propose a bi-directional framework to search for consistent association signals from multiple GWAS datasets available for one specific disease or trait. Specifically, two GWAS datasets were analyzed in parallel: one is assigned as a discovery dataset and another as an evaluation dataset, and vice versa. This strategy provides robust results with partial validation: only the modules that were consistently highly scored would be selected for further validation and functional assessment. We demonstrated it in schizophrenia using two major GWAS datasets for module identification, and incorporated a third dataset to independently replicate the results.
Finally, we performed a meta-analysis of the markers that were mapped in the module genes. We identified 18 SNPs in 9 module genes that are of particular interest (P meta […] 5%), extreme heterozygosity rate (±3 s.d. from the mean value of the distribution), or problematic gender assignment. We used PLINK [28] to compute the identity-by-state (IBS) matrix to pinpoint duplicate or cryptic relationships between individuals, and we retained the sample with the highest call rate for each pair of samples with an identity-by-descent (IBD)>0.185. Principal component analysis (PCA) was performed using the smartpca program in EIGENSTRAT [29] to detect population structure and to allow removal of outlier individuals. Eight significant PCs with the Tracy Widom test P value […] 5%, minor allele frequency (MAF) […] Zm+1 > Zm ×(1+r), where Zm+1 is the new module score after adding the node, Zm is the original module score and r is a pre-defined rate. We set r to be 0.1 in this study. This module expansion process iterates until none of the neighborhood nodes can satisfy the function Zm+1 > Zm ×(1+r). Because this module construction process was conducted taking each node in the network as the seed gene, several thousands of modules are expected, corresponding to the thousands of nodes.

Module assessment

We provided three procedures to assess the significance of the identified modules, each of which aims to build null distributions for different hypotheses. First, to perform a significance test of the identified modules, we calculated P values based on module scores (Zm) for each module by empirically estimating the null distribution [26]. According to Efron et al. (2010), the null distribution is a normal distribution with mean δ and standard deviation σ, both of which can be empirically estimated using the R package locfdr.
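The expansion rule above can be sketched in a few lines of Python. This is a minimal illustration, not the DMS implementation: the module score used here (sum of node z-scores divided by the square root of the module size) and the restriction to direct neighbours are assumptions, since the surviving text defines only the acceptance rule Zm+1 > Zm × (1 + r) with r = 0.1. The toy network and z-scores are hypothetical.

```python
import math

def module_score(genes, z):
    """Assumed score: sum of gene z-scores divided by sqrt(module size)."""
    return sum(z[g] for g in genes) / math.sqrt(len(genes))

def expand_module(seed, neighbors, z, r=0.1):
    """Greedily grow a module from `seed` while Z_{m+1} > Z_m * (1 + r)."""
    module = {seed}
    score = module_score(module, z)
    while True:
        # Candidates: direct neighbours of the current module (a simplification).
        frontier = {n for g in module for n in neighbors.get(g, ())} - module
        if not frontier:
            break
        # Pick the neighbour that yields the highest new module score.
        best = max(sorted(frontier), key=lambda n: module_score(module | {n}, z))
        new_score = module_score(module | {best}, z)
        if new_score > score * (1 + r):
            module.add(best)
            score = new_score
        else:
            break  # no neighbour satisfies the expansion rule
    return module, score

# Hypothetical toy network and gene-level z-scores.
neighbors = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
z = {"A": 2.0, "B": 2.5, "C": 0.1, "D": 3.0}
module, score = expand_module("A", neighbors, z)
print(sorted(module), round(score, 3))
```

Running the expansion once per node as seed gives one candidate module per node, which is why thousands of modules are expected for a network with thousands of nodes.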
Specifically, module scores were first median-centered by subtracting the median value of Zm from each of them, followed by estimation of the parameters δ and σ for the empirical null distribution using locfdr. The standardized module scores (ZS) were then calculated and converted to P values, P(Zm) = 1 − Φ(ZS), where Φ is the standard normal cumulative distribution function.

Second, to determine whether the module score is higher than expected by chance, a standard way is to randomly select the same number of genes as in a module, i.e., resample genes in the network regardless of the interactions, and compare the module score in the random gene set with the score in the real case. Specifically, to alleviate the biases in GWAS data (e.g., gene length or SNP density) or the network data (e.g., high-degree nodes), we incorporated weighted resampling, which intentionally matches the pattern of biases in each resample to resemble the real case. The gene length bias and the SNP density bias are commonly noticed in GWAS datasets, especially when using the most significant SNP to represent genes [30]. This is because when mapping SNPs to genes, longer genes tend to have more SNPs and in turn have a higher chance of being significant. These two types of biases are closely correlated but differ in some cases owing to different genotyping platforms. For both biases, we first estimated a weight for each gene based on the specific character to be adjusted, and then performed weighted resampling to ensure that each resample has a similar pattern in terms of the adjusted character. This weighted resampling procedure ensures that genes are selected in a similar pattern of gene length or SNP density as in the real GWAS data. Therefore, the empirical P values for each module built on the bias-matched permutation data could be adjusted by gene length (P GL) or the number of SNPs per gene (P nSNPs). A detailed description of this function can be found in previous work [23].
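The conversion from module scores to P values, P(Zm) = 1 − Φ(ZS), can be illustrated as follows. The sketch assumes that δ and σ have already been estimated (the source uses the R package locfdr for this; the values passed below are placeholders) and that standardization takes the form ZS = (Zm − median − δ) / σ, which the text does not spell out.

```python
import math
import statistics

def module_p_values(scores, delta, sigma):
    """Convert module scores to upper-tail P values under an empirical null.

    delta, sigma: mean and SD of the empirical null, assumed to be
    estimated elsewhere (e.g. with locfdr); they are inputs here.
    """
    med = statistics.median(scores)
    centered = [s - med for s in scores]                    # median-centering
    standardized = [(c - delta) / sigma for c in centered]  # assumed form of ZS
    # 1 - Phi(x) for the standard normal equals 0.5 * erfc(x / sqrt(2)).
    return [0.5 * math.erfc(zs / math.sqrt(2)) for zs in standardized]

# Placeholder module scores and null parameters, for illustration only.
p_values = module_p_values([1.0, 2.0, 3.0, 4.0, 5.0], delta=0.0, sigma=1.0)
print([round(p, 4) for p in p_values])
```

Using `erfc` rather than `1 - cdf` keeps precision for very small upper-tail probabilities, which matters when thousands of modules are ranked by P value.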
Another type of bias was that, in the PPI network, nodes with many interactors (high degree) are more likely to be recruited in module expansion steps. We thus categorized all the nodes in the working PPI network into four categories by their degree values (degree range 0–2², 2²–2⁴, 2⁴–2⁶, and >2⁶) (Figure S1). For each module, a topologically matched random module was generated by randomly sampling the same number of nodes in each of the four node bins. An empirical P value is computed by P = #{Zm(π) ≥ Zm} / #{total resamples}, where Zm(π) is the score of the random module for the πth resample, and Zm is the observed module score.

Third, to assess the disease association of the modules, we performed a permutation test by shuffling case/control labels in the GWAS datasets. We generated 1,000 permutation datasets using the genotyping data, and computed module scores in each permutation dataset in the same way as for the real case. An empirical P value for each module was computed according to P = #{Zm(permutation) ≥ Zm} / #{total permutations}, where Zm(permutation) is the module score in the permutation data. A combinatorial set of criteria was defined to select modules: (1) P(Zm) […] 0.185, a cutoff value that is halfway between the expected IBD for third- and second-degree relatives.

We performed inverse-variance weighted meta-analysis based on the fixed-effects model using the tool meta (http://www.stats.ox.ac.uk/~jsliu/meta.html). This method combines study-specific beta values under the fixed-effects model using the inverse of the corresponding squared standard errors (variances) as weights. Between-study heterogeneity was tested based on I2 and Q statistics. SNPs with evidence of heterogeneity were removed. The three GWAS datasets were genotyped on the same platform; thus, we performed meta-analysis directly on the genotyped SNPs without imputation. Genomic control within each study was conducted in the meta-analysis using the lambda value to adjust the study-specific standard error (SE).
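The inverse-variance weighted fixed-effects combination and the Q and I2 heterogeneity statistics mentioned above can be sketched as follows. This is not the `meta` tool itself. The genomic-control step assumes the conventional form of multiplying each study's SE by the square root of its lambda (applied only when lambda exceeds 1); the source says only that lambda is used to adjust the SE. Input values are made up.

```python
import math

def fixed_effects_meta(betas, ses, lambdas=None):
    """Inverse-variance weighted fixed-effects meta-analysis for one SNP."""
    if lambdas is None:
        lambdas = [1.0] * len(betas)
    # Assumed genomic control: SE_gc = SE * sqrt(lambda), only if lambda > 1.
    adj_ses = [se * math.sqrt(max(lam, 1.0)) for se, lam in zip(ses, lambdas)]
    weights = [1.0 / se ** 2 for se in adj_ses]        # inverse-variance weights
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    # Cochran's Q and the I^2 statistic for between-study heterogeneity.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))               # two-sided normal P value
    return {"beta": beta, "se": se, "z": z, "p": p, "Q": q, "I2": i2}

# Made-up effect sizes and standard errors from three studies.
result = fixed_effects_meta([0.1, 0.2, 0.3], [0.1, 0.1, 0.1])
print({k: round(v, 4) for k, v in result.items()})
```

Because weights are 1/SE², a study with half the standard error contributes four times as much to the pooled estimate.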
Functional enrichment tests

We performed pathway enrichment analysis by the IPA system (http://www.ingenuity.com) and also using canonical pathways from the KEGG database [9] by the hypergeometric test. The KEGG pathway annotations were downloaded in March 2011, containing 201 pathways with size ≥10 and ≤250. For each gene set collection, the results of the hypergeometric test were adjusted by the Bonferroni method for multiple testing correction. To further assess the significance of the identified gene sets, we performed empirical assessment of the significance by resampling 1000 times from the network genes, with each resample containing a random set of 205 genes. For a gene set S, we recorded the number of resamples in which the gene set was significant and computed an empirical P value by P = #{resamples in which S is significant} / #{total resamples}.

Supporting Information

Figure S1. Degree distribution of GAIN GWAS-weighted (top) and ISC GWAS-weighted (bottom) networks. Each node in the network was assigned to a degree bin based on its -log2(degree) value. (PDF)

Figure S2. Module size distribution of GAIN GWAS-weighted (top) and ISC GWAS-weighted (bottom) networks. (PDF)

Figure S3. Protein-protein interaction network consisting of module genes for schizophrenia. (PDF)

Table S1. Functional enrichment results using KEGG pathways for module genes. (DOCX)
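The hypergeometric enrichment test with Bonferroni correction described under Functional enrichment tests can be sketched with the standard library alone. The gene names, pathway names and the helper functions `hypergeom_sf` and `enrichment` are hypothetical; restricting pathways and module genes to the background set is an assumption about how the test universe is defined.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K belong to the pathway."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def enrichment(module_genes, background, pathways):
    """Return {pathway: (overlap, raw P, Bonferroni-adjusted P)}."""
    universe = set(background)
    module = set(module_genes) & universe
    results = {}
    for name, members in pathways.items():
        members = set(members) & universe
        k = len(module & members)
        p = hypergeom_sf(k, len(universe), len(members), len(module))
        # Bonferroni: multiply by the number of pathways tested, cap at 1.
        results[name] = (k, p, min(1.0, p * len(pathways)))
    return results

# Hypothetical background, module and pathway annotations.
background = [f"g{i}" for i in range(10)]
module_genes = ["g0", "g1", "g2", "g3"]
pathways = {"P1": ["g0", "g1", "g9"], "P2": ["g7", "g8"]}
for name, (k, p, p_adj) in enrichment(module_genes, background, pathways).items():
    print(name, k, round(p, 4), round(p_adj, 4))
```

Exact integer binomial coefficients avoid floating-point underflow for the small tail probabilities that arise with large gene universes.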

              Transcriptome Analysis Describing New Immunity and Defense Genes in Peripheral Blood Mononuclear Cells of Rheumatoid Arthritis Patients

              Background

              Large-scale gene expression profiling of peripheral blood mononuclear cells from Rheumatoid Arthritis (RA) patients could provide a molecular description that reflects the contribution of diverse cellular responses associated with this disease. The aim of our study was to identify peripheral blood gene expression profiles for RA patients, using Illumina technology, to gain insights into RA molecular mechanisms.

              Methodology/Principal Findings

              The Illumina Human-6v2 Expression BeadChips were used for a complete genome-wide transcript profiling of peripheral blood mononuclear cells (PBMCs) from 18 RA patients and 15 controls. Differential analysis per gene was performed with one-way analysis of variance (ANOVA) and P values were adjusted to control the False Discovery Rate (FDR<5%). Genes differentially expressed at a significant level between patients and controls were analyzed using Gene Ontology (GO) in the PANTHER database to identify biological processes. Differential expression of 339 Reference Sequence genes (238 down-regulated and 101 up-regulated) between the two groups was observed. We identified a remarkably elevated expression of a spectrum of genes involved in Immunity and Defense in PBMCs of RA patients compared to controls. This result is confirmed by GO analysis, suggesting that these genes could be activated systemically in RA. No significant down-regulated ontology groups were found. Microarray data were validated by real time PCR in a set of nine genes showing a high degree of correlation.

              Conclusions/Significance

              Our study highlighted several new genes that could contribute to the identification of innovative clinical biomarkers for diagnostic procedures and therapeutic interventions.
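The FDR adjustment of per-gene P values mentioned above can be illustrated with a short sketch. The abstract does not name the procedure; the code assumes the Benjamini-Hochberg step-up adjustment, a common choice for microarray studies, and the P values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P value so that adjusted
    # values are monotone and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up per-gene P values; genes with adjusted P < 0.05 pass FDR < 5%.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
print(adj, significant)
```

With thousands of probes on an expression array, this kind of adjustment is what turns a list of nominal P values into a gene list with a controlled expected proportion of false positives.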

                Author and article information

                Contributors
                Role: Graduate Student
                Role: Graduate Student
                Role: Graduate Student
                Role: Graduate Student
                Role: Professor
                Role: Associate Professor
                Journal
                Bone Joint Res
                Bone & Joint Research
                2046-3758
                July 2016
                31 August 2016
                Volume: 5
                Issue: 7
                Pages: 314-319
                Affiliations
                [1 ]Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi’an Jiaotong University, Yanta West Road 76, Xi’an, Shaanxi, China
                Author notes
                Article
                DOI: 10.1302/2046-3758.57.2000502
                PMCID: PMC5005471
                PMID: 27445359
                © 2016 Guo et al.

                This is an open-access article distributed under the terms of the Creative Commons Attribution licence (CC-BY-NC), which permits unrestricted use, distribution, and reproduction in any medium, but not for commercial gain, provided the original author and source are credited.

                History
                : 16 July 2015
                : 7 June 2016
                Categories
                Research
                Rheumatoid Arthritis
                Genome-Wide Association Studies
                Microarray
                Gene Modules
                Protein Interaction Network

