Introduction
Ovarian cancer (OV) is a leading cause of cancer death among women [1]. It is characterized by high heterogeneity and relapse risk, particularly for high-grade serous ovarian cancer [2]. In 2022, approximately 19880 new cases were diagnosed, and 12810 deaths from the OV occurred in the United States [3].
OV is the most lethal gynecological malignancy, and OV tissues include both tumor and non-tumor cells in the microenvironment. The tumor microenvironment, the cellular environment in which the tumor exists, consists of immune cells, stromal cells, extracellular matrix components, and exosomes [4]. Among them, immune and stromal cells are major components. Growing evidence supports a key role of the tumor microenvironment in the growth, progression, and metastasis, and hence prognosis, of OV [5]. However, the effects of gene expression signatures associated with the OV microenvironment on prognosis remain unknown. On the basis of gene expression profiles, The Cancer Genome Atlas (TCGA) network has classified OV into four molecular subtypes: differentiated, immunoreactive, mesenchymal, and proliferative, which are associated with patient survival [6]. BRCA1 plays an important role in DNA-damage repair, chromatin remodeling, cell-cycle control, and transcriptional regulation, whereas BRCA2 acts mainly on homologous recombination [7]. Although many reports have described the relationship between BRCA1/BRCA2 mutations and OV prognosis, the prognostic value of BRCA1/BRCA2 mutations has not been well clarified.
The Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data (ESTIMATE) algorithm has been proposed to score the fraction of immune and stromal cells in tumor tissues, then predict the infiltration level of immune and stromal cells and tumor purity, on the basis of gene expression signatures from TCGA database [8]. ESTIMATE is a tool for predicting tumor purity and the presence of infiltrating stromal/immune cells in tumor tissues on the basis of gene expression data. The algorithm is based on single-sample gene set enrichment analysis and generates three scores: (1) an immune score that represents the infiltration of immune cells in the tumor tissue, (2) a stromal score that captures the presence of stroma in tumor tissue, and (3) an estimate score that indicates tumor purity. Although several reports have used the ESTIMATE algorithm in various tumors, such as breast cancer [9], head and neck squamous cell carcinoma [10], glioblastoma multiforme [11], and gastric cancer [12], its application in OV has not been studied in detail.
In this study, we applied the ESTIMATE algorithm to identify several tumor-microenvironment-associated genes that predicted outcomes in patients with OV. Notably, we further validated these genes in a different OV cohort from the Gene Expression Omnibus (GEO) database and immunohistochemistry (IHC) data from The Human Protein Atlas (THPA) database.
Materials and methods
Data preparation
Level 3 gene expression profile data based on the Affymetrix HT Human Genome U133a (HT-HG-U133A) microarray platform for OV samples were downloaded from TCGA data coordination center (https://tcga-data.nci.nih.gov/tcga/) on September 8, 2017. Clinical data for the OV samples, including sex, age, histological type, clinical stage, histological grade, molecular subtype, somatic mutation status of BRCA1/BRCA2, and survival information were also obtained from TCGA database. The immune and stromal scores of 469 OV samples were available based on the ESTIMATE. An independent dataset comprising data from 40 patients with OV from series GSE32063 was used for external validation in this study.
Identification and analysis of differentially expressed genes
According to the immune and stromal scores obtained from the ESTIMATE algorithm, all OV samples were divided into two groups with a high or a low (immune or stromal) score. The R package limma was used for the identification of differentially expressed genes (DEGs) [13]. The DEGs between two groups (high group vs. low group) were screened on the basis of the following cutoffs: false discovery rate (FDR)-adjusted p-value < 0.05 and fold change (FC) > 1.5. Upregulated DEGs denoted the genes that were upregulated in the high group compared with the low group, whereas downregulated DEGs denoted the genes that were downregulated in the high group compared with the low group.
Heatmaps and clustering analysis
ClustVis, a web tool, was used to produce heatmaps and principal component analysis (PCA) plots of DEG distribution between groups with high vs. low immune (or stromal) scores [14].
Construction of volcano plots and Venn diagrams
A volcano plot is a scatter diagram that combines measures of statistical significance (such as p value) and magnitude of change in statistical tests to enable rapid visual identification of data points (e.g., genes) with high statistical significance. Volcano plots were used for determining the distribution of upregulated and downregulated DEGs. Venn diagrams have been widely used in various types of data analysis. In microbiome studies, they are commonly used to analyze shared and unique species between samples (groups). The upregulated DEGs and downregulated DEGs shared between the immune score group and stromal score group were identified with Venn diagrams (Venn 2.1.0).
Gene Ontology function and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis of DEGs
Gene Ontology (GO) function (biological processes (BP), molecular functions (MF), and cellular components (CC)) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of DEGs were assessed with the Database for Annotation, Visualization and Integrated Discovery (DAVID) [15]. An FDR-adjusted p-value < 0.05 was considered statistically significant.
Construction of a protein-protein interaction network
The protein-protein interaction (PPI) network of upregulated DEGs in the stromal group was constructed with the STRING database [16] and Cytoscape software 3.6.1 [17].
Overall survival curves
The relationship between gene expression levels of DEGs and overall survival (OS) in patients with OV was analyzed with Kaplan-Meier survival curves with a log-rank test.
Validation of prognostic DEGs with IHC data from THPA database
The public database THPA (https://www.proteinatlas.org) can be used to verify the expression of target genes [18]. We observed protein expression differences in prognostic DEGs between OV and normal tissues, on the basis of IHC data from THPA database.
Statistical analyses
All statistical analyses were performed in GraphPad Prism 5 (Version 5.01) and R version 3.5.1. The data analyses in this study were performed with standard statistical tests whenever appropriate. Comparisons between groups were evaluated with unpaired t test and one-way analysis of variance. p < 0.05 was considered statistically significant.
Results
Correlation of immune and stromal scores with clinicopathologic characteristics
The gene expression profiles and clinical data for 469 patients with OV pathologically diagnosed between 1992 and 2009 were downloaded from TCGA database. Detailed clinicopathologic features are shown in Table 1 . An overall flowchart of this study is shown in Figure 1 . The histology type of all patients with OV obtained from TCGA database in this study was ovarian serous cystadenocarcinoma. The treatment strategies for the patients with OV mainly included surgery and combination chemotherapy; 26 patients received additional radiotherapy. The cohort included 63 (13.4%) cases of differentiated subtype, 80 (17.1%) cases of immunoreactive subtype, 64 (13.6%) cases of mesenchymal subtype, 74 (15.8%) cases of proliferative subtype, and 188 (40.1%) cases of unknown molecular subtype. The ESTIMATE algorithm was used to evaluate the infiltration of immune and stromal cells, and tumor purity. According to the ESTIMATE algorithm, the immune scores ranged from -1498.58 to 2774.16, and the stromal scores were between -1988.05 and 1837.43. The immune scores of patients were higher than the stromal scores in the entire OV cohort. The immunoreactive subtype showed the highest average immune scores among the four subtypes, and was followed by mesenchymal subtype, differentiated subtype, and proliferative subtype ( Figure 2A , p < 0.0001). Similarly, the mesenchymal subtype had the highest average stromal scores, and was followed by the immunoreactive subtype, differentiated subtype, and proliferative subtype ( Figure 2B , p < 0.0001). Patients with the proliferative subtype had the lowest immune scores and stromal scores. These results suggested that both immune and stromal scores clearly correlated with the classification of molecular subtypes in patients with OV.
Characteristics | TCGA Cohort (n = 469) | GSE32063 Cohort (n = 40) |
---|---|---|
Age | ||
<40 years | 7 (1.5%) | 1 (2.5%) |
40–60 years | 241 (51.4%) | 21 (52.5%) |
≥60 years | 221 (47.1%) | 18 (45.0%) |
Molecular subtype | ||
Differentiated subtype | 63 (13.4%) | |
Immunoreactive subtype | 80 (17.1%) | |
Mesenchymal subtype | 64 (13.6%) | |
Proliferative subtype | 74 (15.8%) | |
Unknown | 188 (40.1%) | 40 (100%) |
| ||
Histological Type | Ovarian Serous Cystadenocarcinoma | Ovarian Serous Cystadenocarcinoma |
| ||
Histological grade | ||
G1 (well differentiated) | 2 (0.4%) | |
G2 (moderately differentiated) | 57 (12.2%) | 23 (57.5%) |
G3 (poorly differentiated) | 400 (85.3%) | 17 (42.5%) |
G4 (undifferentiated) | 1 (0.2%) | |
Gx (grade cannot be assessed) | 8 (1.7%) | |
Unknown | 1 (0.2%) | |
Tumor stage | ||
II | 25 (5.3%) | |
III | 365 (77.8%) | 31 (77.5%) |
IV | 75 (16.0%) | 9 (22.5%) |
Unknown | 4 (0.9%) | |
BRCA1-mutant status | ||
Wild-type BRCA1 | 287 (61.2%) | |
Mutant BRCA1 | 10 (2.1%) | |
Unknown | 172 (36.7%) | 40 (100%) |
BRCA2-mutant status | ||
Wild-type BRCA2 | 288 (61.4%) | |
Mutant BRCA2 | 9 (1.9%) | |
Unknown | 172 (36.7%) | 40 (100%) |
Vital status | ||
Dead | 295 (62.9%) | 22 (55.0%) |
Alive | 174 (37.1%) | 18 (45.0%) |
In this study, ten patients had BRCA1 mutations with OV, 287 had wild-type BRCA1, and 172 patients had unknown status. In addition, nine patients had BRCA2 mutation, 288 had wild-type BRCA2, and 172 patients had unknown status. We determined the distribution of immune and stromal scores between BRCA1/2-mutant and wild-type patients with OV ( Figure 2C–F ). However, no significant differences were found in the immune and stromal scores between BRCA1/BRCA2mutant and wild-type patients.
Correlation of immune and stromal scores with OV prognosis
To better understand the correlation of immune and stromal scores with OV prognosis, we categorized the 469 patients with OV into groups with high and low scores, according to the top half of 235 samples with higher immune (or stromal) scores and the bottom half of 234 samples with lower immune (or stromal) scores, respectively. Patients with low vs. high immune scores showed no difference in median survival (p = 0.5707, Figure 2G ). The median survival of patients with high stromal scores was lower than that of patients with low scores, on the basis of the Kaplan-Meier curve, although the difference was not statistically significant (p = 0.1145). This trend was reflected in the 5-year OS of patients with OV (p = 0.0376, Figure 2H ).
Identification and analysis of DEGs on the basis of immune and stromal scores
To explore the association of gene expression signatures with immune and stromal scores, we compared gene expression profiles of 469 OV samples downloaded from TCGA database. Heatmaps and PCA plots were used to visualize the distribution of gene expression profiles between high and low immune (or stromal) score groups in Figure 3A–D . A total of 487 genes were identified to be DEGs in the immune score group; 442 upregulated DEGs and 45 downregulated DEGs were identified in the groups with high compared with low immune scores, on the basis of p < 0.05 and FC > 1.5. In contrast, 449 genes were identified to be DEGs in the stromal score group. Of these, 428 upregulated DEGs and 21 downregulated DEGs were identified in the group with high compared with low stromal scores, on the basis of on p < 0.05 and FC > 1.5. Similarly, volcano plots ( Figure 3E, F ) showed the distribution of upregulated and downregulated DEGs in the immune score and stromal score groups. Venn diagrams indicated 287 upregulated DEGs and 13 downregulated DEGs shared between the immune score and stromal score groups ( Figure 3G, H ). Because only stromal scores showed a distinct association with OV prognosis, we focused on DEGs grouped by stromal score in the subsequent analysis.
GO function and KEGG pathway enrichment analysis of the identified DEGs
To predict the potential function of 428 upregulated DEGs in the stromal score group, we performed GO and KEGG pathway enrichment analysis of these genes. We identified 355 GO terms of BP, 64 GO terms of CC, and 76 GO terms of MF with significant differences (p < 0.05). The top ten GO terms were identified, including immune response, extracellular space, and extracellular matrix structural constituent ( Figure 4A–C ). In addition, a total of 54 KEGG pathway categories were significant (p < 0.05). The top ten KEGG pathway categories were identified, including Staphylococcus aureus infection, phagosome, and extracellular matrix–receptor interaction, which were associated with immune responses ( Figure 4D ).
Correlation of DEG expression with OV prognosis
To estimate the possible role of DEGs in OS, we plotted Kaplan-Meier survival curves with the log-rank (Mantel-Cox) test. Among the 428 upregulated DEGs in the stromal score group, a total of 44 DEGs were significantly associated with OV prognosis. Of these 44 DEGs, 26 DEGs (hazard ratio (HR) > 1) clearly predicted poor outcomes for OV (p < 0.05; selected genes shown in Figure 5A–F ).
PPI network of prognostic DEGs
To functionally explore the interactions among these prognostic DEGs, we constructed a PPI network, which consisted of 41 nodes, with the STRING database and Cytoscape software. We defined the node color continuously according to log (FC) values of prognostic DEGs in the network. Similarly, we identified the node size continuously on the basis of the degree value, which represented the number of connections to other nodes. In addition, we used Molecular Complex Detection (MCODE) software in Cytotype to conduct module analysis based on the PPI network constructed above. The most important module was chosen for further research. In this PPI network, MMP9, CXCL10, GZMB, CXCL9, CXCL11, CXCL13, C5AR1, GBP1, CD2, GBP2, and CX3CR1 were the notable nodes with higher degree values, because they showed strong associations with other nodes ( Figure 6 ).
Validation of prognostic DEGs in an external cohort from the GEO database and IHC from THPA database
To determine whether the prognostic DEGs identified from TCGA database also had significant prognostic value in a different OV cohort, we obtained and analyzed gene expression signatures of 40 patients with OV from series GSE32063. As shown in Table 2 , six genes (CH25H, CX3CR1, GFPT2, NBL1, TFPI2, and ZFP36) were verified to be significantly correlated with poor outcomes (p < 0.05, Figure 7A–F ). Moreover, we used IHC from THPA database to assess the protein expression differences in these six genes. As shown in Figure 8A–J , the protein expression of CX3CR1, GFPT2, NBL1, TFPI2, and ZFP36 in OV tissues was upregulated in OV compared with normal tissues, in agreement with our previous results. However, we did not find protein expression differences in CH25H in the THPA database.
Discussion
Because the early symptoms and signs of OV are atypical or non-existent, most patients are initially diagnosed in an advanced stage with low 5-year survival rates [19]. Therefore, new methods to screen for early OV and new treatment strategies must be developed. Given the effects of the tumor microenvironment on OV, prognostication is crucial [20]. A better understanding of interaction between tumor-microenvironment components and OV prognosis is urgently needed. In this study, we identified several genes associated with the tumor microenvironment that predicted poor outcomes in patients with OV from TCGA database.
The mesenchymal subtype shows overexpression of several genes associated with the activated stroma at the molecular level [21], although the stromal cells of the mesenchymal subtype have not been well described. Stromal components have been recognized not only to be a scaffold for integrity of tissue structure, but also play a role in tumorigenesis, growth, invasion, and metastasis [22, 23]. In this study, the molecular subtypes of patients with OV significantly correlated with the stromal scores, which were highest in the mesenchymal subtype. We further compared the median survival of the groups with high and low stromal scores, and found that high stromal scores were associated with low median survival, thus indicating poor prognosis. Therefore, our results demonstrated that the mesenchymal subtype is associated with poor outcomes, in agreement with findings from previous reports [24]. Stromal components in the tumor microenvironment may significantly contribute to the behaviors of OV with mesenchymal subtype.
The prognostic value of BRCA1/BRCA2 mutations in OV remains unclear. Previous report have indicated better prognosis in patients with BRCA2 mutation, but no significant difference in prognosis for patients with BRCA1 mutation compared with wild-type patients [25]. However, several studies have shown that patients with BRCA1 and BRCA2 mutations have better outcomes than wild-type patients [26, 27], whereas other studies have demonstrated no significant difference [28]. Our study also indicated no significant prognostic difference, but the small sample sizes of BRCA1/BRCA2 mutations might have led to inaccurate estimates of survival.
Increasing evidence indicates that tumor development is influenced not only by internal tumor cells but also by external tumor-microenvironment components [29–32]. However, these studies have paid limited attention to stromal components in the tumor microenvironment. In this study, in the gene expression profiles of 469 OV samples, we first analyzed 449 DEGs identified from a comparison between groups with high and low stromal scores, some of which participated in the tumor microenvironment, as shown by GO function and KEGG pathway enrichment analysis. These findings were consistent with those from previous studies indicating that stromal components correlate with the formation of the tumor microenvironment in OV [33–35].
We performed survival analysis of these 449 genes and determined that 44 DEGs were significantly associated with OV prognosis, 26 of which clearly predicted poor prognosis. In addition, we constructed a PPI network of 44 DEGs and found that these genes were highly interrelated. CX3CR1 [36], CRYAB [37, 38], TGFBI [39], TFPI2 [40], and GFPT2 [41] have been reported to play crucial roles in stimulating tumor proliferation, invasion, and metastasis in patients with OV, thereby predicting poor prognosis.
We further validated six tumor-microenvironment-associated genes (CH25H, CX3CR1, NBL1, TFPI2, GFPT2, and ZFP36) significantly associated with poor outcomes in patients from a different OV cohort from the GEO database. The survival rate of patients with high expression of tumor-microenvironment-associated genes (CX3CR1, GFPT2, NBL1, TFPI2, and ZFP36) was significantly lower than that of patients with low expression, thus suggesting that these genes may serve as biological targets to improve the prognosis of patients with OV. In addition, the protein expression of CX3CR1, GFPT2, NBL1, TFPI2, and ZFP36 was upregulated in OV tissues compared with normal tissues, on the basis of IHC data from the THPA database, thus strongly supporting our conclusions. As described above, among these six genes, the expression of CX3CR1 [36], TFPI2 [40], and GFPT2 [41] was associated with poor prognosis in patients with OV. Although the remaining three genes, CH25H, NBL1, and ZFP36, have not previously been correlated with OV prognosis, our results suggest that these genes may be potential prognostic biomarkers for OV. The endoplasmic-reticulum-associated membrane protein CH25H has been shown to inhibit infection by several viruses through catalyzing cholesterol conversion into 25-hydroxycholesterol [42]. Overexpression of NBL1 has been demonstrated to suppress tumor growth [43]. Furthermore, ZFP36 has been reported to play a key role in post-transcriptional regulation of tumor necrosis factor and the modulation of mRNA stability [44]. Thus, some previously overlooked genes may also be potential prognostic biomarkers for OV.
Although Mairinger Fabian et al. [45] have identified tumor-microenvironment-associated genes in patients with OV, significant differences exist between that study and our study in terms of purpose, methods, and results. Mairinger Fabian et al. aimed to examine the putative role of the genetic tumor immune microenvironment in mediating differential chemotherapy response in patients with high-grade serous ovarian cancer. Furthermore, the investigators did not apply the ESTIMATE algorithm to evaluate the stromal scores and immune scores of patients with OV from TCGA database. The present study has an advantage in that it indicates the importance of tumor-microenvironment-associated genes as biological targets for OV prognosis and treatment by integrating a large OV tissue microarray with bioinformatics.
This study also has several limitations. First, few genes were associated with OV prognosis, possibly because the small sample size of OV cohort for validation. Second, this was a retrospective study using public databases. Because the database used for this study was from the United States and the validation database from Japan, there was regional differences between the two studies. Therefore, well-designed, prospective clinical studies combined with IHC and polymerase chain reaction analysis remain imperative to further verify the prognostic value of these tumor-microenvironment-associated genes, which may aid in the development of novel prognostic biomarkers and therapeutic targets for OV in clinical practice.
In conclusion, our findings suggest that tumor-microenvironment-associated genes (CX3CR1, GFPT2, NBL1, TFPI2, and ZFP36) may serve as promising biomarkers for OV prognosis, with clinical implications for therapeutic strategies.