The prognostic value of gene expression profiling (GEP) in multiple myeloma (MM) has
been reported by several groups.
1, 2, 3, 4
We have previously published a 70-gene classifier (GEP70) that identifies patients
at high risk of short progression-free survival (PFS) and overall survival (OS).
1
The GEP70 model was developed from data on patients enrolled in Total Therapy 2 (TT2).
1
Its discriminatory power has been validated in several published data sets in the
transplant, non-transplant and relapse settings (reviewed in Johnson et al.
5
). We applied the GEP70 model to 56 previously treated patients with available baseline
GEP information who were enrolled in Total Therapy 6 (TT6), a tandem transplant trial
the details of which are provided in Supplementary Methods. The gene expression profiles
have been deposited at the NCBI GEO data repository (http://www.ncbi.nlm.nih.gov/geo/)
under GEO accession number GSE57317. Sample procurement and processing for GEP, as
well as calculations of the GEP70 risk score, have been reported previously.
1
The estimated 1-year survival was 62% for the high-risk group and 97% for the low-risk
group by GEP70 (Supplementary Figure S1A, P<0.0001). To investigate whether this striking
difference in outcomes was driven by a few genes, all 70 probe sets of the GEP70 risk
model were ranked by their P-values, based on univariate Cox regression analysis for
OS in TT6 (Supplementary Table S1). The five probe sets with the smallest P-values
(ENO1, FABP5, TRIP13, TAGLN2 and RFC4) were combined to create a continuous score,
using methodology similar to that used to develop the GEP70 model.
1
Because each of the five probe sets had a positive association with short OS in TT6,
the new score was simply the mean of the log2-transformed expression levels of the five
probe sets. An optimal cutoff for the new risk score (hereafter referred to as GEP5)
was then established with the running log-rank test, so that patients with scores
higher than the cutoff were deemed to have high-risk MM and all others low-risk MM
(Figure 1a), with an estimated OS at 1 year of 60% and 95%, respectively (1-year PFS
50% and 91%, respectively).
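As a rough illustration of the scoring and cutoff steps described above, the following Python sketch computes a mean-of-log2 score and scans candidate cutoffs with a two-group log-rank statistic. It is a minimal sketch under stated assumptions, not the implementation used in this study: the function names, the `min_group` guard and the input layout (patients in rows, probe sets in columns) are hypothetical, and the scan does not adjust the resulting statistic for having tested many cutoffs.

```python
import numpy as np

def gep5_score(log2_expr):
    """Mean of log2 expression per patient (rows: patients, columns: probe sets)."""
    return np.asarray(log2_expr, dtype=float).mean(axis=1)

def logrank_chi2(time, event, high):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    high = np.asarray(high, dtype=bool)
    obs_minus_exp, var = 0.0, 0.0
    for t in np.unique(time[event]):           # loop over distinct event times
        at_risk = time >= t
        n = at_risk.sum()                      # patients at risk overall
        n1 = (at_risk & high).sum()            # patients at risk in the high group
        d = (event & (time == t)).sum()        # events at time t overall
        d1 = (event & (time == t) & high).sum()
        obs_minus_exp += d1 - d * n1 / n       # observed minus expected, high group
        if n > 1:                              # hypergeometric variance term
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / var if var > 0 else 0.0

def running_logrank_cutoff(score, time, event, min_group=5):
    """Scan observed scores as cutoffs; return the cutoff with the largest statistic.

    min_group is an assumed guard against tiny groups, not taken from the text.
    """
    score = np.asarray(score, dtype=float)
    best_cut, best_stat = None, -1.0
    for cut in np.unique(score)[:-1]:          # the maximum would leave no high group
        high = score > cut                     # scores above the cutoff are high risk
        if min(high.sum(), (~high).sum()) < min_group:
            continue
        stat = logrank_chi2(time, event, high)
        if stat > best_stat:
            best_cut, best_stat = float(cut), stat
    return best_cut, best_stat
```

Because the cutoff is chosen to maximize the statistic, any P-value derived from it in the same data is optimistic, which is one reason an independent validation set matters.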
All five genes identified in this study were previously reported to be involved in
cell proliferation and have been associated with development and survival in different
cancers. ENO1 encodes alpha-enolase. Initiation of translation at an alternative translation
start site results in a shorter isoform that produces MYC binding protein 1, which
acts as a transcriptional repressor and possibly as a tumor suppressor.
6
Overexpression of FABP5, a member of the family of fatty acid-binding proteins, was
associated with poor survival in triple-negative breast cancer and with resistance
to all-trans retinoic acid in a preclinical model of pancreatic ductal adenocarcinoma.
7,8
TRIP13 encodes a protein that interacts with the ligand-binding domain of thyroid hormone
receptors (hormone-dependent transcription factors) and may play a role in early-stage
non-small-cell lung cancer.
9
An association of TAGLN2 overexpression with short survival, metastasis and disease
progression has been shown for several cancers.
10,11
RFC4 encodes the 37-kDa subunit of the replication factor C protein complex, which,
together with the proliferating cell nuclear antigen, is required for DNA elongation.
12
Because the number of patients treated on TT6 was relatively small and follow-up was short
(median follow-up 26.5 months), we then used a larger data set of 275 uniformly treated
patients on TT3a with longer follow-up to investigate the GEP5 score's applicability
to previously untreated myeloma. The GEP5 cutoff derived in TT3a was then validated in
patients enrolled in TT3b (n=166).
13
Gene expression data for TT3a and TT3b have previously been published and are deposited
in the ArrayExpress archive (http://www.ebi.ac.uk/arrayexpress) under the accession
number E-TABM-1138. A new optimal GEP5 cutoff of 10.68 was identified in TT3a using
the running log-rank test; it separated high- and low-risk groups with significantly
different OS and PFS. Importantly, these differences were comparable to those obtained
by the GEP70 risk model with its established cutoff
1
(Figure 1b and Supplementary Figure S1B). In the validation cohort (TT3b), risk discrimination
by GEP5 was very similar to that by GEP70 (Figure 1c and Supplementary Figure S1C), and
both were comparable to results in the TT3a training set. We also applied GEP5 to
a publicly available external data set of previously untreated patients (HOVON65/GMMG-HD4,
n=288)
4
as a second validation set, where GEP5 also differentiated between a high-risk and
a low-risk population with significantly different survival (Figure 1d).
To test whether the five probe sets in the GEP5 were truly the best choice, we randomly
selected 10 000 quintuplets from all the probe sets within the 70-gene model to create
10 000 continuous scores using the same methodology as for the GEP5 score. Among the
10 000 random scores tested, only 40 performed better than GEP5 in TT6. Of these 40,
only 1 performed better in the TT3a test set and none was superior to GEP5 in the TT3b
validation set (Supplementary Figure S2 and Supplementary Table S2). We also examined
randomly selected continuous scores in TT6 built from between 1 and 10 probe sets.
Of a total of 42 485 models considered, only 1236 had a smaller
P-value than GEP5 in TT6. Among those 1236 scores, 68 had a smaller P-value when tested
in the TT3a test set and none performed better than GEP5 in the TT3b validation set
(Supplementary Figure S3 and Supplementary Table S3). Although some of these random
scores showed a better correlation with survival in single data sets, none were consistently
better than the GEP5 score across different data sets. The GEP5 always ranked among
the top 2% of all scores in all data sets analyzed (data not shown).
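The random-comparison procedure described above can be sketched in Python as follows. This is a hedged illustration only: `random_subsets` and `fraction_better` are hypothetical helper names, the seed is arbitrary, and drawing distinct subsets is an assumption, since the text does not say whether repeated quintuplets were allowed. For scale, 70 probe sets admit 12 103 014 possible quintuplets, so 10 000 draws sample only a small part of that space.

```python
import math
import random

def random_subsets(probe_ids, size, n_draws, seed=0):
    """Draw up to n_draws distinct random subsets of the given size.

    probe_ids must be a list; distinctness of subsets is an assumption.
    """
    rng = random.Random(seed)
    total = math.comb(len(probe_ids), size)    # number of possible subsets
    n_draws = min(n_draws, total)
    seen = set()
    while len(seen) < n_draws:
        # sort so the same subset drawn in a different order counts once
        seen.add(tuple(sorted(rng.sample(probe_ids, size))))
    return sorted(seen)

def fraction_better(random_p_values, reference_p):
    """Share of random scores whose P-value is smaller than the reference."""
    better = sum(1 for p in random_p_values if p < reference_p)
    return better / len(random_p_values)
```

Each subset would then be scored with the same mean-of-log2 rule and tested in every cohort; a subset counts as consistently better only if it outperforms the reference in all of them.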
On multivariate stepwise analysis, the GEP5-defined high-risk designation was selected
as the most adverse variable linked to inferior PFS, with an estimated hazard ratio
of 3.44 (95% CI: 2.02–5.86), whereas the GEP70 model was selected for OS (Supplementary
Table S4). Table 1 summarizes the univariate survival analysis of the GEP5 and GEP70
models. Cross-tabulation of GEP70 and GEP5 risk (low vs high) for TT3a and TT3b showed
agreement rates between the two models of 0.89 and 0.87, respectively (Supplementary
Table S5).
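For readers who want to reproduce an agreement rate from paired risk calls, a minimal Python sketch follows. It assumes the agreement rate is the simple proportion of patients given the same label by both models; the function name and the label strings are hypothetical.

```python
from collections import Counter

def agreement_rate(calls_a, calls_b):
    """Cross-tabulate two lists of risk calls and return (table, agreement rate)."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty call lists of equal length")
    table = Counter(zip(calls_a, calls_b))     # counts per (model A, model B) pair
    concordant = sum(n for (a, b), n in table.items() if a == b)
    return table, concordant / len(calls_a)
```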
GEP70 and GEP5 currently require the use of microarray technology that interrogates
the expression levels of more than 47 000 transcripts and variants simultaneously.
To assess whether a more targeted approach, only measuring the expression of a small
number of genes, could reliably predict risk in MM, we analyzed 48 RNA samples of
previously untreated patients on TT3a and TT3b with available GEP data using the nanoString
nCounter, with a code set consisting of all five genes (ENO1, FABP5, TAGLN2, TRIP13
and RFC4) of the GEP5 signature and the housekeeping genes RPL27, RPL30, RPS13, RPS29
and SRP14 (code set sequences are provided in Supplementary Table S6). Technical and
biological normalization were performed using the nSolver software provided by nanoString.
The correlation between microarray-based and nanoString-based gene expression ranged
from r=0.64 to r=0.87 across the five genes. Using the normalized nanoString data, we
computed a nanoString-based GEP5 score (nsGEP5) applying the same methodology as for
the microarray-based GEP5. nsGEP5 and GEP5 correlated very well (r=0.852; Supplementary
Figure S4A). The receiver operating characteristic (ROC) curve had an area under the
curve of 0.897, suggesting that GEP5 high/low risk can be predicted using nsGEP5
(Supplementary Figure S4B).
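The two summary measures used in this comparison can be computed as in the sketch below. It assumes Pearson correlation (the text reports r without naming the coefficient) and computes the area under the ROC curve as the probability that a randomly chosen high-risk sample has a higher score than a randomly chosen low-risk sample, with ties counted as one half. Function names are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two paired measurements."""
    return float(np.corrcoef(x, y)[0, 1])

def roc_auc(score, is_high):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    score = np.asarray(score, dtype=float)
    is_high = np.asarray(is_high, dtype=bool)
    pos, neg = score[is_high], score[~is_high]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both risk groups must be present")
    diff = pos[:, None] - neg[None, :]         # every high-risk vs low-risk pair
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))
```

Here `score` would hold the nanoString-based score and `is_high` the microarray-based high-risk call for the same samples.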
In summary, high-risk myeloma remains one of the greatest therapeutic challenges.
The striking difference in survival between GEP70 low- and high-risk groups among
previously treated patients motivated our search for a smaller set of genes driving
that difference. We indeed identified a set of five genes that are highly predictive
of survival in multiple independent data sets. The nsGEP5, based on targeted evaluation
of the expression levels of these five genes using the nanoString technology, showed
a very good correlation with GEP5 (based on microarray data). This new technology could
reduce cost and sample requirements and has great potential to make gene expression-driven
risk assessment available to a broader patient population. However, the nsGEP5 will
have to be evaluated in an independent homogeneous set of clinical samples before it can
be utilized in the routine clinical setting. Recently, a large-scale proteomics experiment involving 85
patients with MM identified ENO1, FABP5 and TAGLN2 among a set of 24 proteins that
are associated with short OS.
14
This set of 85 patients included 47 who were enrolled in TT3b. The correlation of
expression at both mRNA (via our GEP analyses) and protein levels supports the biological
relevance of the genes included in the GEP5 model. Work is in progress to identify
agents that can effectively target these prognostic genes.