Introductory paragraph
Spatial heterogeneity of transcriptional and genetic markers between physically isolated
biopsies of a single patient’s tumor poses major barriers to the identification of
biomarkers, and the development of targeted therapies effective against the entire
tumor. We analyzed spatial heterogeneity data from 35 patients with multi-regional
biopsies using a combination of transcriptomic and genomic profiles. Medulloblastomas,
but not malignant gliomas, demonstrate spatially homogeneous transcriptomes, allowing
accurate subgrouping of tumors from a single biopsy. Conversely, somatic mutations
that impact genes suitable for targeted therapeutics demonstrate high levels of spatial
heterogeneity in medulloblastoma, malignant glioma, and renal cell carcinoma. Actionable
targets found in a single medulloblastoma biopsy are seldom clonal across the entire
tumor, questioning the efficacy of monotherapy against a single target. Clinical trials
of targeted therapies for medulloblastoma should first assure the spatially ubiquitous
nature of the target mutation.
Letter
Many cancer types show considerable intertumoral heterogeneity between patients
1–3
. Molecular biomarkers are intended to (i) tailor treatment intensities
4,5
, (ii) define oncogenic drivers for targeted therapies
5–7
, and (iii) identify diagnostic mutations (i.e. SMARCB1 in ATRT)
8
. Currently, clinical diagnoses are based on single biopsies due to the assumption
of spatial homogeneity across tumors, however, spatial heterogeneity could lead to
erroneous tumor classification, or selection of therapies against targets only present
in a locally-restricted subset of the tumor. These implications were recently highlighted
in late-stage renal cell carcinoma (RCC)
9,10
with highly divergent mutational profiles affecting MTOR and TP53, as well as demonstrating
good and poor prognostic gene signatures in multi-region biopsies derived from the
same tumor
10,11
.
To determine the degree and clinical importance of spatial heterogeneity in medulloblastoma,
we performed multi-regional biopsies and compared gene expression profiles, DNA copy
number aberrations, and somatic mutations. Our cohort includes 9 primary medulloblastomas,
16 high-grade gliomas (HGG) (10 with gene expression only
12
), and 10 kidney cancers
10
, each with 4–11 spatially distinct biopsies (median=6). An overview of the data types
available for each patient is presented in Supplementary Table 1a and Supplementary
Figure 1.
Glioblastoma
13
and medulloblastoma
14
each comprise four distinct molecular subgroups that are discerned through analysis
of transcriptional data. Unsupervised hierarchical clustering (HCL) of expression
data demonstrates that medulloblastoma biopsies form tight clusters apart from single
samples
15–20
(8/8; Fig. 1a, Supplementary Fig. 2a–b), whereas in HGGs (3/3) and RCCs (8/9), multi-region
biopsies from single individuals cluster apart when combined with single samples (Supplementary
Fig. 2c–f,). Overall, based on the standard deviation of expression, inter-tumoral
differences were greater than intra-tumoral heterogeneity in each tumor type (Fig.
1b). Using Predictive Analysis of Microarrays (PAM), subtype prediction revealed that
21% (13/63) of glioblastoma multi-region samples diverged from the most commonly observed
subtype for each patient, compared to only 2% (1/52) of medulloblastoma biopsies (p=0.003;
Fig. 1c–e; Supplementary Figs. 3–6). Considering only biopsies with subgroup predictions
of 100% confidence, all MB tumors had concordant subgroup calls between multiple biopsies
(9/9) compared to only 55% of glioblastomas (6/11; p=0.038; Fig. 1e). We conclude
that medulloblastoma can be robustly and reliably sub-grouped from only a single biopsy,
but glioblastoma cannot.
We identified somatic copy number aberrations (CNA) using a custom pipeline based
on the TITAN algorithm
21
, which is robust to high levels of normal contamination (see Methods). Regions of
CNA were identified in all three tumor entities (Fig. 2a, Supplementary Fig. 7–8,
Supplementary Table 1b–c), and unsupervised HCL of clonal segments showed tight clustering
of individual biopsies in the cohort across all tumor entities (Fig. 2b, Supplementary
Fig. 9). CNA-derived measurements of spatial heterogeneity reveal variance between
individuals within each tumor type (Fig. 2c). Somatic single nucleotide variants and
indels recapitulate a similar pattern of spatial heterogeneity across tumors (Fig.
2d; Supplementary Table 1d). Overall, based on mutation and copy number data, none
of the three tumor types were solely comprised of only homogeneous or heterogeneous
tumors, rather, each have a repertoire of tumors that reside along a continuum of
genetic heterogeneity.
This genomic complexity results from a process of clonal evolution whereby successive
acquisition of mutations and copy number events generates genetically related subpopulations
of cells or lineages within each tumor. We integrated CNA and mutational data using
the EXPANDS algorithm
22
, in order to infer the cellular lineage composition in each biopsy. EXPANDS detects
multiple genetically distinct co-existing subpopulations of cells and allows phylogenetic
reconstruction of their evolutionary relationships. A cartoon describing the spatial
distribution of genetically distinct subpopulations throughout a tumor (Fig. 3a) illustrates
the clonal intermixing detected in many samples of the cohort (Fig. 3b–d; Supplementary
Fig. 10; Supplementary Table 1e–f). Many tumor biopsies have a major clone (genotype
present in >70% of tumor cells) that is also detected in a minority of cells in other
biopsies from the same tumor (i.e. are subclonal), or which is absent in other biopsies
(e.g. biopsies 3,5,6 in RCC7 are genetically similar to some cells in biopsy 4 (4a),
but not all cells, since 4b clusters separately; Fig. 3c). In some tumors, individual
biopsies contain two or more cell lineages that independently accumulate distinct
repertoires of mutations not found elsewhere in the tumor (e.g. HGG2 biopsies 1,5;
Fig. 3c). The presence of multiple genetically distinct cellular lineages within single
biopsies has previously been linked to poor prognosis and treatment response across
a variety of cancer types
23
.
This surprising but common pattern of major genetic clones in one biopsy that are
subclonal or absent in spatially distinct locations in the tumor prompted us to investigate
observable mutation clonality across biopsies, since clonality is a key requirement
of clinically-actionable therapeutic targets
24
. We classified mutations into clonal and subclonal populations (Supplementary Fig.
11; Supplementary Table 1g), and determined whether the subset of damaging clonal
mutations changes status between spatially-separated tumor biopsies. In nearly all
tumors we find a predominance of clonal mutations that are subclonal or completely
absent in additional biopsies (Fig. 4a; Supplementary Fig. 12; validation set of 7
mutations with 96% validation rate across biopsies; Supplementary Fig. 13 and Supplementary
Table 1h). This observation remains true when considering only driver events
25,26,27,28
(Fig 4b; Supplementary Table 1i–k). We predict that monotherapies against a single
target identified in a single biopsy are unlikely to show dramatic clinical effects
as the targets are not ubiquitous, leaving untargeted clones in unsampled portions
of the tumor free to survive and repopulate the tumor.
With the goal of improved patient treatment, the clinically relevant question is whether
the observed level of genomic spatial heterogeneity affects actionable or driver alterations.
As a proof of concept, we focused on a set of genes with known roles in cancer initiation/progression
29
, or which have defined drug interactions
30
. These genes are therefore enriched in relevant or actionable targets in a manner
unbiased towards either of the cancer types included (Supplementary Table 1l–m). Considering
the spectrum of SNVs, indels, and CNAs affecting these genes (Supplementary Fig. 14–15),
we found a remarkable variety of patterns across tumors, including: cases with only
a small set of shared alterations across biopsies but with many events present in
single biopsies (e.g. HGG4 MET amplification); homogeneous tumors with many shared
actionable events (e.g. HGG3); cases without ubiquitous actionable targets, which
may require multi-agent targeted therapeutics (e.g. MB6); tumors lacking vulnerability
to any of the considered actionable targets in a subset of biopsies (e.g. MB7); and
tumors with alterations that may predict resistance (e.g. RCC7 TP53 compound loss
and somatic mutation).
Considering the full set of identified actionable mutations per tumor across all biopsies,
we calculate that in each entity, an average of at least 5 biopsies are required to
have an 80% chance of identifying at least 80% of these alterations. Lowering these
measures to 50% would require sampling of at least 2 biopsies, with highly heterogeneous
tumors needing as many as 4 (Fig. 5a). This is likely an underestimate, as the detection
of actionable mutations does not plateau in most patients (Supplementary Fig. 16).
Upfront profiling of numerous tumor regions in order to reveal the full repertoire
of actionable targets is neither practical nor likely, given the amount of sequencing
required, thus we focused on maximizing the information derived from a minimal set
of biopsies. Specifically, we wanted to determine how well we could predict the frequency
of individual mutations across a tumor, given increasing number of biopsies, noting
that prediction accuracy for mutations identified in a single fraction would be high
only in very homogeneous tumors. We empirically determined the frequency of each alteration
when considering all possible pairs of an increasing number of biopsies, and compared
this observed quantity to the known frequency of the alteration in all biopsies; the
difference between these values being the inference error of mutation frequency resulting
from insufficient biopsies in genetically heterogeneous tumors (Supplementary Fig.
17). Taking a 10% error rate as an acceptable threshold, we calculate for each tumor
the number of observed mutation frequencies that fall within this range (i.e. accuracy).
As expected, accuracy improves with increasing number of biopsies, and also reveals
that brain tumors fall into two patterns. The first comprises more homogeneous tumors,
which have fairly high prediction accuracy even with a low number of biopsies, while
the second comprises more heterogenous tumors that require multiple biopsies to ensure
accurate determination of mutation frequency (Fig. 5b). In our cohort of medulloblastomas
and glioblastomas, considering just two biopsies per tumor enables distinction of
these high vs low genetic heterogeneity tumors, with high specificity especially for
those tumors that are highly heterogeneous (Fig. 5c; Supplementary Fig. 18).
While spatial heterogeneity is clearly a barrier to highly effective therapeutics
against the entire primary tumor, the extent of heterogeneity between a primary and
recurrent medulloblastoma
31
is many fold greater (Fig. 6a, Supplementary Fig. 19). This vast discordance at relapse
is therefore unlikely to be secondary solely to inadequate spatial sampling of the
therapeutically-naïve primary tumor. In gliomas
32
the recurrent disease more closely resembles the primary and only in rare cases diverges
to the extent seen in medulloblastoma, possibly due to less complete resection success
in this more diffuse and infiltrating tumor. Medulloblastoma is known to recur from
very rare populations of cells
31
, thus, therapeutic approaches that can eradicate such cellular lineages despite their
low prevalence in the primary tumor are drastically needed.
Immunotherapy relies on the destruction of cancer cells based on the presence of tumor-specific
cell-surface antigens as opposed to cell autonomous somatic mutations. We examined
expression of the antigens/genes for which chimeric antigen receptor T-cells or monoclonal
antibodies already exist
33–43
, and demonstrate a remarkable consistency of expression across multi-regional biopsies,
which contrasts highly to the inhomogeneity of somatic mutations across fractions
in the same set of tumors. This was the case in all medulloblastomas examined, including
those with high levels of genetic heterogeneity and in which targeted therapy would
be problematic (Fig. 6b; Supplementary Fig. 20)
33–43
. Homogeneity of the transcriptome versus heterogeneity of somatic mutations in our
MB cohort suggests that targeted immunotherapeutic approaches could overcome the hurdle
of spatial genetic heterogeneity.
The vast majority of brain tumor patients have their tumor classified from a single
tumor biopsy, which may be adequate for medulloblastoma, but not for glioblastoma
patients. The extent of spatial heterogeneity of somatic mutations observed in our
cohort suggests that clinical trials of molecularly targeted therapy should first
assess the ubiquitous distribution of the target. The lack of clonal actionable driver
mutations ubiquitously present across all regions of a given brain tumor suggest that
monotherapies targeting a single gene from a single biopsy are unlikely to have dramatic
effects to improve the lives of brain tumor patients.
Materials and Methods
Patients and samples
Multi-region tumor biopsies and clinical data were gathered for 35 tumors (9 primary
medulloblastomas, 16 high-grade gliomas (HGG) (10 with gene expression only
12
), and 10 kidney cancers
10
); peripheral blood samples were included as germline (GL) controls for all cases
with exome sequencing. All multi-region biopsies for unpublished cases were obtained
in situ during tumor resection, by mimicking the previous sample preparation conditions
of published cases to the best of our knowledge. Medulloblastoma tumors are similar
in size to glioblastomas, with an average diameter of 8–12cm; biopsies were taken
as far apart as possible by the surgeon. Renal cancers, due to their localization
in the abdomen may be larger in size. Detailed information on multi-region tumor samples
is provided in Supplementary Table 1a and Supplementary Figure 1. All patient material
and clinical information were obtained after receiving informed consent and was approved
by the institutional review boards of the contributing institutions. DNA and RNA extractions
were performed as previously described
16
. RNA quality was assessed on a 2100 Bioanalyzer (Agilent Technologies, Santa Clara,
CA). Only high-quality RNA (RNA integrity number ≥7) was included for further studies.
Gene expression profiling
Expression profiling was conducted on eight MB and three HGG multi-region biopsies
totaling 72 biopsies with a median number of 6 multi-region biopsies per primary (range
4–9). Affymetrix HU133 Plus 2.0 microarrays were used for HGG samples and Affymetrix
Gene 1.1 ST array (Affymetrix, Santa Clara, CA) was used for MB samples to ensure
that these multi-region biopsies could be compared to published data sets
15–17,20
. Microarrays were processed according to the manufacturer’s guidelines. Raw data
was normalized using a transcript-level robust multi-array average (RMA) algorithm
44
, and subsequently clustered using unsupervised HCL (Pearson’s Dissimilarity – average
linkage) in Partek Genomics Suite. The molecular classification of the multi-region
biopsy samples was performed using a class prediction algorithm, Prediction Analysis
for Microarrays (PAM)
45
, as implemented in the pamr package (v 1.51). Markers for GBM subtypes were obtained
from the Verhaak classifier
13
. We note that classification was performed for the GBM samples only, thus excluding
HGG1. Subgroup-specific markers for MB were identified based on one-way analysis of
variance (ANOVA) with multiple hypothesis correction by the Bonferroni method in previously
published datasets with known subgroup affiliation
46
. Based on the misclassification error values in core GBM
13
and MB
15–17
training datasets (Supplementary Fig. 6), threshold values of 1.75 and 1 were chosen
for multi-region samples from published
12
and unpublished GBM and MB patients, respectively. The published GBM dataset
12
was quantile normalized using Partek Genomics Suite. Predicted subtypes or subgroups
with confidence probabilities higher than established thresholds
46
were considered bona fide subgroup assignments. Samples with less than 500 ng of remaining
RNA were analyzed using NanoString as previously described
46
. MB3 was exclusively analyzed using NanoString since only limited amounts of RNA
were available for all multi-region biopsies. NanoString counts were normalized to
the three housekeeping genes (GAPDH, ACTB and LDHA). Dot plots and PCA based on normalized
NanoString calls were prepared using the R-statistical environment (v2.15.1). Pearson
correlation was used to determine the correlation of marker gene expression for each
biopsy per patient (intra-tumor comparison) and between each biopsy and all others
samples from different patients of the same subgroups (inter-tumor comparison). Wilcoxon
rank sum test was used to infer intra- and inter-tumor marker gene expression differences
in a subgroup-specific fashion.
A previously published dataset of nine multi-region RCC samples
9
profiled using the Affymetrix Human Gene 1.0 ST array was included in the analysis
as well as two RCC datasets
18,19
with 53 and 29 single RCC samples, respectively. The RCC expression datasets have
been processed together in R (v3.1.1) with the oligo package (rma normalization) and
the combat package has been used for batch effect correction. Unsupervised HCL (Pearson’s
Dissimilarity – average linkage) has been performed using the Partek Genomics Suite.
The Gene Expression Omnibus accession numbers for the previously-unplublished gene
expression data are GSE62802 (HGG samples) and GSE62803 (MB samples).
Whole-Exome Sequencing
DNA libraries (MB1-5) from multi-region samples were exome captured using Agilent
SureSelect V5+UTR probes, followed by 8 cycles of PCR and then paired-end 75 base
reads were sequenced over 2 lanes on an Illumina HiSeq 2000 instrument per pool of
6 libraries. Reads were aligned to the human reference genome hg19a using the Burrows-Wheeler
Aligner (BWA) (version 0.5.7)
47
. Two lanes were merged with duplicates marked using Picard Tools (version 1.71).
Additional samples (MB6/7, HGG1-5) were subjected to paired-end library construction
using Illumina’s Nextera Rapid Capture Exome (RCE) kit. Captured exome DNA sequences
were then sequenced on Illumina HiSeq 2000 (rapid-run mode) for 100-bp paired-end
reads. We used FASTX toolkit to remove adaptor sequences and to trim low quality reads.
Quality trimmed reads were then aligned to the human reference genome (hg19) using
BWA (version 0.5.9)
47
. Genome Analysis Toolkit (GATK)
48
was used for indel realignment. Duplicate reads were then marked using Picard to be
able to exclude them further in our analysis. The Toronto datasets are available under
accession numbers EGAD00001000723 and EGAS00001001014.
Somatic SNV detection and filtering
SNVs were called exome-wide using samtools mpileup (v0.1.7), and indels were called
using VarScan. Stringent filtering was performed requiring no reads in the germline
sample supporting a SNV to ensure conservative selection of somatic events. Variants
with sufficient coverage (≥10) were further annotated using Annovar
49
(table_annovar.pl; RefSeq gene annotations, amino acid change annotation, SIFT, PolyPhen,
LRT, and MutationTaster scores, PhyloP and GERP++ conservation scores, dbSNP identifiers,
1000 Genomes Project allele frequencies, NHLBI-ESP 6500 exome project allele frequencies).
Mutation Validation
A subset of somatic mutations were validated using PCR amplification from all tumor
biopsies, matched germline, and a healthy control sample. Regions of interest were
amplified from genomic DNA using primers flanking each SNV (Supplementary Table 1h,n)
using Q5 High-Fidelity DNA polymerase (NEB). PCR specificity was determined by agarose
gel electrophoresis followed gel extraction of specific bands using Gel Extraction/PCR
Clean-up kit (Qiagen) following manufacturer instructions. Purified amplicons were
sequenced using Sanger sequencing, and traces reviewed manually for the expected presence
or absence of the mutated base.
Droplet digital PCR
For the validation and quantification of the frequency of the PIK3CA SNV detected
in MB3, we used droplet digital PCR (ddPCR), since Sanger traces were of poor quality
in the region of interest. Genomic DNA from 6 spatially distinct biopsies from MB3
as well as matched germline and a healthy donor control were used in the assay. The
PIK3CA mutation (chr 3:178936091 G>A) was validated by using the PrimePCR ddPCR mutation
assay kit, PIK3CA p.E545K, human (Biorad, dHsaCP2000075 (mutant, FAM) and dHsaCP2000076
(Wt, VIC)), accordingly to manufacturer instructions. Fluorescence measurement using
QX100 ddPCR droplet reader (Biorad) was used to detect the presence of mutant and
wildtype alleles. QuantaSoft Analysis software (Biorad) was used in the quantification.
Copy number analysis
TITAN
21
estimates the cellular prevalence of tumor cell populations (lineages) based on a
user-defined number of clonal clusters, and user-defined ploidy estimation. Thus,
20 runs of TITAN were performed for each exome, with cluster numbers 1 to 10 (representing
one clonal lineage through to 10 co-existing clonal lineages with distinct genotypes),
and ploidy set to either 2 or 4. Copy number segments from the 20 parameter combinations
were analyzed and merged into larger segments if they were on the same chromosome
arm, were <10Mb apart, and had the same state (loss or gain). Merged results from
each of the 20 parameter combinations for each biopsy were compared in order to select
the optimal parameter combination as the highest scoring considering the following
criteria:
maximize the largest contig size
maximize the median contig size
minimize the number of contigs
minimize the number of clonal clusters
The parameter combination with the largest x value was selected as optimal, where:
x
=
(
(
L
∗
M
)
∗
(
M
2
/
10
9
)
∗
(
1
/
T
)
∗
(
1
/
(
C
+
1
)
2
)
)
/
M
/
10
9
L = largest contig size (Gb)
M = median contig size (Gb)
T = total number of contigs
C = clonal clusters
We next assessed the prevalence of copy number segments (loss or gain) identified
in the best parameter combination of a unique biopsy (i.e. target segments), using
either all segments or clonal segments only (logratio ≥ |0.2|). A target segment was
considered as found in another biopsy from the same tumor if any of the 20 parameter
combinations contained a segment with the same state (loss or gain), and whose span
had a minimum reciprocal overlap of at least 70% with the target segment.
Concordance of driver regions of loss and gain in the RCC tumor cohort was performed
for our calls and the published data
9
. With our computational approach, we achieved a 97% concordance when comparing to
the manual curation performed previously
9
, indicating that this method is specific and sensitive despite a high level of normal
cell contamination in these tumors. Conversely, compared to our results for the subset
of copy number gains and losses identified in Gerlinger et al., 2014, the manual curation
shows an 89% concordance to the TITAN pipeline, indicating that our approach is more
sensitive, and that the homogeneity of certain copy number driver events may be greater
than previously estimated (Supplementary Table 1c). Finally, our approach is applicable
genome-wide and across tumor types in a highly parallel fashion.
SNV Classification using mclust
Variant allele frequencies (VAF) of somatic SNVs were classified into distinct clusters
using the R package mclust
50
, which uses finite mixture estimation via iterative Expectation Maximization steps
(EM) and the Bayesian Information Criterion (BIC). Each cluster is manually categorized
as “homozygous”, “clonal”, or “subclonal”, depending on the cluster VAF and the uncertainty
separating it from the next cluster, and taking into account the biopsy tumor cell
content value reported by TITAN. Multiple subclonal populations are numbered sequentially,
starting with the most highly prevalent population. Clonal and subclonal mutation
details per biopsy are summarized in Supplementary Table 1d,g.
Phylogenetic reconstruction from combined SNV and CNV data
We combined copy number and LOH information derived from TITAN (including the clonal
and subclonal events identified in the best parameter combination run for each biopsy),
as well as somatic mutations and SNPs in areas of LOH, to infer tumor phylogenies
using EXPANDS
22
. EXPANDS v1.7.2 was run with the runExPANdS function. All parameters were set to
default with the exception of maxScore, which was lowered to 1.5 in order to reduce
the false positive rate of subpopulation detection. Only subpopulations with a minimum
size (cellular frequency) of 0.1 were considered. Mutations that could not be assigned
to a high confidence subpopulation were discarded, so that no ambiguous assignments
were made. In addition, ambiguous subpopulations (i.e. groups of mutations and copy
number events) were dropped from the analysis. Mutations are assigned to all nested
subpopulations (i.e. if a mutation is found in a subpopulation of cells at a high
frequency of 0.8, it will also be assigned to “daughter” subpopulations, for instance
of frequency 0.5), to report the assignment of every mutation to all detected subpopulations
in all biopsies of the tumor (assuming that the mutation could be assigned unambiguously
as mentioned above; Supplementary Table 1f).
Phylogenetic relationships between the subpopulations inferred by the EXPANDS algorithm
in all biopsies per patient were generated using both SNV and copy number segments.
The Manhattan distance metric was used to calculate pairwise distances between all
pairs of biopsies based on this data, and a complete linkage hierarchical clustering
was performed to generate phylogenies. Germline-rooted trees were generated using
the as.phylo R function from the ape package.
Error inference of actionable genetic alterations
In order to perform an analysis of genetic heterogeneity affecting actionable and
putative driver genes in a way that is unbiased to either of the tumor types, we opted
to use general lists of known cancer drivers and drugable genes. Sets of genes known
to be drivers in GBM, MB, and RCC tumors come from studies of different cohort sizes,
with sometimes unknown subgroup affiliation, and thus are not equally comprehensive.
To overcome this, we used a list of genes of interest that includes putative driver
genes found in the Cancer Gene Census database
29
(n=572) and actionable genes from the Drug-Gene Interaction Database
30
(n=426 genes) (Supplementary Table 1l–m).
Oncoprint plots (R package ComplexHeatmap v1.6.0) were built for the combination list
of these genes for all tumors, using (a) clonal mutations and indels and (b) clonal
mutations and indels plus high-level copy number aberrations (>4 copies gained; homozygous
loss). Manual review of results revealed that absence of clonal somatic mutations
in subsets of biopsies is not explained by concordant copy number loss. Since not
all biopsies had copy number data, further analyses were performed using results from
strategy (a) in order to maximize the number of usable biopsies per tumor.
Driver event lists
The MB CNA driver events in Supplementary Table 1i,j and Figure 4b are mainly taken
from Shih et al.
27
plus a subset of the mostly highly recurrent genes listed in Northcott et al.
25
. The HGG chromosome arm and recurrent driver genes events were retrieved from Sturm
et al.
26
(Table 1 and 2 of Reference 26). RCC chromosome arm and gene-level driver events were
retrieved from the ccRCC TCGA paper
28
(Sup Fig 22 threshold FDR q-value< 10−15 and Table S4, q-value threshold 0.05). Cancer
cell fraction values presented in Supplementary Figure 10b for driver mutations were
calculated as previously described
51
:
CCF
=
VAF
∗
(
1
/
purity
)
(
CN
∗
purity
+
2
(
1
-purity
)
)
where CCF=cancer cell fraction, VAF=variant allele frequency, CN=copy number at the
mutation, purity=tumor purity as calculated by EXPANDS.
Accuracy of mutation frequency detection
We calculate the inferred error of the prevalence of each mutation across biopsies
using a subsampling approach. In each tumor, given a subset of biopsies from 1 to
n (where n = total biopsies per tumor), the frequency of each identified mutation
in the biopsies sampled is calculated as fo. This value is subtracted from the “ground
truth” expected frequency for that mutation across all n biopsies (fe). When the observed
and expected values are identical, then the inferred error (fe – fo) is 0. In the
majority of tumors, there is a predominance of genes with mutations in single biopsies,
leading to negative values of error for many genes as the frequency of the mutation
is often overestimated (Supplemental Fig. 17). In contrast, genes that are present
in all but one or two biopsies will often have an error value over 0, as their frequency
can be underestimated.
The likelihood of being within +/−0.1 of 0 (i.e. close to perfect accuracy given the
data from all biopsies) is calculated as the proportion of genes at each sampling
of 1:n biopsies where the error rate was within those bounds. For instance, we sample
all possible combinations of a certain number of biopsies from the total number of
biopsies, and in each case calculate the inferred error of each detected mutation’s
prevalence. The proportion of the total set of error values < |0.1| is the likelihood
of a correct interpretation of mutation frequency given that number of biopsies (Fig.
5b).
Genetic heterogeneity estimation from two biopsies
To address the practical issue of estimating genetic heterogeneity from a minimum
number of informative biopsies, we implemented a simple metric of the proportion of
mutated genes in a set of 2 biopsies that was ubiquitous (i.e. present in 2 of 2 biopsies).
The mean value of all pairs of biopsies from a total of n biopsies per tumor showed
a strong divergence in HGG and MB tumors, with high- vs low-variability tumors well
separated (Fig. 5c). These were the same tumors that scored as high vs low variability
based on the accuracy metric described above.
We also observed clear separation of these two classes using the R package mclust
(Supplementary Fig. 18a), which models univariate mixtures of Gaussian distributions
(i.e. corresponding to a mixture of high and low genetic variance brain tumors) via
Expectation-Maximization and the Bayesian Information Criterion
50
. Choosing two thresholds from the mclust density peaks (low=0.55, high=0.75), we
calculated the accuracy of classifying high-variance vs low-variance tumors based
on a single pair of biopsies, and observed that high-variance tumors in particular
have a high true positive and low false positive classification rate (Supplementary
Figure 18b). I.e. based on this metric, the vast majority of pairs of biopsies from
tumors with high genetic heterogeneity have a low percentage of gene mutations found
in both biopsies, such that they are always classified as heterogeneous tumors, and
almost never as homogeneous tumors.
Expression analysis of immunotherapeutic targets in MB tumors
Microarray expression data from the Affymetrix Gene 1.1 ST array (Affymetrix, Santa
Clara, CA) for the MB samples were analyzed in the R environment (v3.1.1).
We used the affy package (v1.44.0) and the custom CDF hugene11sthsensgcdf (v19.0.0)
to summarize the expression of 21641 Ensembl (ENSG) genes and process the data. Expression
data was normalized using the rma method.
Spatial genetic variance vs post-treatment clonal evolution
To directly measure the relative contribution of spatial heterogeneity versus clonal
evolution induced by treatment, we utilized our previously published cohort of matched
pre- and post-therapeutic MB samples
31
. This comparison shows that in MB, the amount of divergence observed between primary
and relapse compartments far exceeds spatial genetic variance in the primary tumor.
To assess whether the observed divergence between primary and recurrent MB is greater
than the observed divergence between intra-tumoral biopsies, we re-analyzed the 14
primary-relapse tumor WGS samples with matched germline using the same pipeline as
presented above. Briefly, mutations were called using samtools mpileup, filtered stringently
against the germline, and shortlisted to those mutations that have at least 10 reads
coverage in both the primary and recurrent samples, and are in areas of normal copy
number and LOH. Since the samples in this work are exomes, we restricted the analysis
of the primary-relapse samples to the same exonic regions. After removing the major
analysis pipeline differences, we addressed differences in depth of coverage. The
exome libraries were sequenced to an average of 60X, while the WGS samples were sequenced
to 30X coverage. Thus, our ability to assess similarity between regions in the exome
libraries is more sensitive to subclonal events present at low levels (and therefore
preferentially detectable by exome seq and not by WGS). We addressed this bias by
restricting the analysis to clonal events in the exomes, as clonal mutations are detectable
both in exomes and genomes. To verify that this was a reasonable assumption, we compared
the VAF of mutations found in the exomes to those found in matched WGS data generated
from the same samples, but sequenced at 30X coverage. Matched WGS samples are available
for biopsy 1 in each MB tumor with multi-regional profiling. In all cases, we found
that >75% of mutations with a VAF < 0.18 in the MB exomes were not found in the matched
genomes sequenced from the same samples, indicating that subclonal events are typically
not well profiled at the shallower depths of a genomic library. Therefore we restricted
our analysis to clonal events in both the exomes and genomes.
Focusing on the clonal and homozygous events detectable in both exome and genome data,
we hypothesized that any differences between primary and relapse samples that are
greater than the differences expected from different biopsies in a primary tumor would
be largely attributable to clonal evolution as a consequence of therapy. To see if
the data support this conclusion, we used the mutations in each biopsy to measure
pairwise concordance between all biopsies of individual tumors. Concordance is measured
as the number of mutations in common between two biopsies, as a fraction of the total
number of mutations present in both. In parallel, we used the mutations in the primary
and relapse samples to measure pairwise concordance values between disease compartments.
As a positive control, we compared the inter-biopsy and inter-compartmental concordance
values of an adult GBM sample with multiple biopsies profiled before and after therapy
(Patient 17 from reference
32
).
In MB samples we found a mean pairwise concordance of 0.3903 between biopsies of the
same tumor, nearly an order of magnitude higher than the mean concordance (0.03852)
observed between disease compartments (Wilcoxon Rank Sum test p-value < 2.2e–16).
One sample stood out as an outlier (MB-REC-04), and we note that in this case the
tumor was a Group4 local recurrence. This unusual pattern of recurrence for a Group4
tumor may indicate that the primary mass was sub-totally resected rather than grossly
resected, explaining the higher similarity of the recurrent compartment to the primary.
In the case of the adult GBM patient (Patient17) with multi-regionally sampled primary
(3 regions; low-grade glioma) and recurrent disease (4 regions; high-grade glioma),
we found the same trend: the primary-relapse mean concordance of 0.01506 was an order
of magnitude smaller than the mean intra-biopsy concordance of 0.5036 (Wilcoxon Rank
Sum test p-value = 0.0001406). There was no significant difference between the primary-relapse
MB concordance and the primary-relapse GBM concordance observed in Patient17 (Wilcoxon
Rank Sum test p-value = 0.5458). Similarly, there was no significant difference between
the regional biopsies in GBM vs MB (Wilcoxon Rank Sum test p-value = 0.09926).
Finally, the primary-relapse divergence calculated using reprocessed data from Patient17
was on par with that initially presented in the glioma paper
41
, thus we included, for visual comparison, all the primary-relapse values for the
glioma cohort in Figure 6a (middle panel; Johnson et al, 2014; values directly derived
from Supplementary Table 4
32
).
Statistical analysis
All statistical analyses were performed in the R statistical environment. Comparisons
of categorical variables between entity types were performed using the two-sided Fisher’s
exact test. Comparisons of distributions were performed using the Welch two-sample
t-test (parametric) or the Wilcoxon Rank Sum test (non-parametric). P-values <0.05
were considered statistically significant.
Supplementary Material
1
2
3