Analysis of 100,000 human cancer genomes reveals the landscape of tumor mutational burden

Abstract

Background

High tumor mutational burden (TMB) is an emerging biomarker of sensitivity to immune checkpoint inhibitors and has been shown to be more significantly associated with response to PD-1 and PD-L1 blockade immunotherapy than PD-1 or PD-L1 expression, as measured by immunohistochemistry (IHC). The distribution of TMB and the subset of patients with high TMB have not been well characterized in the majority of cancer types.

Methods

In this study, we compare TMB measured by a targeted comprehensive genomic profiling (CGP) assay to TMB measured by exome sequencing and simulate the expected variance in TMB when sequencing less than the whole exome. We then describe the distribution of TMB across a diverse cohort of 100,000 cancer cases and test for association between somatic alterations and TMB in over 100 tumor types.
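
To illustrate why sequencing less than the whole exome adds variance to a TMB estimate, the following is a minimal sketch, not the authors' simulation. It assumes that mutations fall uniformly at random across an exome of 30 Mb (an illustrative figure, not taken from the article) and that TMB is reported as mutations per Mb. The names `EXOME_MB`, `simulate_tmb`, and `coefficient_of_variation` are hypothetical.

```python
import random
import statistics

# Assumed exome size in Mb; an illustrative value, not from the article.
EXOME_MB = 30.0


def simulate_tmb(true_tmb, panel_mb, trials=2000, seed=0):
    """Simulate TMB estimates (mutations/Mb) from a panel covering panel_mb.

    Assumes each exome mutation independently lands inside the panel with
    probability panel_mb / EXOME_MB (uniform placement).
    """
    rng = random.Random(seed)
    n_mutations = round(true_tmb * EXOME_MB)  # mutations in the whole exome
    p_in_panel = panel_mb / EXOME_MB
    estimates = []
    for _ in range(trials):
        hits = sum(1 for _ in range(n_mutations) if rng.random() < p_in_panel)
        estimates.append(hits / panel_mb)  # normalize the count to per-Mb
    return estimates


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


if __name__ == "__main__":
    for panel_mb in (0.1, 0.5, 1.1, 5.0):
        est = simulate_tmb(true_tmb=10.0, panel_mb=panel_mb)
        print(
            f"{panel_mb:>4} Mb  mean={statistics.mean(est):.2f}  "
            f"CV={coefficient_of_variation(est):.2f}"
        )
```

Under these assumptions the mean estimate stays near the true value at every panel size, while the relative spread grows as the sequenced region shrinks, because fewer mutations are counted per sample.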

Results

We demonstrate that measurements of TMB from comprehensive genomic profiling closely reflect measurements from whole exome sequencing, and our model indicates that the variance in measurement increases significantly when less than 0.5 Mb is sequenced. We find that a subset of patients exhibits high TMB across almost all types of cancer, including many rare tumor types, and characterize the relationship between high TMB and microsatellite instability status. We find that TMB increases significantly with age, with a 2.4-fold difference between ages 10 and 90 years. Finally, we investigate the molecular basis of TMB and identify genes and mutations associated with TMB level. We identify a cluster of somatic mutations in the promoter of the gene PMS2, which occur in 10% of skin cancers and are highly associated with increased TMB.
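
As a rough aid to interpreting the 2.4-fold difference between age 10 and age 90, the sketch below converts a fold change over a span of years into an equivalent constant yearly rate. This assumes TMB grows by the same percentage every year, which is an illustrative assumption and not a model stated in the abstract. The function name `annual_growth_rate` is hypothetical.

```python
def annual_growth_rate(fold_change, years):
    """Constant yearly growth rate that compounds to fold_change over years.

    Assumes exponential (constant-percentage) growth, for illustration only.
    """
    return fold_change ** (1.0 / years) - 1.0


if __name__ == "__main__":
    rate = annual_growth_rate(2.4, 90 - 10)
    print(f"Equivalent yearly increase: {rate:.2%}")
```

Under that assumption, a 2.4-fold change over 80 years corresponds to an increase of roughly 1.1% per year. A linear model of TMB against age would give a different per-year figure.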

Conclusions

These results show that a CGP assay targeting ~1.1 Mb of coding genome can accurately assess TMB compared with sequencing the whole exome. Using this method, we find that many disease types have a substantial portion of patients with high TMB who might benefit from immunotherapy. Finally, we identify novel, recurrent promoter mutations in PMS2, which may be another example of regulatory mutations contributing to tumorigenesis.

Electronic supplementary material

The online version of this article (doi:10.1186/s13073-017-0424-2) contains supplementary material, which is available to authorized users.

Author and article information

Contributors

Garrett M. Frampton: (617) 418-2234, gframpton@foundationmedicine.com

Journal

Journal ID (nlm-ta): Genome Med

Journal ID (iso-abbrev): Genome Med

Title: Genome Medicine

Publisher: BioMed Central (London)

ISSN (Electronic): 1756-994X

Publication date (Electronic): 19 April 2017

Publication date PMC-release: 19 April 2017

Publication date Collection: 2017

Volume: 9

Electronic Location Identifier: 34

Affiliations

[1] Foundation Medicine Inc., 150 Second St., Cambridge, MA 02141 USA

[2] Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts USA

[3] Broad Institute of MIT and Harvard, Cambridge, Massachusetts USA (GRID grid.66859.34)

[4] The Hospital for Sick Children, Toronto, Ontario Canada (GRID grid.42327.30)

Article

Publisher ID: 424

DOI: 10.1186/s13073-017-0424-2

PMC ID: 5395719

PubMed ID: 28420421

SO-VID: 09b34dd2-cd6d-4a6e-8189-520406d02664

License:

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received: 8 September 2016

Date accepted: 18 March 2017

Funding

Funded by: Prostate Cancer Foundation (FundRef http://dx.doi.org/10.13039/100000892)

Award ID: Young Investigator Award

Funded by: U.S. Department of Defense (FundRef http://dx.doi.org/10.13039/100000005)

Award ID: Prostate Cancer Research Program

Custom metadata

ScienceOpen disciplines: Molecular medicine

Keywords: tumor mutational burden, cancer genomics, mismatch repair, PMS2
