Benchmarking mutation effect prediction algorithms using functionally validated cancer-related missense mutations

Abstract

Background

Massively parallel sequencing studies have led to the identification of a large number of mutations present in a minority of cancers of a given site. Hence, methods to identify the likely pathogenic mutations that are worth exploring experimentally and clinically are required. We sought to compare the performance of 15 mutation effect prediction algorithms and their agreement. As a hypothesis-generating aim, we sought to define whether combinations of prediction algorithms would improve the functional effect predictions of specific mutations.

Results

Literature and database mining of single nucleotide variants (SNVs) affecting 15 cancer genes was performed to identify mutations supported by functional evidence or hereditary disease association, which were classified as either non-neutral (n = 849) or neutral (n = 140) with respect to their impact on protein function. These SNVs were used to test the performance of 15 mutation effect prediction algorithms. The accuracy of the prediction algorithms varied considerably. Although all algorithms performed consistently well in terms of positive predictive value, their negative predictive value varied substantially. Cancer-specific mutation effect predictors displayed no-to-almost perfect agreement in their predictions of these SNVs, whereas the non-cancer-specific predictors showed no-to-moderate agreement. Combinations of predictors modestly improved accuracy and significantly improved negative predictive values.
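To make the metrics in this paragraph concrete, the following minimal sketch computes accuracy, positive predictive value (PPV) and negative predictive value (NPV) from a binary confusion matrix, and a chance-corrected agreement score between two predictors. Only the class sizes (849 non-neutral, 140 neutral) come from the text. The 80% sensitivity and specificity, the function names, and the use of Cohen's kappa as the agreement statistic are illustrative assumptions, not values or methods stated in the abstract.

```python
from collections import Counter


def confusion_metrics(tp, fp, tn, fn):
    """Return accuracy, PPV and NPV for a binary confusion matrix."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }


def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa for two equally long sequences of categorical calls."""
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    labels = set(freq_a) | set(freq_b)
    # Agreement expected by chance from each predictor's label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / n ** 2
    if expected == 1:
        return 1.0  # both predictors always emit the same single label
    return (observed - expected) / (1 - expected)


# Class sizes from the text; sensitivity and specificity are hypothetical.
N_NON_NEUTRAL, N_NEUTRAL = 849, 140
SENSITIVITY, SPECIFICITY = 0.80, 0.80

tp = round(SENSITIVITY * N_NON_NEUTRAL)
fn = N_NON_NEUTRAL - tp
tn = round(SPECIFICITY * N_NEUTRAL)
fp = N_NEUTRAL - tn

metrics = confusion_metrics(tp, fp, tn, fn)
print(metrics)
```

With these assumed error rates, the imbalance between non-neutral and neutral SNVs keeps PPV high while NPV stays low, because even a modest fraction of missed non-neutral variants outnumbers the correctly called neutral ones. This is one plausible reading of why PPV is uniformly good across algorithms while NPV varies.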

Conclusions

The information provided by mutation effect predictors is not equivalent. No algorithm predicts with sufficient accuracy which SNVs should be taken forward for experimental or clinical testing. Combining algorithms aggregates orthogonal information and may result in improvements in the negative predictive value of mutation effect predictions.
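As a rough illustration of how combining predictors can raise negative predictive value, the sketch below merges three sets of binary calls by strict majority vote. The abstract does not say how the predictors were combined, so majority voting, the toy data, and all names here are assumptions chosen for illustration only.

```python
def majority_vote(prediction_sets):
    """Combine binary calls (1 = non-neutral, 0 = neutral) by strict majority."""
    n_predictors = len(prediction_sets)
    return [
        1 if 2 * sum(calls) > n_predictors else 0
        for calls in zip(*prediction_sets)
    ]


def negative_predictive_value(truth, predicted):
    """Fraction of variants called neutral that are truly neutral."""
    tn = sum(t == 0 and p == 0 for t, p in zip(truth, predicted))
    fn = sum(t == 1 and p == 0 for t, p in zip(truth, predicted))
    return tn / (tn + fn) if tn + fn else float("nan")


# Toy data (hypothetical): 7 non-neutral and 3 neutral variants.
truth = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
predictor_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
predictor_b = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]
predictor_c = [0, 1, 1, 1, 1, 1, 1, 1, 0, 0]

combined = majority_vote([predictor_a, predictor_b, predictor_c])

for name, calls in [("A", predictor_a), ("B", predictor_b),
                    ("C", predictor_c), ("combined", combined)]:
    print(name, round(negative_predictive_value(truth, calls), 3))
```

In this toy example the predictors make their errors on different variants, so the vote cancels most of them and the combined NPV exceeds that of each individual predictor. When predictors share the same blind spots, no such gain should be expected.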

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0484-1) contains supplementary material, which is available to authorized users.

Author and article information

Contributors

Luciano G Martelotto: martelol@mskcc.org

Charlotte KY Ng: ngk1@mskcc.org

Maria R De Filippo: defilipm@mskcc.org

Yan Zhang: yaz2011@med.cornell.edu

Salvatore Piscuoglio: piscuogs@mskcc.org

Raymond S Lim: limr@mskcc.org

Ronglai Shen: shenr@mskcc.org

Larry Norton: nortonl@mskcc.org

Jorge S Reis-Filho: reisfilj@mskcc.org

Britta Weigelt: weigeltb@mskcc.org

Journal

Journal ID (nlm-ta): Genome Biol

Title: Genome Biology

Publisher: BioMed Central (London)

ISSN (Print): 1465-6906

ISSN (Electronic): 1465-6914

Publication date (Electronic): 28 October 2014

Publication date PMC-release: 28 October 2014

Publication date (Print): 2014

Volume: 15

Issue: 10

Electronic Location Identifier: 484

Affiliations

Department of Pathology, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065 USA

Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065 USA

Department of Medicine, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065 USA

Article

Publisher ID: 484

DOI: 10.1186/s13059-014-0484-1

PMC ID: 4232638

PubMed ID: 25348012

SO-VID: 313766d7-3714-4bea-8405-3cbc756f0b47

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received: 13 June 2014

Date accepted: 30 September 2014

Custom metadata

ScienceOpen disciplines: Genetics

