      Benchmarking mutation effect prediction algorithms using functionally validated cancer-related missense mutations

      research-article


          Abstract

          Background

          Massively parallel sequencing studies have led to the identification of a large number of mutations present in a minority of cancers of a given site. Hence, methods to identify the likely pathogenic mutations that are worth exploring experimentally and clinically are required. We sought to compare the performance of 15 mutation effect prediction algorithms and their agreement. As a hypothesis-generating aim, we sought to define whether combinations of prediction algorithms would improve the functional effect predictions of specific mutations.

          Results

          Literature and database mining of single nucleotide variants (SNVs) affecting 15 cancer genes was performed to identify mutations supported by functional evidence or hereditary disease association to be classified either as non-neutral (n = 849) or neutral (n = 140) with respect to their impact on protein function. These SNVs were employed to test the performance of 15 mutation effect prediction algorithms. The accuracy of the prediction algorithms varies considerably. Although all algorithms perform consistently well in terms of positive predictive value, their negative predictive value varies substantially. Cancer-specific mutation effect predictors display no-to-almost perfect agreement in their predictions of these SNVs, whereas the non-cancer-specific predictors showed no-to-moderate agreement. Combinations of predictors modestly improve accuracy and significantly improve negative predictive values.
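The metrics named in the Results can be made concrete with a short sketch. It assumes binary calls in which `True` means non-neutral, and it assumes Cohen's kappa as the pairwise agreement statistic; the abstract does not name the statistic it used. The function names are illustrative and are not taken from the article.

```python
def confusion_counts(truth, predicted):
    """Count TP, FP, TN, FN, where True means 'non-neutral'."""
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and not t:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summary_metrics(truth, predicted):
    """Accuracy, positive predictive value and negative predictive value."""
    tp, fp, tn, fn = confusion_counts(truth, predicted)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # PPV: fraction of non-neutral calls that are truly non-neutral.
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
        # NPV: fraction of neutral calls that are truly neutral.
        "npv": tn / (tn + fn) if (tn + fn) else float("nan"),
    }


def cohen_kappa(calls_a, calls_b):
    """Chance-corrected agreement between two binary predictors (assumed statistic)."""
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    rate_a = sum(calls_a) / n
    rate_b = sum(calls_b) / n
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        return float("nan")  # both predictors make a single constant call
    return (observed - expected) / (1 - expected)


# Class sizes reported in the abstract: 849 non-neutral, 140 neutral SNVs.
truth = [True] * 849 + [False] * 140
# Hypothetical predictor that calls every SNV non-neutral.
always_non_neutral = [True] * len(truth)
metrics = summary_metrics(truth, always_non_neutral)
```

With the class sizes reported above, a hypothetical predictor that calls every SNV non-neutral already reaches a positive predictive value of 849/989 (about 0.86) while making no neutral calls at all, so its negative predictive value is undefined. This is one way to see why positive predictive value alone separates predictors poorly on a data set this imbalanced.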

          Conclusions

          The information provided by mutation effect predictors is not equivalent. No algorithm is able to predict sufficiently accurately SNVs that should be taken forward for experimental or clinical testing. Combining algorithms aggregates orthogonal information and may result in improvements in the negative predictive value of mutation effect predictions.
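One simple way to combine predictors is a vote threshold, sketched below. This is an assumption for illustration only: the abstract does not describe the combination scheme the authors used, and the predictor names and toy calls are invented, not data from the article. `True` means non-neutral.

```python
def combine_calls(calls_by_predictor, min_votes):
    """Call an SNV non-neutral when at least `min_votes` predictors do."""
    columns = list(calls_by_predictor.values())
    if not columns:
        raise ValueError("at least one predictor is required")
    n = len(columns[0])
    if any(len(column) != n for column in columns):
        raise ValueError("all predictors must score the same SNVs")
    return [sum(votes) >= min_votes for votes in zip(*columns)]


def negative_predictive_value(truth, predicted):
    """Fraction of neutral calls that are truly neutral."""
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    return tn / (tn + fn) if (tn + fn) else float("nan")


# Toy data (hypothetical): four non-neutral SNVs followed by two neutral ones.
truth = [True, True, True, True, False, False]
calls = {
    "predictor_a": [True, True, False, False, False, False],
    "predictor_b": [False, False, True, True, False, False],
    "predictor_c": [True, False, True, False, False, True],
}

npv_single = negative_predictive_value(truth, calls["predictor_a"])
# min_votes=1: an SNV is called neutral only when every predictor agrees it is neutral.
npv_any = negative_predictive_value(truth, combine_calls(calls, min_votes=1))
# min_votes=2: simple majority among three predictors.
npv_majority = negative_predictive_value(truth, combine_calls(calls, min_votes=2))
```

In this toy case, requiring unanimity for a neutral call raises the negative predictive value from 0.5 to 1.0 because the predictors miss different non-neutral SNVs, while a simple majority leaves it at 0.5. Whether real predictors are complementary enough to show a similar gain is an empirical question that the sketch does not answer.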

          Electronic supplementary material

          The online version of this article (doi:10.1186/s13059-014-0484-1) contains supplementary material, which is available to authorized users.


          Most cited references (14)


          Impact of mutant p53 functional properties on TP53 mutation patterns and tumor phenotype: lessons from recent developments in the IARC TP53 database.

          The tumor suppressor gene TP53 is frequently mutated in human cancers. More than 75% of all mutations are missense substitutions that have been extensively analyzed in various yeast and human cell assays. The International Agency for Research on Cancer (IARC) TP53 database (www-p53.iarc.fr) compiles all genetic variations that have been reported in TP53. Here, we present recent database developments that include new annotations on the functional properties of mutant proteins, and we perform a systematic analysis of the database to determine the functional properties that contribute to the occurrence of mutational "hotspots" in different cancer types and to the phenotype of tumors. This analysis showed that loss of transactivation capacity is a key factor for the selection of missense mutations, and that difference in mutation frequencies is closely related to nucleotide substitution rates along TP53 coding sequence. An interesting new finding is that in patients with an inherited missense mutation, the age at onset of tumors was related to the functional severity of the mutation, mutations with total loss of transactivation activity being associated with earlier cancer onset compared to mutations that retain partial transactivation capacity. Furthermore, 80% of the most common mutants show a capacity to exert dominant-negative effect (DNE) over wild-type p53, compared to only 45% of the less frequent mutants studied, suggesting that DNE may play a role in shaping mutation patterns. These results provide new insights into the factors that shape mutation patterns and influence mutation phenotype, which may have clinical interest.

            Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations.

            Large-scale sequencing of cancer genomes has uncovered thousands of DNA alterations, but the functional relevance of the majority of these mutations to tumorigenesis is unknown. We have developed a computational method, called Cancer-specific High-throughput Annotation of Somatic Mutations (CHASM), to identify and prioritize those missense mutations most likely to generate functional changes that enhance tumor cell proliferation. The method has high sensitivity and specificity when discriminating between known driver missense mutations and randomly generated missense mutations (area under receiver operating characteristic curve, >0.91; area under Precision-Recall curve, >0.79). CHASM substantially outperformed previously described missense mutation function prediction methods at discriminating known oncogenic mutations in P53 and the tyrosine kinase epidermal growth factor receptor. We applied the method to 607 missense mutations found in a recent glioblastoma multiforme sequencing study. Based on a model that assumed the glioblastoma multiforme mutations are a mixture of drivers and passengers, we estimate that 8% of these mutations are drivers, causally contributing to tumorigenesis.

              Identifying Mendelian disease genes with the Variant Effect Scoring Tool

              Background: Whole exome sequencing studies identify hundreds to thousands of rare protein coding variants of ambiguous significance for human health. Computational tools are needed to accelerate the identification of specific variants and genes that contribute to human disease.

              Results: We have developed the Variant Effect Scoring Tool (VEST), a supervised machine learning-based classifier, to prioritize rare missense variants with likely involvement in human disease. The VEST classifier training set comprised ~45,000 disease mutations from the latest Human Gene Mutation Database release and another ~45,000 high frequency (allele frequency >1%) putatively neutral missense variants from the Exome Sequencing Project. VEST outperforms some of the most popular methods for prioritizing missense variants in carefully designed holdout benchmarking experiments (VEST ROC AUC = 0.91, PolyPhen2 ROC AUC = 0.86, SIFT4.0 ROC AUC = 0.84). VEST estimates variant score p-values against a null distribution of VEST scores for neutral variants not included in the VEST training set. These p-values can be aggregated at the gene level across multiple disease exomes to rank genes for probable disease involvement. We tested the ability of an aggregate VEST gene score to identify candidate Mendelian disease genes, based on whole-exome sequencing of a small number of disease cases. We used whole-exome data for two Mendelian disorders for which the causal gene is known. Considering only genes that contained variants in all cases, the VEST gene score ranked dihydroorotate dehydrogenase (DHODH) number 2 of 2253 genes in four cases of Miller syndrome, and myosin-3 (MYH3) number 2 of 2313 genes in three cases of Freeman Sheldon syndrome.

              Conclusions: Our results demonstrate the potential power gain of aggregating bioinformatics variant scores into gene-level scores and the general utility of bioinformatics in assisting the search for disease genes in large-scale exome sequencing studies. VEST is available as a stand-alone software package at http://wiki.chasmsoftware.org and is hosted by the CRAVAT web server at http://www.cravat.us

                Author and article information

                Contributors
                martelol@mskcc.org
                ngk1@mskcc.org
                defilipm@mskcc.org
                yaz2011@med.cornell.edu
                piscuogs@mskcc.org
                limr@mskcc.org
                shenr@mskcc.org
                nortonl@mskcc.org
                reisfilj@mskcc.org
                weigeltb@mskcc.org
                Journal
                Genome Biol
                Genome Biology
                BioMed Central (London)
                1465-6906
                1465-6914
                28 October 2014
                2014
                Volume: 15
                Issue: 10
                Article number: 484
                Affiliations
                Department of Pathology, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065 USA
                Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065 USA
                Department of Medicine, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065 USA
                Article
                484
                DOI: 10.1186/s13059-014-0484-1
                PMCID: PMC4232638
                PMID: 25348012
                313766d7-3714-4bea-8405-3cbc756f0b47
                © Martelotto et al.; licensee BioMed Central Ltd. 2014

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 13 June 2014
                Accepted: 30 September 2014
                Categories
                Research
                Custom metadata
                © The Author(s) 2014

                Genetics
