Predicting the Functional Effect of Amino Acid Substitutions and Indels

Abstract

As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions of the functional effects of sequence variations and to narrow down the search for causal variants for disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and nonsense mutations. Frameshifts and nonsense mutations are likely to cause a negative effect on protein function. Existing prediction tools primarily focus on studying the deleterious effects of single amino acid substitutions by examining amino acid conservation at the position of interest among related sequences, an approach that is not directly applicable to insertions or deletions. Here, we introduce a versatile alignment-based score as a new metric to predict the damaging effects of variations, including not only single amino acid substitutions but also in-frame insertions, deletions, and multiple amino acid substitutions. This alignment-based score measures the change in sequence similarity of a query sequence to a protein sequence homolog before and after the introduction of an amino acid variation to the query sequence. Our results showed that the scoring scheme performs well in separating disease-associated variants (n = 21,662) from common polymorphisms (n = 37,022) for UniProt human protein variations, and also in separating deleterious variants (n = 15,179) from neutral variants (n = 17,891) for UniProt non-human protein variations. In our approach, the area under the receiver operating characteristic curve (AUC) for the human and non-human protein variation datasets is ∼0.85. We also observed that the alignment-based score correlates with the deleteriousness of a sequence variation.
In summary, we have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org.
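
The alignment-based score described in the abstract can be illustrated with a minimal sketch. The code below is not the PROVEAN implementation. It assumes a toy scoring scheme (fixed match, mismatch, and linear gap values chosen only for illustration), a single homolog rather than a collection of related sequences, and hypothetical helper names (`global_alignment_score`, `delta_score`, `substitute`, `delete`, `insert`). It shows only the core idea: score the query against a homolog before and after introducing a variation, and take the difference.

```python
def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The match/mismatch/gap values are illustrative assumptions, not the
    parameters used by the published tool.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]


def delta_score(query, variant, homolog):
    """Change in similarity to the homolog caused by the variation.

    Negative values mean the variant is less similar to the homolog
    than the original query was.
    """
    return (global_alignment_score(variant, homolog)
            - global_alignment_score(query, homolog))


# Hypothetical helpers for building variant sequences (1-based positions).
def substitute(seq, pos, residues):
    """Replace the residue at pos with one or more residues."""
    return seq[:pos - 1] + residues + seq[pos:]


def delete(seq, start, end):
    """Remove residues start..end inclusive (an in-frame deletion)."""
    return seq[:start - 1] + seq[end:]


def insert(seq, pos, residues):
    """Insert residues immediately after position pos."""
    return seq[:pos] + residues + seq[pos:]


if __name__ == "__main__":
    query = "MKTAYIAKQR"      # made-up sequences, for illustration only
    homolog = "MKTAYLAKQR"
    for label, variant in [
        ("substitution A4W", substitute(query, 4, "W")),
        ("deletion of residue 4", delete(query, 4, 4)),
        ("insertion of GG after 4", insert(query, 4, "GG")),
    ]:
        print(label, delta_score(query, variant, homolog))
```

Because the same alignment routine handles sequences of different lengths, substitutions, in-frame insertions, and in-frame deletions all reduce to the same before/after comparison, which is the property the abstract highlights.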

Most cited references 14

Impact of mutant p53 functional properties on TP53 mutation patterns and tumor phenotype: lessons from recent developments in the IARC TP53 database.

Audrey Petitjean, Ewy Mathe, Shunsuke Kato … (2007)

The tumor suppressor gene TP53 is frequently mutated in human cancers. More than 75% of all mutations are missense substitutions that have been extensively analyzed in various yeast and human cell assays. The International Agency for Research on Cancer (IARC) TP53 database (www-p53.iarc.fr) compiles all genetic variations that have been reported in TP53. Here, we present recent database developments that include new annotations on the functional properties of mutant proteins, and we perform a systematic analysis of the database to determine the functional properties that contribute to the occurrence of mutational "hotspots" in different cancer types and to the phenotype of tumors. This analysis showed that loss of transactivation capacity is a key factor for the selection of missense mutations, and that difference in mutation frequencies is closely related to nucleotide substitution rates along TP53 coding sequence. An interesting new finding is that in patients with an inherited missense mutation, the age at onset of tumors was related to the functional severity of the mutation, mutations with total loss of transactivation activity being associated with earlier cancer onset compared to mutations that retain partial transactivation capacity. Furthermore, 80% of the most common mutants show a capacity to exert dominant-negative effect (DNE) over wild-type p53, compared to only 45% of the less frequent mutants studied, suggesting that DNE may play a role in shaping mutation patterns. These results provide new insights into the factors that shape mutation patterns and influence mutation phenotype, which may have clinical interest.

Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations

Brian O'Roak, Pelagia Deriziotis, Choli Lee … (2011)

Evidence for the etiology of autism spectrum disorders (ASD) has consistently pointed to a strong genetic component complicated by substantial locus heterogeneity [1,2]. We sequenced the exomes of 20 sporadic cases of ASD and their parents, reasoning that these families would be enriched for de novo mutations of major effect. We identified 21 de novo mutations, of which 11 were protein-altering. Protein-altering mutations were significantly enriched for changes at highly conserved residues. We identified potentially causative de novo events in 4/20 probands, particularly among more severely affected individuals, in FOXP1, GRIN2B, SCN1A, and LAMC3. In the FOXP1 mutation carrier, we also observed a rare inherited CNTNAP2 mutation and provide functional support for a multihit model for disease risk [3]. Our results demonstrate that trio-based exome sequencing is a powerful approach for identifying novel candidate genes for ASD and suggest that de novo mutations may contribute substantially to the genetic risk for ASD.

Identification of the cystic fibrosis gene: genetic analysis.

Ivy Tsui, Bartosz D. Markiewicz, J Rommens … (1989)

Approximately 70 percent of the mutations in cystic fibrosis patients correspond to a specific deletion of three base pairs, which results in the loss of a phenylalanine residue at amino acid position 508 of the putative product of the cystic fibrosis gene. Extended haplotype data based on DNA markers closely linked to the putative disease gene locus suggest that the remainder of the cystic fibrosis mutant gene pool consists of multiple, different mutations. A small set of these latter mutant alleles (about 8 percent) may confer residual pancreatic exocrine function in a subgroup of patients who are pancreatic sufficient. The ability to detect mutations in the cystic fibrosis gene at the DNA level has important implications for genetic diagnosis.

Author and article information

Contributors

Alexandre G. de Brevern: Role: Editor

Journal

Journal ID (nlm-ta): PLoS One

Journal ID (iso-abbrev): PLoS ONE

Journal ID (publisher-id): plos

Journal ID (pmc): plosone

Title: PLoS ONE

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Electronic): 1932-6203

Publication date Collection: 2012

Publication date (Electronic): 8 October 2012

Volume: 7

Issue: 10

Electronic Location Identifier: e46688

Affiliations

[1] The J. Craig Venter Institute, Rockville, Maryland, United States of America

UMR-S665, INSERM, Université Paris Diderot, INTS, France

Author notes

* E-mail: achan@jcvi.org

Competing Interests: The authors have the following competing interests: The authors have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org. There are no further patents, products in development or marketed products to declare. This does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials, as detailed online in the guide for authors.

Conceived and designed the experiments: APC YC GES. Performed the experiments: YC. Analyzed the data: YC GES APC JRM SM. Wrote the paper: APC YC.

[¤a] Current address: Department of Bioinformatics, Pathway Genomics Corporation, San Diego, California, United States of America

[¤b] Current address: Howard Hughes Medical Institute Janelia Farm Research Campus, Ashburn, Virginia, United States of America

Article

Publisher ID: PONE-D-12-10334

DOI: 10.1371/journal.pone.0046688

PMC ID: 3466303

PubMed ID: 23056405

SO-VID: 73ca4c18-c8db-4fd9-984c-fb283bf3e5fb

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 9 April 2012

Date accepted: 6 September 2012

Page count

Pages: 13

Funding

The work described is funded by the National Institutes of Health (grant number 5R01HG004701-03). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
