Predicting the Functional, Molecular, and Phenotypic Consequences of Amino Acid Substitutions using Hidden Markov Models

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

The rate at which nonsynonymous single nucleotide polymorphisms (nsSNPs) are being identified in the human genome is increasing dramatically owing to advances in whole-genome/whole-exome sequencing technologies. Automated methods capable of accurately and reliably distinguishing between pathogenic and functionally neutral nsSNPs are therefore assuming ever-increasing importance. Here, we describe the Functional Analysis Through Hidden Markov Models (FATHMM) software and server: a species-independent method with optional species-specific weightings for the prediction of the functional effects of protein missense variants. Using a model weighted for human mutations, we obtained performance accuracies that outperformed traditional prediction methods (i.e., SIFT, PolyPhen, and PANTHER) on two separate benchmarks. Furthermore, in one benchmark, we achieve performance accuracies that outperform current state-of-the-art prediction methods (i.e., SNPs&GO and MutPred). We demonstrate that FATHMM can be efficiently applied to high-throughput/large-scale human and nonhuman genome sequencing projects with the added benefit of phenotypic outcome associations. To illustrate this, we evaluated nsSNPs in wheat ( Triticum spp.) to identify some of the important genetic variants responsible for the phenotypic differences introduced by intense selection during domestication. A Web-based implementation of FATHMM, including a high-throughput batch facility and a downloadable standalone package, is available at http://fathmm.biocompute.org.uk.

Related collections

Most cited references 25

Record: found
Abstract: found
Article: not found

Gene Ontology: tool for the unification of biology

Michael Ashburner, Catherine A. Ball, Judith Blake … (2002)

Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

0 comments Cited 15850 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

Human non-synonymous SNPs: server and survey.

V. Ramensky (2002)

Human single nucleotide polymorphisms (SNPs) represent the most frequent type of human population DNA variation. One of the main goals of SNP research is to understand the genetics of the human phenotype variation and especially the genetic basis of human complex diseases. Non-synonymous coding SNPs (nsSNPs) comprise a group of SNPs that, together with SNPs in regulatory regions, are believed to have the highest impact on phenotype. Here we present a World Wide Web server to predict the effect of an nsSNP on protein structure and function. The prediction method enabled analysis of the publicly available SNP database HGVbase, which gave rise to a dataset of nsSNPs with predicted functionality. The dataset was further used to compare the effect of various structural and functional characteristics of amino acid substitutions responsible for phenotypic display of nsSNPs. We also studied the dependence of selective pressure on the structural and functional properties of proteins. We found that in our dataset the selection pressure against deleterious SNPs depends on the molecular function of the protein, although it is insensitive to several other protein features considered. The strongest selective pressure was detected for proteins involved in transcription regulation.

0 comments Cited 652 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

UniRef: comprehensive and non-redundant UniProt reference clusters.

Baris Suzek, Hongzhan Huang, Peter McGarvey … (2007)

Redundant protein sequences in biological databases hinder sequence similarity searches and make interpretation of search results difficult. Clustering of protein sequence space based on sequence similarity helps organize all sequences into manageable datasets and reduces sampling bias and overrepresentation of sequences. The UniRef (UniProt Reference Clusters) provide clustered sets of sequences from the UniProt Knowledgebase (UniProtKB) and selected UniProt Archive records to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences. Currently covering >4 million source sequences, the UniRef100 database combines identical sequences and subfragments from any source organism into a single UniRef entry. UniRef90 and UniRef50 are built by clustering UniRef100 sequences at the 90 or 50% sequence identity levels. UniRef100, UniRef90 and UniRef50 yield a database size reduction of approximately 10, 40 and 70%, respectively, from the source sequence set. The reduced redundancy increases the speed of similarity searches and improves detection of distant relationships. UniRef entries contain summary cluster and membership information, including the sequence of a representative protein, member count and common taxonomy of the cluster, the accession numbers of all the merged entries and links to rich functional annotation in UniProtKB to facilitate biological discovery. UniRef has already been applied to broad research areas ranging from genome annotation to proteomics data analysis. UniRef is updated biweekly and is available for online search and retrieval at http://www.uniprot.org, as well as for download at ftp://ftp.uniprot.org/pub/databases/uniprot/uniref. Supplementary data are available at Bioinformatics online.

0 comments Cited 608 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Journal

Journal ID (nlm-ta): Hum Mutat

Journal ID (iso-abbrev): Hum. Mutat

Journal ID (publisher-id): humu

Title: Human Mutation

Publisher: Blackwell Publishing Ltd

ISSN (Print): 1059-7794

ISSN (Electronic): 1098-1004

Publication date (Print): January 2013

Publication date (Electronic): 03 October 2012

Volume: 34

Issue: 1

Pages: 57-65

Affiliations

[1 ]Bristol Centre for Systems Biomedicine and MRC CAiTE Centre, School of Social and Community Medicine, University of Bristol Bristol, United Kingdom

[2 ]Department of Computer Science, University of Bristol, The Merchant Venturers Building Bristol, United Kingdom

[3 ]Institute of Medical Genetics, School of Medicine, Cardiff University Cardiff, United Kingdom

[4 ]School of Biological Sciences, University of Bristol, Woodland Road Bristol, United Kingdom

Author notes

*Correspondence to: Tom Gaunt, Bristol Centre for Systems Biomedicine and MRC CAiTE Centre, School of Social and Community Medicine, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK. E-mail: Tom.Gaunt@ 123456bristol.ac.uk

Communicated by Christophe Béroud

[†]

Joint first authorship.

[‡]

Joint last authorship.

Contract grant sponsors: UK Medical Research Council (G1000427 to T.R.G. and I.N.M.D.); UK Biotechnology and Biological Sciences Research Council (BB/G022771 to J.G.); BIOBASE GmbH (to D.N.C. and P.D.S.).

Article

DOI: 10.1002/humu.22225

PMC ID: 3558800

PubMed ID: 23033316

SO-VID: b9a6580c-1978-4bae-9e82-008aa1ffa7a4

License:

Re-use of this article is permitted in accordance with the Creative Commons Deed, Attribution 2.5, which does not permit commercial exploitation.

Predicting the Functional, Molecular, and Phenotypic Consequences of Amino Acid Substitutions using Hidden Markov Models

Read this article at

Abstract

Related collections

Genome Engineering using CRISPR

Most cited references 25

Gene Ontology: tool for the unification of biology

Human non-synonymous SNPs: server and survey.

UniRef: comprehensive and non-redundant UniProt reference clusters.

Author and article information

Journal

Affiliations

Author notes

Article

History

Categories

Comments

Comment on this article

Similar content 121

Cited by 533

Most referenced authors 2,968