A graph kernel approach for alignment-free domain–peptide interaction prediction with an application to human SH3 domains

Abstract

Motivation: State-of-the-art experimental data for determining the binding specificities of peptide recognition modules (PRMs) are obtained by high-throughput approaches such as peptide arrays. Most prediction tools applicable to this kind of data are based on an initial multiple alignment of the peptide ligands. Building an initial alignment can be error-prone, especially in the case of the proline-rich peptides bound by SH3 domains.

Results: Here, we present a machine-learning approach based on an efficient graph-kernel technique to predict the specificity of a large set of 70 human SH3 domains, which are an important class of PRMs. The graph-kernel strategy allows us to (i) integrate several types of physico-chemical information for each amino acid, (ii) consider high-order correlations between these features and (iii) eliminate the need for an initial peptide alignment. We build specialized models for each human SH3 domain and achieve a competitive predictive performance, with an area under the precision-recall curve of 0.73, compared with 0.27 for state-of-the-art methods based on position weight matrices.
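To illustrate what "alignment-free" can mean in practice, the following is a minimal sketch of a k-mer spectrum kernel in Python. It is a deliberately simpler stand-in and not the graph kernel used in this work: it ignores physico-chemical features and high-order correlations. The function names, the choice of k = 3 and the example peptides are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def kmer_counts(peptide: str, k: int = 3) -> Counter:
    """Count all overlapping k-mers of a peptide; no alignment is required."""
    return Counter(peptide[i:i + k] for i in range(len(peptide) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> float:
    """Cosine-normalized k-mer spectrum kernel between two peptides.

    Illustrative stand-in only (assumption): the article's graph kernel
    additionally encodes physico-chemical features per amino acid.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Dot product over the k-mers that occur in both peptides.
    dot = sum(ca[m] * cb[m] for m in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    # Peptides shorter than k have no k-mers; define their similarity as 0.
    return dot / norm if norm else 0.0


# The same proline-rich motif at different offsets still yields a high
# similarity, although the two peptides were never aligned.
shifted = spectrum_kernel("AAPPLPPRP", "PPLPPRPAA")
```

Because the similarity depends only on shared local substrings, a motif that appears at different positions in two peptides contributes to the kernel value without any prior alignment step.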

We show that better models can be obtained when we use information on the non-interacting peptides (negative examples), which is currently not used by the state-of-the-art approaches based on position weight matrices. To this end, we analyze two strategies to identify subsets of high-confidence negative data.
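The point about negative examples can be made concrete with a small sketch of how a position weight matrix is typically estimated. The code below is a generic log-odds construction under stated assumptions (pre-aligned peptides of equal length, a uniform background distribution, a pseudocount of 1) and does not reproduce any specific tool; the function names and toy peptides are hypothetical. Note that only binding peptides enter the estimate, so non-interacting peptides cannot influence the model.

```python
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(aligned_positives, pseudocount=1.0):
    """Estimate a log-odds PWM from aligned binding peptides only.

    Assumptions (illustrative): equal-length, pre-aligned input and a
    uniform background frequency of 1/20 per amino acid.
    """
    length = len(aligned_positives[0])
    if any(len(p) != length for p in aligned_positives):
        raise ValueError("a PWM requires aligned peptides of equal length")
    background = 1.0 / len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        column = [p[pos] for p in aligned_positives]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        # Smoothed frequency of each residue in this column, as log-odds.
        pwm.append({
            aa: log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pwm


def score(pwm, peptide):
    """Sum the per-position log-odds of a peptide of matching length."""
    return sum(column[aa] for column, aa in zip(pwm, peptide))


# Toy positives (hypothetical). No negative peptides are used anywhere.
pwm = build_pwm(["PPLP", "PPRP", "PALP"])
```

A discriminative classifier, by contrast, is trained on both classes, which is why the choice of reliable negative data matters for the approach described above.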

The techniques introduced here are general and can therefore also be applied to any other protein domains that interact with short peptides (i.e. other PRMs).

Availability: The program with the predictive models can be found at http://www.bioinf.uni-freiburg.de/Software/SH3PepInt/SH3PepInt.tar.gz. We also provide a genome-wide prediction for all 70 human SH3 domains, which can be found at http://www.bioinf.uni-freiburg.de/Software/SH3PepInt/Genome-Wide-Predictions.tar.gz.

Contact: backofen@informatik.uni-freiburg.de

Supplementary information: Supplementary data are available at Bioinformatics online.

Author and article information

Journal

Journal ID (nlm-ta): Bioinformatics

Journal ID (iso-abbrev): Bioinformatics

Journal ID (publisher-id): bioinformatics

Journal ID (hwp): bioinfo

Title: Bioinformatics

Publisher: Oxford University Press

ISSN (Print): 1367-4803

ISSN (Electronic): 1367-4811

Publication date (Print): 1 July 2013

Publication date (Electronic): 19 June 2013

Publication date PMC-release: 19 June 2013

Volume: 29

Issue: 13

Pages: i335-i343

Affiliations

¹Bioinformatics Group, Department of Computer Science, Georges-Köhler-Allee 106, 79110 Freiburg, ²Centre for Biological Signalling Studies (BIOSS), 79104 Freiburg, ³Centre for Biological Systems Analysis (ZBSA), University of Freiburg, Freiburg im Breisgau, 79104 Freiburg, Germany and ⁴Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark

Author notes

*To whom correspondence should be addressed.

†The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

Article

Publisher ID: btt220

DOI: 10.1093/bioinformatics/btt220

PMC ID: 3694653

PubMed ID: 23813002

SO-VID: add45187-c14c-4157-9274-810e6b64c102

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
