
      A graph kernel approach for alignment-free domain–peptide interaction prediction with an application to human SH3 domains

      research-article
      Bioinformatics
      Oxford University Press


          Abstract

          Motivation: State-of-the-art experimental data for determining binding specificities of peptide recognition modules (PRMs) is obtained by high-throughput approaches like peptide arrays. Most prediction tools applicable to this kind of data are based on an initial multiple alignment of the peptide ligands. Building an initial alignment can be error-prone, especially in the case of the proline-rich peptides bound by the SH3 domains.

          Results: Here, we present a machine-learning approach based on an efficient graph-kernel technique to predict the specificity of a large set of 70 human SH3 domains, which are an important class of PRMs. The graph-kernel strategy allows us to (i) integrate several types of physico-chemical information for each amino acid, (ii) consider high-order correlations between these features and (iii) eliminate the need for an initial peptide alignment. We build specialized models for each human SH3 domain and achieve competitive predictive performance of 0.73 area under precision-recall curve, compared with 0.27 area under precision-recall curve for state-of-the-art methods based on position weight matrices.
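The alignment-free idea in point (iii) can be illustrated with a much simpler stand-in than the graph kernel described above. The sketch below is a k-mer spectrum kernel, not the authors' method: it compares two peptides through their shared substrings, so peptides of different lengths need no prior alignment. The function names, the choice of k = 3 and the example peptides are assumptions made purely for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(peptide: str, k: int = 3) -> Counter:
    """Count every overlapping k-mer in a peptide; no alignment is needed."""
    return Counter(peptide[i:i + k] for i in range(len(peptide) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> float:
    """Cosine-normalised dot product of the two k-mer count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    dot = sum(n * cb[kmer] for kmer, n in ca.items())
    norm = sqrt(sum(n * n for n in ca.values()) * sum(n * n for n in cb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical peptides of different lengths (illustrative only).
    print(spectrum_kernel("APPLPPRNRP", "RPLPPLPTK"))   # shares proline-rich 3-mers
    print(spectrum_kernel("APPLPPRNRP", "GSTEEDKAAG"))  # shares no 3-mers
```

A kernel of this kind can be passed to any kernel-based classifier. The graph kernel in the article goes further by encoding physico-chemical features per amino acid and higher-order correlations between them, which this sketch does not attempt.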

          We show that better models can be obtained when we use information on the noninteracting peptides (negative examples), which is currently not used by the state-of-the-art approaches based on position weight matrices. To this end, we analyze two strategies to identify subsets of high-confidence negative data.
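The precision-recall figures quoted in the Results depend on how well interacting peptides are ranked above noninteracting ones, which is why the choice of negative examples matters. As a minimal sketch, the function below computes average precision, one common estimator of the area under the precision-recall curve. The abstract does not state which estimator the authors used, so this is an assumption, and the scores and labels are made up for illustration.

```python
def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each positive example.

    scores: predicted interaction scores (higher means more likely to bind).
    labels: 1 for an interacting peptide, 0 for a noninteracting one.
    Ties between scores are broken by input order in this sketch.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


if __name__ == "__main__":
    # Hypothetical predictions for five peptides, two of which truly bind.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5]
    labels = [1, 0, 1, 0, 0]
    print(average_precision(scores, labels))  # (1/1 + 2/3) / 2
```

Unlike accuracy, this measure stays informative when negatives greatly outnumber positives, the usual situation for peptide array data.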

          The techniques introduced here are general and can therefore also be applied to any other protein domains that interact with short peptides (i.e. other PRMs).

          Availability: The program with the predictive models can be found at http://www.bioinf.uni-freiburg.de/Software/SH3PepInt/SH3PepInt.tar.gz. We also provide a genome-wide prediction for all 70 human SH3 domains, which can be found under http://www.bioinf.uni-freiburg.de/Software/SH3PepInt/Genome-Wide-Predictions.tar.gz.

          Contact: backofen@informatik.uni-freiburg.de

          Supplementary information: Supplementary data are available at Bioinformatics online.


                Author and article information

                Journal
                Bioinformatics
                Oxford University Press
                ISSN: 1367-4803, 1367-4811
                Dates: 1 July 2013, 19 June 2013
                Volume: 29
                Issue: 13
                Pages: i335-i343
                Affiliations
                1 Bioinformatics Group, Department of Computer Science, Georges-Köhler-Allee 106, 79110 Freiburg
                2 Centre for Biological Signalling Studies (BIOSS), 79104 Freiburg
                3 Centre for Biological Systems Analysis (ZBSA), University of Freiburg, Freiburg im Breisgau, 79104 Freiburg, Germany
                4 Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
                Author notes
                *To whom correspondence should be addressed.

                The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

                Article
                Publisher ID: btt220
                DOI: 10.1093/bioinformatics/btt220
                PMC ID: 3694653
                PubMed ID: 23813002
                © The Author 2013. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.

                Page count
                Pages: 9
                Categories
                ISMB/ECCB 2013 Proceedings Papers Committee, July 21 to July 23, 2013, Berlin, Germany
                Original Papers
                Sequence Analysis

                Bioinformatics & Computational biology
