Development of a Machine Learning Method to Predict Membrane Protein-Ligand Binding Residues Using Basic Sequence Information

Abstract

Locating ligand binding sites and identifying functionally important residues from protein sequences and structures has become one of the key challenges in understanding protein function. Here, a Naïve Bayes classifier was trained to predict, using only sequence-based information, whether a given amino acid residue in a membrane protein sequence is a ligand binding residue. The input to the classifier consists of the features of the target residue and of its two sequence neighbors on each side (a five-residue window). The classifier was trained and evaluated on a nonredundant set of 42 sequences (chains with at least one transmembrane domain) from 31 alpha-helical membrane proteins. It achieves an overall accuracy of 70.7%, with 72.5% specificity and 61.1% sensitivity, in identifying ligand binding residues from sequence. The classifier performs better when the sequence is encoded by PSI-BLAST-generated position-specific scoring matrix (PSSM) profiles. Mapping the predictions onto the three-dimensional structures of the proteins shows that the method is effective at identifying ligand binding sites from sequence information: for 83.3% (35 of 42) of the chains, the classifier identifies the ligand binding site by correctly recognizing more than half of its binding residues. The method should therefore help protein engineers select candidate residues for functional assessment.
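The residue-level setup summarized above has four parts: a window made of the target residue plus two neighbors on each side, a Naïve Bayes classifier, accuracy/specificity/sensitivity scoring, and the per-chain "more than half of the binding residues" criterion. The code below is a minimal, hypothetical sketch of that setup, not the authors' implementation. It assumes that the feature at each window position is simply the residue identity. The abstract reports better results with PSI-BLAST PSSM profiles, and this sketch does not compute those. It also assumes a `-` padding symbol at chain ends and Laplace smoothing. The names `windows`, `CategoricalNB`, `evaluate` and `site_identified` are illustrative only.

```python
import math
from collections import defaultdict

HALF_WINDOW = 2  # two sequence neighbors on each side -> 5-residue window
PAD = "-"        # hypothetical padding symbol for positions past the chain ends


def windows(seq, half=HALF_WINDOW):
    """Yield the tuple of residues in the window centred on each position of seq."""
    padded = PAD * half + seq + PAD * half
    for i in range(len(seq)):
        yield tuple(padded[i:i + 2 * half + 1])


class CategoricalNB:
    """Naive Bayes treating each window position as an independent categorical feature."""

    def __init__(self, alpha=1.0, n_values=21):
        self.alpha = alpha        # Laplace smoothing pseudocount
        self.n_values = n_values  # 20 standard amino acids + the padding symbol

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_count = {c: 0 for c in self.classes}
        # counts[c][j][v] = how often value v occurs at window position j in class c
        self.counts = {c: defaultdict(lambda: defaultdict(int)) for c in self.classes}
        for x, c in zip(X, y):
            self.class_count[c] += 1
            for j, v in enumerate(x):
                self.counts[c][j][v] += 1
        return self

    def log_posterior(self, x, c):
        """Unnormalised log P(c | x) = log P(c) + sum_j log P(x_j | c)."""
        n_c = self.class_count[c]
        score = math.log(n_c / sum(self.class_count.values()))
        for j, v in enumerate(x):
            count = self.counts[c][j].get(v, 0)
            score += math.log((count + self.alpha) / (n_c + self.alpha * self.n_values))
        return score

    def predict(self, X):
        return [max(self.classes, key=lambda c: self.log_posterior(x, c)) for x in X]


def evaluate(y_true, y_pred):
    """Residue-level accuracy, sensitivity (recall on binders) and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def site_identified(y_true, y_pred):
    """True if more than half of a chain's true binding residues are predicted as binding."""
    binding = [i for i, t in enumerate(y_true) if t == 1]
    hits = sum(1 for i in binding if y_pred[i] == 1)
    return hits > len(binding) / 2


if __name__ == "__main__":
    # Hypothetical toy chain and per-residue labels (1 = ligand binding), for illustration only
    seq, labels = "MKTWLLAGFWV", [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0]
    X = list(windows(seq))
    model = CategoricalNB().fit(X, labels)
    pred = model.predict(X)  # resubstitution only; a fair test would hold out whole chains
    print(evaluate(labels, pred), site_identified(labels, pred))
```

To use PSSM profiles instead of residue identities, each window position would carry a vector of 20 real-valued scores. That would call for a continuous likelihood, such as a Gaussian per feature, in place of the categorical counts used here.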

Author and article information

Journal: Advances in Bioinformatics (Adv Bioinformatics, ABI)
Publisher: Hindawi Publishing Corporation
ISSN: 1687-8027, 1687-8035
Published: 31 January 2015, Volume 2015, Article ID 843030

Affiliations
1 Department of Bioinformatics, Sathyabama University, Chennai 600119, India
2 Department of Biotechnology, IIT Madras, Chennai 600032, India
3 Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan

Academic Editor: Paul Harrison

Identifiers: DOI 10.1155/2015/843030; PMCID PMC4329842; PMID 25802517; ScienceOpen a46dcd70-02fa-4245-90f0-8cd93fe0d1b5

Copyright © 2015 M. Xavier Suresh et al.

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History: 28 June 2014; 7 January 2015; 8 January 2015

Categories: Research Article; Bioinformatics & Computational biology
