
      Text Mining for Protein Docking


          Abstract

          The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which can still be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~25% of the complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate.
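
As a rough illustration of the extraction step described in the abstract, the sketch below scans abstract text for residue mentions such as "Arg123" or "arginine 123". It is not the authors' procedure: the regular expression, the name table, and the function name `extract_residue_mentions` are assumptions made for this example. Ambiguous one-letter forms such as "R123" are deliberately ignored.

```python
import re
from collections import Counter

# Three-letter codes and full names mapped to one-letter codes.
# This table and the pattern below are illustrative assumptions,
# not the vocabulary used in the paper.
NAMES = {
    "ala": "A", "arg": "R", "asn": "N", "asp": "D", "cys": "C",
    "gln": "Q", "glu": "E", "gly": "G", "his": "H", "ile": "I",
    "leu": "L", "lys": "K", "met": "M", "phe": "F", "pro": "P",
    "ser": "S", "thr": "T", "trp": "W", "tyr": "Y", "val": "V",
    "alanine": "A", "arginine": "R", "asparagine": "N", "aspartate": "D",
    "cysteine": "C", "glutamine": "Q", "glutamate": "E", "glycine": "G",
    "histidine": "H", "isoleucine": "I", "leucine": "L", "lysine": "K",
    "methionine": "M", "phenylalanine": "F", "proline": "P", "serine": "S",
    "threonine": "T", "tryptophan": "W", "tyrosine": "Y", "valine": "V",
}

# Longest names first, then an optional space or hyphen, then the residue number.
_ALTERNATIVES = "|".join(sorted(NAMES, key=len, reverse=True))
PATTERN = re.compile(r"\b(" + _ALTERNATIVES + r")[\s-]?(\d{1,4})\b", re.IGNORECASE)


def extract_residue_mentions(text):
    """Count (one-letter code, residue number) mentions found in the text."""
    counts = Counter()
    for match in PATTERN.finditer(text):
        code = NAMES[match.group(1).lower()]
        counts[(code, int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    sample = ("Mutation of Arg123 and Tyr-45 abolished binding; "
              "arginine 123 contacts the partner.")
    for residue, n in extract_residue_mentions(sample).most_common():
        print(residue, n)
```

Residues mentioned repeatedly across the abstracts retrieved for one interaction would be the more plausible candidates for binding-site constraints; how to weight or filter them is left open here.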

          Author Summary

          Protein interactions are central to many cellular processes. Physical characterization of these interactions is essential for understanding life processes and for applications in biology and medicine. Because of the inherent limitations of experimental techniques and the rapid development of computational power and methodology, computer modeling is a tool of choice in many studies. Publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for modeling of proteins and protein complexes. A major paradigm shift in modeling of protein complexes is emerging due to the rapidly expanding amount of such information, which can be used as modeling constraints. Text mining has been widely used in recreating networks of protein interactions, as well as in detecting small molecule binding sites on proteins. Combining and expanding these two well-developed areas of research, we applied text mining to physical modeling of protein complexes (protein docking). Our procedure retrieves published abstracts on a protein-protein interaction and extracts the relevant information. The results show that correct information on binding can be obtained for about half of the protein complexes. The extracted constraints were incorporated in a modeling procedure, significantly improving its performance.
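
One way to picture how text-mined residues could act as a modeling constraint is to re-rank candidate docking poses by how many of the mined residues fall at each predicted interface. The sketch below is a hypothetical illustration only: the pose representation, the additive scoring, the `weight` parameter, and the function names are assumptions, not the docking protocol used in the paper.

```python
def constraint_support(interface_residues, mined_residues):
    """Fraction of the mined residues that appear at the predicted interface."""
    mined = set(mined_residues)
    if not mined:
        return 0.0
    return len(set(interface_residues) & mined) / len(mined)


def rerank(poses, mined_residues, weight=1.0):
    """Sort poses by docking score plus a weighted constraint bonus.

    poses: list of (pose_id, docking_score, interface_residues) tuples,
           where a higher docking_score is assumed to be better.
    """
    def combined(pose):
        _, score, interface = pose
        return score + weight * constraint_support(interface, mined_residues)

    return sorted(poses, key=combined, reverse=True)


if __name__ == "__main__":
    mined = {("R", 123), ("Y", 45)}
    poses = [
        ("pose_a", 0.9, {("R", 10)}),
        ("pose_b", 0.8, {("R", 123), ("Y", 45)}),
    ]
    print([pose_id for pose_id, _, _ in rerank(poses, mined)])
```

With `weight=0` the original docking ranking is kept unchanged, which makes it easy to compare results with and without the mined constraints.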


                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Computational Biology (PLoS Comput Biol)
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-734X, 1553-7358
                Published: 9 December 2015
                Volume 11, Issue 12
                Affiliations
                [1] Center for Computational Biology, The University of Kansas, Lawrence, Kansas, United States of America
                [2] Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, United States of America
                Tel Aviv University, ISRAEL
                Author notes

                The authors have declared that no competing interests exist.

                Conceived and designed the experiments: VDB PJK IAV. Performed the experiments: VDB. Analyzed the data: VDB PJK IAV. Contributed reagents/materials/analysis tools: VDB PJK. Wrote the paper: VDB PJK IAV.

                Article
                Manuscript ID: PCOMPBIOL-D-15-00921
                DOI: 10.1371/journal.pcbi.1004630
                PMC ID: 4674139
                PMID: 26650466
                © 2015 Badal et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                Page count
                Figures: 6, Tables: 4, Pages: 21
                Funding
                This study was supported by NIH grant R01GM074255 and NSF grant DBI1262621. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Custom metadata
                The paper contains a description of how to generate the raw data. The raw data is also available on request from the corresponding author (vakser@ku.edu).

                Quantitative & Systems biology
