      Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome

      Genome Biology
      BioMed Central

          Abstract

          To consolidate the known human protein interactions, two tests were developed to measure the relative accuracy of the available interaction data. In addition, 6,580 interactions among 3,737 human proteins were recovered from Medline abstracts and combined with existing interaction data to obtain a network of 31,609 interactions among 7,748 human proteins, accurate to the same degree as the existing datasets.

          Background

          Extensive protein interaction maps are being constructed for yeast, worm, and fly to ask how the proteins organize into pathways and systems, but no such genome-wide interaction map yet exists for the set of human proteins. To prepare for studies in humans, we wished to establish tests for the accuracy of future interaction assays and to consolidate the known interactions among human proteins.

          Results

          We established two tests of the accuracy of human protein interaction datasets and measured the relative accuracy of the available data. We then developed and applied natural language processing and literature-mining algorithms to recover 6,580 interactions among 3,737 human proteins from Medline abstracts. A three-part algorithm was used: first, human protein names were identified in Medline abstracts using a discriminator based on conditional random fields; second, interactions were identified by the co-occurrence of protein names across the set of Medline abstracts; third, the interactions were filtered with a Bayesian classifier to enrich for legitimate physical interactions. These mined interactions were combined with existing interaction data to obtain a network of 31,609 interactions among 7,748 human proteins, accurate to the same degree as the existing datasets.
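
The co-occurrence step of the algorithm can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how co-occurrence was scored, so this version simply counts the abstracts in which two names appear together. It assumes protein names have already been tagged in each abstract (the conditional random field step) and omits the Bayesian filter. The function names, the `min_abstracts` threshold, and the toy input are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tagged_abstracts):
    """Count, for each unordered protein pair, the abstracts that mention both."""
    counts = Counter()
    for names in tagged_abstracts:
        # Deduplicate names within one abstract and sort them so that
        # (A, B) and (B, A) are counted as the same pair.
        for pair in combinations(sorted(set(names)), 2):
            counts[pair] += 1
    return counts


def candidate_pairs(tagged_abstracts, min_abstracts=2):
    """Keep pairs seen together in at least `min_abstracts` abstracts.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    counts = cooccurrence_counts(tagged_abstracts)
    return {pair: n for pair, n in counts.items() if n >= min_abstracts}


# Toy input: each inner list holds the protein names tagged in one abstract.
abstracts = [
    ["TP53", "MDM2"],
    ["MDM2", "TP53", "EP300"],
    ["BRCA1", "BARD1"],
]
candidates = candidate_pairs(abstracts)
print(candidates)
```

In a fuller pipeline, the surviving pairs would then be passed to a classifier that separates physical interactions from other reasons two names might share an abstract.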

          Conclusion

          These interactions and the accuracy benchmarks will aid interpretation of current functional genomics data and provide a basis for determining the quality of future large-scale human protein interaction assays. Projecting from the approximately 15 interactions per protein in the best-sampled interaction set to the estimated 25,000 human genes implies more than 375,000 interactions in the complete human protein interaction network. This set therefore represents no more than 10% of the complete network.
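
The projection in the conclusion can be checked with a few lines of arithmetic. This sketch reproduces the multiplication exactly as the abstract states it, using only figures quoted above; the variable names are illustrative.

```python
# Figures quoted in the abstract.
interactions_per_protein = 15   # approximate, best-sampled interaction set
human_genes = 25_000            # estimated number of human genes
consolidated = 31_609           # interactions in the consolidated network

# Projected size of the complete network, as computed in the abstract.
projected_total = interactions_per_protein * human_genes

# Fraction of the projected network covered by the consolidated set.
coverage = consolidated / projected_total

print(f"{projected_total:,} projected interactions; coverage {coverage:.1%}")
# prints "375,000 projected interactions; coverage 8.4%"
```

The resulting coverage of about 8.4% is consistent with the stated bound of no more than 10%.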


                Author and article information

                Journal: Genome Biology (Genome Biol)
                Publisher: BioMed Central (London)
                ISSN: 1465-6906, 1465-6914
                Published: 15 April 2005
                Volume 6, Issue 5, Article R40
                Affiliations
                [1] Center for Systems and Synthetic Biology and Institute for Cellular and Molecular Biology, University of Texas, Austin, TX 78712, USA
                [2] Department of Computer Sciences, University of Texas, Austin, TX 78712, USA
                [3] Department of Chemistry and Biochemistry, University of Texas, Austin, TX 78712, USA
                Article
                Publisher ID: gb-2005-6-5-r40
                DOI: 10.1186/gb-2005-6-5-r40
                PMC ID: 1175952
                PMID: 15892868
                86ad4f83-9e70-45db-8822-4e4c80aa9703
                Copyright © 2005 Marcotte et al.; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                20 December 2004
                9 February 2005
                11 March 2005
                Categories
                Research

                Genetics
