
      Accurate Quantification of Functional Analogy among Close Homologs

      PLoS Computational Biology
      Public Library of Science


          Abstract

          Correctly evaluating functional similarities among homologous proteins is necessary for accurate transfer of experimental knowledge from one organism to another, and is of particular importance for the development of animal models of human disease. While the idea that sequence similarity implies functional similarity is a fundamental paradigm of molecular biology, sequence comparison does not directly assess the extent to which two proteins participate in the same biological processes, and it has limited utility for analyzing families with several paralogous members. Nevertheless, we show that it is possible to provide a cross-organism functional similarity measure in an unbiased way through the exclusive use of high-throughput gene-expression data. Our methodology relies on probabilistic cross-species mapping of functionally analogous proteins, derived from Bayesian integrative analysis of gene expression compendia. We demonstrate that even among closely related genes, our method predicts functionally analogous homolog pairs better than sequence comparison alone. We also demonstrate that the landscape of functional similarity is often complex and that definitive “functional orthologs” do not always exist. Even in these cases, our method and the online interface we provide allow detailed exploration of the sources of inferred functional similarity, which the user can then evaluate.

          Author Summary

          Common ancestry is a central tenet of modern biology: genes from different species often show a high degree of sequence similarity, making it possible to study analogous processes across model organisms. However, many genes belong to large families with several duplicates, and the relationship between genes from different species is often not one-to-one, complicating the transfer of experimental knowledge. We present a method that uses a large compendium of high-throughput expression data, which covers many genes that have not been analyzed in any other way, to systematically predict which genes are most likely to participate in the same biological process and thus have analogous function in different organisms. We show that our method agrees well with current experimental knowledge, and we use it to investigate several families of genes that demonstrate the complexity of functional analogy.
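
          The abstract describes Bayesian integration of expression evidence only at a high level. The following is a minimal sketch of one generic way such integration can work (naive Bayes combination of per-dataset likelihood ratios), not the authors' implementation. The function names, the prior, the likelihood ratios, and the candidate labels are all hypothetical values chosen for illustration.

```python
import math


def posterior_functional_link(prior, likelihood_ratios):
    """Combine per-dataset evidence under a naive Bayes assumption.

    prior: assumed probability that two genes are functionally related
           before any expression data are considered.
    likelihood_ratios: one value per dataset, each equal to
           P(evidence | related) / P(evidence | unrelated).
    Datasets are treated as conditionally independent (an assumption).
    """
    log_odds = math.log(prior / (1.0 - prior))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    # Convert the posterior log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


def rank_homologs(scores):
    """Return candidate names sorted from most to least supported."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical example: two paralogs in one species are scored as the
# functional analog of a query gene from another species.
candidates = {
    "paralog_A": posterior_functional_link(0.05, [4.0, 2.5, 1.2]),
    "paralog_B": posterior_functional_link(0.05, [0.8, 1.5, 0.9]),
}
ranking = rank_homologs(candidates)
```

          In this toy setting the candidate whose expression evidence is consistently more likely under the "related" hypothesis ranks first, even if sequence similarity alone could not separate the two paralogs.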


                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Computational Biology (PLoS Comput Biol)
                Public Library of Science (San Francisco, USA)
                ISSN: 1553-734X, 1553-7358
                Published: 3 February 2011
                Volume: 7
                Issue: 2
                Article: e1001074
                Affiliations
                [1] Department of Molecular Biology, Princeton University, Princeton, New Jersey, United States of America
                [2] Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
                [3] Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
                New York University, United States of America
                Author notes

                Conceived and designed the experiments: MDC OGT. Performed the experiments: MDC. Analyzed the data: MDC. Wrote the paper: MDC OGT.

                Article
                10-PLCB-RA-2298R2
                DOI: 10.1371/journal.pcbi.1001074
                PMCID: PMC3033368
                PMID: 21304936
                ac08704f-077e-463f-92da-6aa85fdfea80
                Chikina, Troyanskaya. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                History
                Received: 4 June 2010
                Accepted: 2 January 2011
                Page count
                Pages: 11
                Categories
                Research Article
                Computational Biology/Genomics

                Quantitative & Systems biology
