+1 Recommend
0 collections
      • Record: found
      • Abstract: found
      • Article: not found

      Phylogenetic and Functional Assessment of Orthologs Inference Projects and Methods

      * ,
      PLoS Computational Biology
      Public Library of Science

      Read this article at

          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.


          Accurate genome-wide identification of orthologs is a central problem in comparative genomics, a fact reflected by the numerous orthology identification projects developed in recent years. However, only a few reports have compared their accuracy, and indeed, several recent efforts have not yet been systematically evaluated. Furthermore, orthology is typically only assessed in terms of function conservation, despite the phylogeny-based original definition of Fitch. We collected and mapped the results of nine leading orthology projects and methods (COG, KOG, Inparanoid, OrthoMCL, Ensembl Compara, Homologene, RoundUp, EggNOG, and OMA) and two standard methods (bidirectional best-hit and reciprocal smallest distance). We systematically compared their predictions with respect to both phylogeny and function, using six different tests. This required the mapping of millions of sequences, the handling of hundreds of millions of predicted pairs of orthologs, and the computation of tens of thousands of trees. In phylogenetic analysis or in functional analysis where high specificity is required, we find that OMA and Homologene perform best. At lower functional specificity but higher coverage level, OrthoMCL outperforms Ensembl Compara, and to a lesser extent Inparanoid. Lastly, the large coverage of the recent EggNOG can be of interest to build broad functional grouping, but the method is not specific enough for phylogenetic or detailed function analyses. In terms of general methodology, we observe that the more sophisticated tree reconstruction/reconciliation approach of Ensembl Compara was at times outperformed by pairwise comparison approaches, even in phylogenetic tests. Furthermore, we show that standard bidirectional best-hit often outperforms projects with more complex algorithms. First, the present study provides guidance for the broad community of orthology data users as to which database best suits their needs. Second, it introduces new methodology to verify orthology. And third, it sets performance standards for current and future approaches.

          Author Summary

          The identification of orthologs, pairs of homologous genes in different species that started diverging through speciation events, is a central problem in genomics with applications in many research areas, including comparative genomics, phylogenetics, protein function annotation, and genome rearrangement. An increasing number of projects aim at inferring orthologs from complete genomes, but little is known about their relative accuracy or coverage. Because the exact evolutionary history of entire genomes remains largely unknown, predictions can only be validated indirectly, that is, in the context of the different applications of orthology. The few comparison studies published so far have asssessed orthology exclusively from the expectation that orthologs have conserved protein function. In the present work, we introduce methodology to verify orthology in terms of phylogeny and perform a comprehensive comparison of nine leading ortholog inference projects and two methods using both phylogenetic and functional tests. The results show large variations among the different projects in terms of performances, which indicates that the choice of orthology database can have a strong impact on any downstream analysis.

          Related collections

          Most cited references33

          • Record: found
          • Abstract: not found
          • Article: not found

          Identification of common molecular subsequences.

            • Record: found
            • Abstract: found
            • Article: not found

            Automatic clustering of orthologs and in-paralogs from pairwise species comparisons.

            Orthologs are genes in different species that originate from a single gene in the last common ancestor of these species. Such genes have often retained identical biological roles in the present-day organisms. It is hence important to identify orthologs for transferring functional information between genes in different organisms with a high degree of reliability. For example, orthologs of human proteins are often functionally characterized in model organisms. Unfortunately, orthology analysis between human and e.g. invertebrates is often complex because of large numbers of paralogs within protein families. Paralogs that predate the species split, which we call out-paralogs, can easily be confused with true orthologs. Paralogs that arose after the species split, which we call in-paralogs, however, are bona fide orthologs by definition. Orthologs and in-paralogs are typically detected with phylogenetic methods, but these are slow and difficult to automate. Automatic clustering methods based on two-way best genome-wide matches on the other hand, have so far not separated in-paralogs from out-paralogs effectively. We present a fully automatic method for finding orthologs and in-paralogs from two species. Ortholog clusters are seeded with a two-way best pairwise match, after which an algorithm for adding in-paralogs is applied. The method bypasses multiple alignments and phylogenetic trees, which can be slow and error-prone steps in classical ortholog detection. Still, it robustly detects complex orthologous relationships and assigns confidence values for both orthologs and in-paralogs. The program, called INPARANOID, was tested on all completely sequenced eukaryotic genomes. To assess the quality of INPARANOID results, ortholog clusters were generated from a dataset of worm and mammalian transmembrane proteins, and were compared to clusters derived by manual tree-based ortholog detection methods. This study led to the identification with a high degree of confidence of over a dozen novel worm-mammalian ortholog assignments that were previously undetected because of shortcomings of phylogenetic methods.A WWW server that allows searching for orthologs between human and several fully sequenced genomes is installed at http://www.cgb.ki.se/inparanoid/. This is the first comprehensive resource with orthologs of all fully sequenced eukaryotic genomes. Programs and tables of orthology assignments are available from the same location. Copyright 2001 Academic Press.
              • Record: found
              • Abstract: not found
              • Article: not found

              Distinguishing homologous from analogous proteins.


                Author and article information

                Role: Editor
                PLoS Comput Biol
                PLoS Computational Biology
                Public Library of Science (San Francisco, USA )
                January 2009
                January 2009
                16 January 2009
                : 5
                : 1
                : e1000262
                [1]Institute of Computational Science, ETH Zurich, and Swiss Institute of Bioinformatics, Zürich, Switzerland
                University of California Davis, United States of America
                Author notes

                Conceived and designed the experiments: CD. Performed the experiments: AMA. Analyzed the data: AMA CD. Wrote the paper: AMA CD.

                Altenhoff, Dessimoz. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                : 21 May 2008
                : 26 November 2008
                Page count
                Pages: 11
                Research Article
                Computational Biology/Comparative Sequence Analysis
                Computational Biology/Genomics
                Computational Biology/Protein Homology Detection
                Evolutionary Biology/Bioinformatics
                Evolutionary Biology/Genomics
                Genetics and Genomics/Comparative Genomics
                Genetics and Genomics/Functional Genomics
                Genetics and Genomics/Genomics

                Quantitative & Systems biology
                Quantitative & Systems biology


                Comment on this article