      Assessing Performance of Orthology Detection Strategies Applied to Eukaryotic Genomes



          Orthology detection is critically important for accurate functional annotation, and has been widely used to facilitate studies on comparative and evolutionary genomics. Although various methods are now available, there has been no comprehensive analysis of performance, due to the lack of a genomic-scale ‘gold standard’ orthology dataset. Even in the absence of such datasets, the comparison of results from alternative methodologies contains useful information, as agreement enhances confidence and disagreement indicates possible errors. Latent Class Analysis (LCA) is a statistical technique that can exploit this information to reasonably infer sensitivities and specificities, and is applied here to evaluate the performance of various orthology detection methods on a eukaryotic dataset. Overall, we observe a trade-off between sensitivity and specificity in orthology detection, with BLAST-based methods characterized by high sensitivity, and tree-based methods by high specificity. Two algorithms exhibit the best overall balance, with both sensitivity and specificity above 80%: INPARANOID identifies orthologs across two species, while OrthoMCL clusters orthologs from multiple species. Among methods that permit clustering of ortholog groups spanning multiple genomes, the (automated) OrthoMCL algorithm exhibits better within-group consistency with respect to protein function and domain architecture than either the (manually curated) KOG database or the homolog clustering algorithm TribeMCL. Using LCA, we are also able to comprehensively assess similarities and statistical dependence between various strategies, and to evaluate the effects of parameter settings on performance. In summary, we present a comprehensive evaluation of orthology detection on a divergent set of eukaryotic genomes, thus providing insights and guidance for method selection, tuning and development for different applications.
Many biological questions have been addressed by multiple tests yielding binary (yes/no) outcomes but no clear definition of truth, making LCA an attractive approach for computational biology.
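
          The idea of inferring sensitivities and specificities from agreement between methods, without a gold standard, can be illustrated with a minimal two-class latent class model fitted by expectation-maximization. This is only a sketch under stated assumptions, not the authors' analysis: the function names `simulate_calls` and `fit_two_class_lca`, the simulated call matrix, and all numeric settings are hypothetical, and the model assumes the methods are conditionally independent given the true class, whereas the abstract notes that the study also assessed statistical dependence between strategies.

```python
import random
from collections import Counter


def simulate_calls(n, prevalence, sens, spec, seed=0):
    """Simulate binary ortholog calls (hypothetical data, for illustration only).

    Each row is one candidate gene pair; each column is one detection method.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        is_ortholog = rng.random() < prevalence
        rows.append(tuple(
            int(rng.random() < (s if is_ortholog else 1.0 - c))
            for s, c in zip(sens, spec)
        ))
    return rows


def _clip(p, eps=1e-9):
    """Keep probabilities strictly inside (0, 1) to avoid division by zero."""
    return min(max(p, eps), 1.0 - eps)


def fit_two_class_lca(calls, n_iter=500):
    """Fit a two-class latent class model by EM.

    Assumes methods are conditionally independent given the latent class.
    Returns (prevalence, sensitivities, specificities).
    """
    patterns = Counter(calls)          # collapse identical call patterns
    n_total = sum(patterns.values())
    n_methods = len(next(iter(patterns)))

    prevalence = 0.5
    sens = [0.8] * n_methods           # start: P(call = 1 | ortholog)
    fpr = [0.2] * n_methods            # start: P(call = 1 | not ortholog)

    for _ in range(n_iter):
        # E-step: posterior probability that each pattern is a true ortholog.
        post = {}
        for pat in patterns:
            p1, p0 = prevalence, 1.0 - prevalence
            for j, y in enumerate(pat):
                p1 *= sens[j] if y else 1.0 - sens[j]
                p0 *= fpr[j] if y else 1.0 - fpr[j]
            post[pat] = p1 / (p1 + p0)

        # M-step: re-estimate parameters from the weighted pattern counts.
        w1 = sum(patterns[p] * post[p] for p in patterns)
        w0 = n_total - w1
        prevalence = _clip(w1 / n_total)
        for j in range(n_methods):
            pos1 = sum(patterns[p] * post[p] for p in patterns if p[j])
            pos0 = sum(patterns[p] * (1.0 - post[p]) for p in patterns if p[j])
            sens[j] = _clip(pos1 / w1)
            fpr[j] = _clip(pos0 / w0)

    return prevalence, sens, [1.0 - f for f in fpr]


if __name__ == "__main__":
    # Made-up settings: four methods, ranging from sensitive to specific.
    true_sens = [0.95, 0.85, 0.80, 0.65]
    true_spec = [0.70, 0.85, 0.80, 0.95]
    calls = simulate_calls(20000, 0.4, true_sens, true_spec, seed=1)
    prev, sens, spec = fit_two_class_lca(calls)
    print("estimated prevalence:", round(prev, 3))
    print("estimated sensitivities:", [round(s, 3) for s in sens])
    print("estimated specificities:", [round(s, 3) for s in spec])
```

          At least three conditionally independent methods are needed for such a two-class model to be identifiable, which is why the illustration uses four. Relaxing the independence assumption requires extra dependence terms that this sketch does not model.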

          Most cited references (32)

          Automatic clustering of orthologs and in-paralogs from pairwise species comparisons.

          Orthologs are genes in different species that originate from a single gene in the last common ancestor of these species. Such genes have often retained identical biological roles in the present-day organisms. It is hence important to identify orthologs for transferring functional information between genes in different organisms with a high degree of reliability. For example, orthologs of human proteins are often functionally characterized in model organisms. Unfortunately, orthology analysis between human and e.g. invertebrates is often complex because of large numbers of paralogs within protein families. Paralogs that predate the species split, which we call out-paralogs, can easily be confused with true orthologs. Paralogs that arose after the species split, which we call in-paralogs, however, are bona fide orthologs by definition. Orthologs and in-paralogs are typically detected with phylogenetic methods, but these are slow and difficult to automate. Automatic clustering methods based on two-way best genome-wide matches on the other hand, have so far not separated in-paralogs from out-paralogs effectively. We present a fully automatic method for finding orthologs and in-paralogs from two species. Ortholog clusters are seeded with a two-way best pairwise match, after which an algorithm for adding in-paralogs is applied. The method bypasses multiple alignments and phylogenetic trees, which can be slow and error-prone steps in classical ortholog detection. Still, it robustly detects complex orthologous relationships and assigns confidence values for both orthologs and in-paralogs. The program, called INPARANOID, was tested on all completely sequenced eukaryotic genomes. To assess the quality of INPARANOID results, ortholog clusters were generated from a dataset of worm and mammalian transmembrane proteins, and were compared to clusters derived by manual tree-based ortholog detection methods. 
This study led to the identification, with a high degree of confidence, of over a dozen novel worm-mammalian ortholog assignments that were previously undetected because of shortcomings of phylogenetic methods. A WWW server that allows searching for orthologs between human and several fully sequenced genomes is installed at http://www.cgb.ki.se/inparanoid/. This is the first comprehensive resource with orthologs of all fully sequenced eukaryotic genomes. Programs and tables of orthology assignments are available from the same location. Copyright 2001 Academic Press.

            Orthologs, paralogs, and evolutionary genomics.

            Orthologs and paralogs are two fundamentally different types of homologous genes that evolved, respectively, by vertical descent from a single ancestral gene and by duplication. Orthology and paralogy are key concepts of evolutionary genomics. A clear distinction between orthologs and paralogs is critical for the construction of a robust evolutionary classification of genes and reliable functional annotation of newly sequenced genomes. Genome comparisons show that orthologous relationships with genes from taxonomically distant species can be established for the majority of the genes from each sequenced genome. This review examines in depth the definitions and subtypes of orthologs and paralogs, outlines the principal methodological approaches employed for identification of orthology and paralogy, and considers evolutionary and functional implications of these concepts.
              Distinguishing homologous from analogous proteins.


                Author and article information

                Role: Academic Editor
                PLoS ONE
                Public Library of Science (San Francisco, USA)
                18 April 2007
                Volume: 2
                Issue: 4
                Article: e383
                [1] Department of Chemistry, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
                [2] Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
                [3] Genomics Institute, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
                [4] Department of Methodology and Statistics, Tilburg University, The Netherlands
                Pasteur Institute, France
                Author notes
                * To whom correspondence should be addressed. E-mail: droos@ 123456sas.upenn.edu

                Conceived and designed the experiments: DR FC AM JV. Performed the experiments: FC. Analyzed the data: FC. Wrote the paper: DR FC JV.


                Current address: Informatics, GlaxoSmithKline, Collegeville, Pennsylvania, United States of America

                Chen et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                Received: 6 March 2007
                Accepted: 13 March 2007
                Page count
                Pages: 12
                Research Article
                Computational Biology
                Genetics and Genomics
                Computational Biology/Genomics
                Computational Biology/Protein Homology Detection
                Genetics and Genomics/Comparative Genomics
                Genetics and Genomics/Functional Genomics
                Genetics and Genomics/Gene Function
                Genetics and Genomics/Genomics