
      GenFamClust: an accurate, synteny-aware and reliable homology inference algorithm

      research-article


          Abstract

          Background

          Homology inference is pivotal to evolutionary biology and is primarily based on significant sequence similarity, which, in general, is a good indicator of homology. Algorithms have also been designed to utilize conservation in gene order as an indication of homologous regions. We have developed GenFamClust, a method based on quantification of both gene order conservation and sequence similarity.
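          As a rough illustration of how sequence similarity and gene order conservation can be quantified together, the sketch below scores a gene pair by its direct similarity and by the best similarity found among neighboring genes. It is not GenFamClust's actual scoring scheme, which the abstract does not specify: the function names, the window size `k`, the linear `weight`, and the similarity table `sim` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Sim = Dict[Tuple[str, str], float]  # hypothetical table: (gene_a, gene_b) -> similarity in [0, 1]


def neighbors(genome: List[str], idx: int, k: int) -> List[str]:
    """Return up to k genes on each side of position idx, excluding the gene itself."""
    lo = max(0, idx - k)
    return genome[lo:idx] + genome[idx + 1: idx + 1 + k]


def synteny_score(genome_a: List[str], i: int,
                  genome_b: List[str], j: int,
                  sim: Sim, k: int = 2) -> float:
    """Best similarity between any neighbor of gene i and any neighbor of gene j."""
    best = 0.0
    for ga in neighbors(genome_a, i, k):
        for gb in neighbors(genome_b, j, k):
            best = max(best, sim.get((ga, gb), 0.0))
    return best


def combined_score(genome_a: List[str], i: int,
                   genome_b: List[str], j: int,
                   sim: Sim, k: int = 2, weight: float = 0.5) -> float:
    """Weighted mix of direct similarity and neighborhood (local synteny) similarity.

    The linear combination is an assumption of this sketch, not the published method.
    """
    direct = sim.get((genome_a[i], genome_b[j]), 0.0)
    local = synteny_score(genome_a, i, genome_b, j, sim, k)
    return weight * direct + (1.0 - weight) * local


# Toy example: two genomes given as ordered gene lists.
genome_a = ["a1", "a2", "a3"]
genome_b = ["b1", "b2", "b3"]
sim = {("a1", "b1"): 0.9, ("a2", "b2"): 0.4, ("a3", "b3"): 0.8}
score = combined_score(genome_a, 1, genome_b, 1, sim, k=1)
```

          In this toy case the pair (a2, b2) has only moderate direct similarity, but its flanking genes are highly similar, so the neighborhood term raises the combined score.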

          Results

          In this study, we validate GenFamClust by comparing it to well-known homology inference algorithms on a synthetic dataset. We applied several popular clustering algorithms to homologs inferred by GenFamClust and by other algorithms on a metazoan dataset and studied the outcomes. Accuracy, similarity, dependence, and other characteristics were investigated for the gene families yielded by the clustering algorithms. GenFamClust was also applied to genes from a set of complete fungal genomes, and gene families were inferred using clustering. The resulting gene families were compared with a manually curated gold standard of pillars from the Yeast Gene Order Browser. We found that the gene-order component of GenFamClust is simple, yet biologically realistic, and captures local synteny information for homologs.
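          To make the step from inferred homolog pairs to gene families concrete, here is a minimal sketch that groups genes by single-linkage clustering (connected components over pairs scoring at or above a threshold). The abstract does not name the clustering algorithms that were evaluated, so single linkage is only a stand-in; the function name `cluster_families`, the pair format, and the threshold value are assumptions of this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def cluster_families(pairs: Iterable[Tuple[str, str, float]],
                     threshold: float) -> List[List[str]]:
    """Group genes into families: genes are linked when their pair score >= threshold.

    Single-linkage clustering via union-find; a stand-in, not the paper's pipeline.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Register unseen genes as their own root, then walk up with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in pairs:
        ra, rb = find(a), find(b)  # both genes are registered even if not linked
        if score >= threshold and ra != rb:
            parent[ra] = rb

    families: Dict[str, List[str]] = defaultdict(list)
    for gene in list(parent):
        families[find(gene)].append(gene)
    return sorted(sorted(members) for members in families.values())


# Toy example with hypothetical gene identifiers and scores.
pairs = [("g1", "g2", 0.9), ("g2", "g3", 0.8), ("g3", "g4", 0.2)]
families = cluster_families(pairs, threshold=0.5)
```

          Families produced this way could then be compared against a curated reference grouping, as the study does with the Yeast Gene Order Browser pillars.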

          Conclusions

          The study shows that GenFamClust is a more accurate, informed, and comprehensive pipeline to infer homologs and gene families than other commonly used homology and gene-family inference methods.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12862-016-0684-2) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                rhali@kth.se
                auwnm@kth.se
                arve@nada.su.se
                Journal
                BMC Evolutionary Biology (BMC Evol Biol)
                BioMed Central (London)
                ISSN: 1471-2148
                Published: 4 June 2016
                Volume: 16
                Article number: 120
                Affiliations
                KTH Royal Institute of Technology, Science for Life Laboratory, School of Computer Science and Communication, Solna, SE-171 77 Sweden
                Department of Numerical Analysis and Computer Science, Stockholm University, Stockholm, SE-100 44 Sweden
                Swedish e-Science Research Centre, Stockholm, Sweden
                Science for Life Laboratory, Box 1031, Solna, SE-171 77 Sweden
                Article
                684
                DOI: 10.1186/s12862-016-0684-2
                PMC ID: 4893229
                PubMed ID: 27260514
                5e575b9e-6b97-4e65-b6c1-95b0f8447287
                © Ali et al. 2016

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 2 December 2015
                Accepted: 12 May 2016
                Categories
                Research Article

                Evolutionary Biology
                homology inference, gene synteny, gene similarity, gene family, clustering, gene order conservation
