      GeneSeqToFamily: a Galaxy workflow to find gene families based on the Ensembl Compara GeneTrees pipeline

      research-article


          Abstract

          Background

          Gene duplication is a major factor contributing to evolutionary novelty, and the contraction or expansion of gene families has often been associated with morphological, physiological, and environmental adaptations. The study of homologous genes helps us to understand the evolution of gene families. It plays a vital role in finding ancestral gene duplication events as well as identifying genes that have diverged from a common ancestor under positive selection. There are various tools available, such as MSOAR, OrthoMCL, and HomoloGene, to identify gene families and visualize syntenic information between species, providing an overview of the evolution of syntenic regions at the family level. Unfortunately, none of them provide information about structural changes within genes, such as the conservation of ancestral exon boundaries among multiple genomes. The Ensembl GeneTrees computational pipeline generates gene trees based on coding sequences, provides details about exon conservation, and is used in the Ensembl Compara project to discover gene families.

          Findings

          A certain amount of expertise is required to configure and run the Ensembl Compara GeneTrees pipeline via command line. Therefore, we converted this pipeline into a Galaxy workflow, called GeneSeqToFamily, and provided additional functionality. This workflow uses existing tools from the Galaxy ToolShed, as well as providing additional wrappers and tools that are required to run the workflow.

          Conclusions

          GeneSeqToFamily represents the Ensembl GeneTrees pipeline as a set of interconnected Galaxy tools, so they can be run interactively within Galaxy's user-friendly workflow environment while still providing the flexibility to tailor the analysis by changing configurations and tools if necessary. Additional tools allow users to subsequently visualize the gene families produced by the workflow, using the Aequatus.js interactive tool, which has been developed as part of the Aequatus software project.

                Author and article information

                Journal
                GigaScience
                Oxford University Press
                2047-217X
                March 2018
                07 February 2018
                Volume: 7
                Issue: 3
                Pages: 1-10
                Affiliations
                [1]Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK
                Author notes
                Correspondence address. Anil S. Thanki, Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK. E-mail: Anil.Thanki@earlham.ac.uk
                Author information
                http://orcid.org/0000-0002-8941-444X
                http://orcid.org/0000-0003-3627-5340
                http://orcid.org/0000-0003-0111-191X
                http://orcid.org/0000-0002-5589-7754
                Article
                Article ID: giy005
                DOI: 10.1093/gigascience/giy005
                PMCID: PMC5863215
                PMID: 29425291
                © The Authors 2018. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 30 March 2017
                Revised: 31 July 2017
                Accepted: 18 January 2018
                Page count
                Pages: 10
                Categories
                Technical Note

                Keywords: galaxy, pipeline, workflow, genomics, comparative genomics, homology, orthology, paralogy, phylogeny, gene family, alignment, compara, ensembl
