
      The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana


          Abstract

          Background

          Most genes in Arabidopsis thaliana are members of gene families. How do the members of gene families arise, and how are gene family copy numbers maintained? Some gene families may evolve primarily through tandem duplication and high rates of birth and death in clusters, and others through infrequent polyploidy or large-scale segmental duplications and subsequent losses.

          Results

          To understand the mechanisms of gene family evolution, we constructed phylogenies for 50 large gene families in Arabidopsis thaliana, identified large internal segmental duplications in the Arabidopsis genome, mapped gene duplications onto the segmental duplications, and used this information to identify which nodes in each phylogeny arose through segmental or tandem duplication. Six gene families that exemplify characteristic modes of duplication are described. Distributions of gene family sizes and patterns of duplication by genomic distance are also described, in order to characterize local duplication and copy number in large gene families. Both gene family size and duplication by distance closely follow power-law distributions.
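One rough way to check whether a distribution follows a power law is to fit a straight line to the data on log-log axes. The sketch below is an illustration only: the abstract does not say how the authors assessed the fit, the function name `fit_power_law` is hypothetical, and the counts are synthetic values generated from y = 1000 * x^-2 rather than data from the study.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(c) + k * log(x); returns (c, k)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the regression line in log-log space is the power-law exponent.
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    k = sxy / sxx
    c = math.exp(mean_y - k * mean_x)
    return c, k

# Synthetic example (not study data): number of families of each size,
# generated exactly from counts = 1000 * size ** -2.
sizes = [1, 2, 4, 5, 10]
counts = [1000 * s ** -2 for s in sizes]
c, k = fit_power_law(sizes, counts)
```

On real, noisy counts a log-log regression is only a first check; a good straight-line fit is suggestive of a power law, not proof of one.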

          Conclusions

          Combining information about genomic segmental duplications, gene family phylogenies, and gene positions provides a method to evaluate the contributions of tandem duplication and segmental genome duplication to the generation and maintenance of gene families. Differences among gene families in these contributions appear to correspond meaningfully to differences in the functional roles of their members.
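The idea of combining gene positions with segmental duplication blocks can be sketched for a single pair of duplicated genes. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Gene` and `Region` types, the `classify_pair` function, the gene names, and the `max_tandem_gap` threshold of 10 genes are all hypothetical, and the phylogeny step (deciding which gene pairs correspond to a node) is left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    chrom: int
    index: int  # ordinal position of the gene along its chromosome

@dataclass(frozen=True)
class Region:
    chrom: int
    start: int  # first gene index in the region (inclusive)
    end: int    # last gene index in the region (inclusive)

    def contains(self, gene):
        return gene.chrom == self.chrom and self.start <= gene.index <= self.end

def classify_pair(g1, g2, blocks, max_tandem_gap=10):
    """Label a duplicated gene pair as 'tandem', 'segmental', or 'other'.

    blocks is a list of (Region, Region) pairs, each describing the two
    copies of one segmental duplication. max_tandem_gap is an assumed
    cutoff, not a value taken from the study.
    """
    # Nearby genes on the same chromosome are treated as tandem duplicates.
    if g1.chrom == g2.chrom and abs(g1.index - g2.index) <= max_tandem_gap:
        return "tandem"
    # Genes lying in the two halves of one duplicated block are segmental.
    for a, b in blocks:
        if (a.contains(g1) and b.contains(g2)) or (a.contains(g2) and b.contains(g1)):
            return "segmental"
    return "other"

# Hypothetical data: one duplicated block shared by chromosomes 1 and 3.
blocks = [(Region(1, 100, 200), Region(3, 400, 500))]
gene_a = Gene("geneA", 1, 150)
gene_b = Gene("geneB", 1, 152)
gene_c = Gene("geneC", 3, 450)
gene_d = Gene("geneD", 5, 10)

labels = [
    classify_pair(gene_a, gene_b, blocks),
    classify_pair(gene_a, gene_c, blocks),
    classify_pair(gene_a, gene_d, blocks),
]
```

Tallying such labels over the nodes of a family's phylogeny would give a per-family estimate of the tandem and segmental contributions.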


                Author and article information

                Journal
                BMC Plant Biology (BMC Plant Biol)
                BioMed Central (London)
                ISSN 1471-2229
                1 June 2004; 4:10
                Affiliations
                [1] Plant Biology Department, University of Minnesota, St. Paul, MN 55108, USA
                [2] Plant Pathology Department, University of Minnesota, St. Paul, MN 55108, USA
                [3] Adam Ave 532, Ithaca, NY 14850, USA
                [4] Ecology, Evolution, and Behavior Department, University of Minnesota, St. Paul, MN 55108, USA
                Article
                1471-2229-4-10
                DOI: 10.1186/1471-2229-4-10
                PMCID: PMC446195
                PMID: 15171794
                78150693-88d0-48b6-9d26-7448960ab387
                Copyright © 2004 Cannon et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
                History
                1 November 2003
                1 June 2004
                Categories
                Research Article

                Plant science & Botany
