Blog
About

  • Record: found
  • Abstract: found
  • Article: found
Is Open Access

Evolution of Protein Domain Repeats in Metazoa

, *

Molecular Biology and Evolution

Oxford University Press

Read this article at

Bookmark
      There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

      Abstract

      Repeats are ubiquitous elements of proteins and they play important roles for cellular function and during evolution. Repeats are, however, also notoriously difficult to capture computationally and large scale studies so far had difficulties in linking genetic causes, structural properties and evolutionary trajectories of protein repeats. Here we apply recently developed methods for repeat detection and analysis to a large dataset comprising over hundred metazoan genomes. We find that repeats in larger protein families experience generally very few insertions or deletions (indels) of repeat units but there is also a significant fraction of noteworthy volatile outliers with very high indel rates. Analysis of structural data indicates that repeats with an open structure and independently folding units are more volatile and more likely to be intrinsically disordered. Such disordered repeats are also significantly enriched in sites with a high functional potential such as linear motifs. Furthermore, the most volatile repeats have a high sequence similarity between their units. Since many volatile repeats also show signs of recombination, we conclude they are often shaped by concerted evolution. Intriguingly, many of these conserved yet volatile repeats are involved in host-pathogen interactions where they might foster fast but subtle adaptation in biological arms races.

      Key Words: protein evolution, domain rearrangements, protein repeats, concerted evolution.

      Related collections

      Most cited references 97

      • Record: found
      • Abstract: found
      • Article: not found

      MUSCLE: multiple sequence alignment with high accuracy and high throughput.

       Robert Edgar (2004)
      We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5. com/muscle.
        Bookmark
        • Record: found
        • Abstract: found
        • Article: not found

        The Protein Data Bank.

        The Protein Data Bank (PDB; http://www.rcsb.org/pdb/ ) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.
          Bookmark
          • Record: found
          • Abstract: found
          • Article: not found

          T-Coffee: A novel method for fast and accurate multiple sequence alignment.

          We describe a new method (T-Coffee) for multiple sequence alignment that provides a dramatic improvement in accuracy with a modest sacrifice in speed as compared to the most commonly used alternatives. The method is broadly based on the popular progressive approach to multiple alignment but avoids the most serious pitfalls caused by the greedy nature of this algorithm. With T-Coffee we pre-process a data set of all pair-wise alignments between the sequences. This provides us with a library of alignment information that can be used to guide the progressive alignment. Intermediate alignments are then based not only on the sequences to be aligned next but also on how all of the sequences align with each other. This alignment information can be derived from heterogeneous sources such as a mixture of alignment programs and/or structure superposition. Here, we illustrate the power of the approach by using a combination of local and global pair-wise alignments to generate the library. The resulting alignments are significantly more reliable, as determined by comparison with a set of 141 test cases, than any of the popular alternatives that we tried. The improvement, especially clear with the more difficult test cases, is always visible, regardless of the phylogenetic spread of the sequences in the tests. Copyright 2000 Academic Press.
            Bookmark

            Author and article information

            Affiliations
            Institute for Evolution and Biodiversity, Westfalian Wilhelms University, Huefferstrasse 1, Muenster, Germany
            Author notes
            * Corresponding author: E-mail: ebb@ 123456wwu.de .

            Associate editor: Jeffery Thorne

            Journal
            Mol Biol Evol
            Mol. Biol. Evol
            molbev
            molbiolevol
            Molecular Biology and Evolution
            Oxford University Press
            0737-4038
            1537-1719
            December 2016
            26 September 2016
            26 September 2016
            : 33
            : 12
            : 3170-3182
            27671125 5100051 10.1093/molbev/msw194 msw194
            © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

            This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

            Counts
            Pages: 13
            Categories
            Discoveries

            Molecular biology

            Comments

            Comment on this article