
      Expansion of Protein Domain Repeats

      Björklund ÅK, Ekman D, Elofsson A*

      PLoS Computational Biology

      Public Library of Science



          Many proteins, especially in eukaryotes, contain tandem repeats of several domains from the same family. These repeats have a variety of binding properties and are involved in protein–protein interactions as well as binding to other ligands such as DNA and RNA. The rapid expansion of protein domain repeats is assumed to have evolved through internal tandem duplications. However, the exact mechanisms behind these tandem duplications are not well understood. Here, we have studied the evolution, function, protein structure, gene structure, and phylogenetic distribution of domain repeats. For this purpose we have assigned Pfam-A domain families to 24 proteomes with more sensitive domain assignments in the repeat regions. These assignments confirmed previous findings that eukaryotes, and in particular vertebrates, contain a much higher fraction of proteins with repeats compared with prokaryotes. The internal sequence similarity in each protein revealed that the domain repeats are often expanded through duplications of several domains at a time, while the duplication of one domain is less common. Many of the repeats appear to have been duplicated in the middle of the repeat region. This is in strong contrast to the evolution of other proteins, which mainly proceeds through additions of single domains at either terminus. Further, we found that some domain families show distinct duplication patterns, e.g., nebulin domains have mainly been expanded with a unit of seven domains at a time, while duplications of other domain families involve varying numbers of domains. Finally, no common mechanism for the expansion of all repeats could be detected. We found that the duplication patterns show no dependence on the size of the domains. Further, repeat expansion in some families can possibly be explained by shuffling of exons. However, exon shuffling could not have created all repeats.


          The building blocks that create proteins are called domains, and domains are often combined to create multidomain proteins. In many vertebrate proteins, repeats with several adjacent domains from the same family can be found. The authors have investigated how these repeats may have evolved. It is believed that the repeats are created through internal duplications where the duplicated region is inserted next to its origin. Therefore, the pairwise sequence similarity between all repeated domains in a protein was used to identify recent duplications, and a method based on autocorrelation vectors was employed to distinguish patterns of duplication. The authors found that repeat regions are often created from the duplication of several domains at a time while duplication of one domain is less common. Further, the internal duplications often occur in the middle of the repeats. This is in contrast to the evolution of nonrepeating, multidomain proteins, which are thought to evolve by the addition of a single domain at the N-termini or C-termini. A preference for duplication of a certain number of domains was found for some of the domain families. Finally, the authors discuss some of the possible mechanisms for repeat expansion. However, the exact mechanism remains to be discovered.
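          The idea of reading duplication patterns from pairwise domain similarity can be illustrated with a small sketch. This is not the authors' implementation: the function names `lag_profile` and `best_unit`, the toy similarity values, and the choice of a plain mean over each lag are all assumptions made for illustration. Given a matrix of pairwise similarities between the domains of one repeat region (in sequence order), the sketch averages the similarity of domains that lie k positions apart; a peak at lag k hints that the region may have grown by duplicating k domains at a time.

```python
def lag_profile(sim):
    """Mean similarity between domains i and i+k, for each lag k = 1..n-1.

    `sim` is a square list-of-lists of pairwise similarities between the
    domains of one repeat region, ordered as they occur in the protein.
    (Hypothetical helper; a simple stand-in for an autocorrelation vector.)
    """
    n = len(sim)
    profile = []
    for k in range(1, n):
        values = [sim[i][i + k] for i in range(n - k)]
        profile.append(sum(values) / len(values))
    return profile


def best_unit(sim, max_lag=None):
    """Return the lag (number of domains) with the highest mean similarity.

    Lags above half the repeat length are ignored by default, since they
    are supported by very few domain pairs. (Assumed heuristic.)
    """
    profile = lag_profile(sim)
    if max_lag is None:
        max_lag = max(1, len(sim) // 2)
    candidates = profile[:max_lag]
    return 1 + max(range(len(candidates)), key=candidates.__getitem__)


# Toy example: six domains, built as if a three-domain unit was duplicated
# once (A B C A' B' C'). Copies of the same ancestral domain are given a
# similarity of 0.9, unrelated positions 0.3. These numbers are invented.
n_domains = 6
toy_sim = [
    [1.0 if i == j else (0.9 if (j - i) % 3 == 0 else 0.3)
     for j in range(n_domains)]
    for i in range(n_domains)
]

print(lag_profile(toy_sim))
print(best_unit(toy_sim))
```

          In the toy matrix the profile peaks at a lag of three domains, matching the three-domain unit used to build it. Real repeat regions are noisier, so a production analysis would need a significance criterion rather than a simple maximum.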


                Author and article information

                Role: Editor
                PLoS Comput Biol
                Public Library of Science (San Francisco, USA)
                August 2006
                25 August 2006
                14 July 2006
                Volume: 2
                Issue: 8
                Stockholm Bioinformatics Center, Center for Biomembrane Research, Stockholm University, Stockholm, Sweden
                University of California San Diego, United States of America
                Author notes
                * To whom correspondence should be addressed. E-mail: arne@
                06-PLCB-RA-0043R3 e114 plcb-02-08-13
                Copyright: © 2006 Björklund et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                Page count
                Pages: 12
                Research Article
                Bioinformatics - Computational Biology
                Molecular Biology - Structural Biology
                Custom metadata
                Björklund ÅK, Ekman D, Elofsson A (2006) Expansion of protein domain repeats. PLoS Comput Biol 2(8): e114. DOI: 10.1371/journal.pcbi.0020114
