Many proteins, especially in eukaryotes, contain tandem repeats of several domains from the same family. These repeats have a variety of binding properties and are involved in protein–protein interactions as well as binding to other ligands such as DNA and RNA. The rapid expansion of protein domain repeats is assumed to have occurred through internal tandem duplications. However, the exact mechanisms behind these tandem duplications are not well understood. Here, we have studied the evolution, function, protein structure, gene structure, and phylogenetic distribution of domain repeats. For this purpose, we assigned Pfam-A domain families to 24 proteomes, using more sensitive domain assignments in the repeat regions. These assignments confirmed previous findings that eukaryotes, and in particular vertebrates, contain a much higher fraction of proteins with repeats than prokaryotes do. The internal sequence similarity in each protein revealed that the domain repeats are often expanded through duplications of several domains at a time, while the duplication of one domain is less common. Many of the repeats appear to have been duplicated in the middle of the repeat region. This is in strong contrast to the evolution of other proteins, which mainly proceeds through additions of single domains at either terminus. Further, we found that some domain families show distinct duplication patterns, e.g., nebulin domains have mainly been expanded with a unit of seven domains at a time, while duplications of other domain families involve varying numbers of domains. Finally, no common mechanism for the expansion of all repeats could be detected. We found that the duplication patterns show no dependence on the size of the domains. Further, repeat expansion in some families can possibly be explained by shuffling of exons. However, exon shuffling could not have created all repeats.
The building blocks of proteins are called domains, and domains are often combined to create multidomain proteins. In many vertebrate proteins, repeats with several adjacent domains from the same family can be found. The authors have investigated how these repeats may have evolved. It is believed that the repeats are created through internal duplications where the duplicated region is inserted next to its origin. Therefore, the pairwise sequence similarity between all repeated domains in a protein was used to identify recent duplications, and a method based on autocorrelation vectors was employed to distinguish patterns of duplication. The authors found that repeat regions are often created by the duplication of several domains at a time, while duplication of one domain is less common. Further, the internal duplications often occur in the middle of the repeats. This is in contrast to the evolution of nonrepeating, multidomain proteins, which are thought to evolve by the addition of a single domain at the N-terminus or C-terminus. A preference for duplication of a certain number of domains was found for some of the domain families. Finally, the authors discuss some of the possible mechanisms for repeat expansion. However, the exact mechanism remains to be discovered.
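The idea of reading a duplication pattern from pairwise domain similarities can be illustrated with a small sketch. This is not the authors' implementation: it assumes a symmetric similarity matrix is already available for the domains of one repeat region, and it takes the mean similarity at each domain separation (lag) as a simple stand-in for an autocorrelation vector. The names `lag_profile`, `likely_unit`, and `toy_similarity` are hypothetical, as are the similarity values.

```python
import numpy as np

def lag_profile(sim):
    """Mean similarity between domains that are k positions apart, for k = 1..n-1.

    `sim` is an n x n pairwise similarity matrix for the n domains of one
    repeat region, in sequence order (assumed input, not computed here).
    """
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    # The k-th superdiagonal holds the similarities of domain i and domain i+k.
    return np.array([np.diagonal(sim, offset=k).mean() for k in range(1, n)])

def likely_unit(sim):
    """Lag with the highest mean similarity: a crude guess at how many
    domains were copied together in the most recent duplication."""
    return int(np.argmax(lag_profile(sim))) + 1

def toy_similarity(n, unit, high=0.9, low=0.3):
    """Hypothetical test data: domains exactly `unit` positions apart are
    highly similar, all other pairs only weakly so."""
    idx = np.arange(n)
    sep = np.abs(idx[:, None] - idx[None, :])
    sim = np.where(sep == unit, high, low)
    np.fill_diagonal(sim, 1.0)
    return sim

# A six-domain repeat imagined as one duplication of a three-domain block.
profile = lag_profile(toy_similarity(6, 3))
unit = likely_unit(toy_similarity(6, 3))
```

In this toy case the profile peaks at a lag of three domains, matching the block that was duplicated. On real data the peak is noisier: long lags are averaged over few domain pairs, and multiples of the true unit can also score highly, so a sketch like this would only be a starting point.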