Expansion of Protein Domain Repeats

Björklund, Åsa K; Ekman, Diana; Elofsson, Arne

doi:10.1371/journal.pcbi.0020114


research-article

Author(s): Åsa K Björklund, Diana Ekman, Arne Elofsson*

Editor(s): Philip E Bourne

Publication date (Electronic): 25 August 2006

Journal: PLoS Computational Biology

Publisher: Public Library of Science


Abstract

Many proteins, especially in eukaryotes, contain tandem repeats of several domains from the same family. These repeats have a variety of binding properties and are involved in protein–protein interactions as well as binding to other ligands such as DNA and RNA. The rapid expansion of protein domain repeats is assumed to have evolved through internal tandem duplications. However, the exact mechanisms behind these tandem duplications are not well-understood. Here, we have studied the evolution, function, protein structure, gene structure, and phylogenetic distribution of domain repeats. For this purpose we have assigned Pfam-A domain families to 24 proteomes with more sensitive domain assignments in the repeat regions. These assignments confirmed previous findings that eukaryotes, and in particular vertebrates, contain a much higher fraction of proteins with repeats compared with prokaryotes. The internal sequence similarity in each protein revealed that the domain repeats are often expanded through duplications of several domains at a time, while the duplication of one domain is less common. Many of the repeats appear to have been duplicated in the middle of the repeat region. This is in strong contrast to the evolution of other proteins that mainly works through additions of single domains at either terminus. Further, we found that some domain families show distinct duplication patterns, e.g., nebulin domains have mainly been expanded with a unit of seven domains at a time, while duplications of other domain families involve varying numbers of domains. Finally, no common mechanism for the expansion of all repeats could be detected. We found that the duplication patterns show no dependence on the size of the domains. Further, repeat expansion in some families can possibly be explained by shuffling of exons. However, exon shuffling could not have created all repeats.
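As a rough illustration of what "tandem repeats of several domains from the same family" means computationally, the sketch below scans a protein's ordered list of domain assignments and reports runs of adjacent domains that share a family. The function name `tandem_repeats`, the `min_copies` threshold, and the example architecture are assumptions made for illustration; the abstract does not state the criteria the authors actually used to define a repeat.

```python
from itertools import groupby


def tandem_repeats(architecture, min_copies=2):
    """Find runs of adjacent domains from the same family.

    architecture: domain family names in N- to C-terminal order.
    min_copies:   hypothetical threshold for calling a run a repeat.
    Returns a list of (family, start_index, copies) tuples.
    """
    runs = []
    position = 0
    for family, group in groupby(architecture):
        copies = len(list(group))
        if copies >= min_copies:
            runs.append((family, position, copies))
        position += copies
    return runs


# Made-up architecture, not taken from any real protein.
example = ["SH3", "Nebulin", "Nebulin", "Nebulin", "Kinase", "Ig", "Ig"]
print(tandem_repeats(example))
```

Applied to every protein in a proteome, a count of proteins with at least one such run would give the kind of per-species repeat fraction that the abstract compares between eukaryotes and prokaryotes.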

Synopsis

The building blocks that create proteins are called domains, and domains are often combined to create multidomain proteins. In many vertebrate proteins, repeats with several adjacent domains from the same family can be found. The authors have investigated how these repeats may have evolved. It is believed that the repeats are created through internal duplications where the duplicated region is inserted next to its origin. Therefore, the pairwise sequence similarity between all repeated domains in a protein was used to identify recent duplications, and a method based on autocorrelation vectors was employed to distinguish patterns of duplication. The authors found that repeat regions are often created from the duplication of several domains at a time while duplication of one domain is less common. Further, the internal duplications often occur in the middle of the repeats. This is in contrast to the evolution of nonrepeating, multidomain proteins, which are thought to evolve by the addition of a single domain at the N-termini or C-termini. A preference for duplication of a certain number of domains was found for some of the domain families. Finally, the authors discuss some of the possible mechanisms for repeat expansion. However, the exact mechanism remains to be discovered.
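The idea of reading duplication patterns from pairwise domain similarity can be sketched as a simple lag profile: for each offset k, average the similarity between domain i and domain i+k. If a repeat grew by copying blocks of k domains, domains k positions apart should be unusually similar, so the profile peaks at k. This is only one plausible reading of "autocorrelation vectors"; the synopsis does not give the authors' exact formulation. The function names, the restriction to lags up to half the repeat length, and the toy similarity values are all assumptions.

```python
def lag_profile(similarity):
    """Mean similarity between domains separated by each lag.

    similarity: square matrix (list of lists) where similarity[i][j]
                is the pairwise score between domains i and j.
    Lags are limited to half the repeat length (an assumption) so that
    every lag is averaged over a reasonable number of pairs.
    """
    n = len(similarity)
    profile = {}
    for lag in range(1, n // 2 + 1):
        values = [similarity[i][i + lag] for i in range(n - lag)]
        profile[lag] = sum(values) / len(values)
    return profile


def toy_matrix(n_domains, unit, high=0.75, low=0.25):
    """Build a made-up matrix in which domains `unit` positions apart are similar."""
    return [
        [
            1.0 if i == j else (high if (i - j) % unit == 0 else low)
            for j in range(n_domains)
        ]
        for i in range(n_domains)
    ]


# Toy repeat of nine domains, built as if expanded three domains at a time.
profile = lag_profile(toy_matrix(9, 3))
best_lag = max(profile, key=profile.get)
print(profile, best_lag)
```

In this toy case the profile peaks at a lag of three, matching the unit used to build the matrix. Real similarity matrices are noisier, so a peak would need to be judged against the background level of the other lags.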


Author and article information

Contributors

Philip E Bourne, Role: Editor

Journal

Journal ID (nlm-ta): PLoS Comput Biol

Journal ID (publisher-id): pcbi

Title: PLoS Computational Biology

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Print): 1553-734X

ISSN (Electronic): 1553-7358

Publication date (Print): August 2006

Publication date (Electronic): 25 August 2006

Publication date (Electronic preprint): 14 July 2006

Volume: 2

Issue: 8

Electronic Location Identifier: e114

Affiliations

[1]Stockholm Bioinformatics Center, Center for Biomembrane Research, Stockholm University, Stockholm, Sweden

University of California San Diego, United States of America

Author notes

* To whom correspondence should be addressed. E-mail: arne@bioinfo.se

Article

Publisher ID: 06-PLCB-RA-0043R3

Other ID: e114

Serial Item and Contribution ID: plcb-02-08-13

DOI: 10.1371/journal.pcbi.0020114

PMC ID: 1553488

PubMed ID: 16933986

SO-VID: cde97bb2-ca25-4f47-bb34-2178b8a41ba6

Copyright: © 2006 Björklund et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 7 February 2006

Date accepted: 14 July 2006

Page count

Pages: 12

Custom metadata

Citation: Björklund ÅK, Ekman D, Elofsson A (2006) Expansion of protein domain repeats. PLoS Comput Biol 2(8): e114. DOI: 10.1371/journal.pcbi.0020114

ScienceOpen disciplines: Quantitative & Systems biology

