
      Evolution of C2H2-zinc finger genes and subfamilies in mammals: Species-specific duplication and loss of clusters, genes and effector domains

      BMC Evolutionary Biology
      BioMed Central


          Abstract

          Background

          C2H2 zinc finger genes (C2H2-ZNF) constitute the largest class of transcription factors in humans and one of the largest gene families in mammals. Often arranged in clusters in the genome, these genes are thought to have undergone a massive expansion in vertebrates, primarily by tandem duplication. However, this view is based on limited datasets restricted to a single chromosome or a specific subset of genes belonging to the large KRAB domain-containing C2H2-ZNF subfamily.

          Results

          Here, we present the first comprehensive study of the evolution of the C2H2-ZNF family in mammals. We assembled the complete repertoire of human C2H2-ZNF genes (718 in total), about 70% of which are organized into 81 clusters across all chromosomes. Based on an analysis of their N-terminal effector domains, we identified two new C2H2-ZNF subfamilies encoding genes with a SET or a HOMEO domain. We searched for the syntenic counterparts of the human clusters in other mammals for which complete gene data are available: chimpanzee, mouse, rat and dog. Cross-species comparisons show a large variation in the numbers of C2H2-ZNF genes within homologous mammalian clusters, suggesting differential patterns of evolution. Phylogenetic analysis of selected clusters reveals that the disparity in C2H2-ZNF gene repertoires across mammals originates not only from differential gene duplication but also from gene loss. Further, we discovered variations among orthologs in the number of zinc finger motifs and in the association of the effector domains, the latter often undergoing sequence degeneration. Combined with phylogenetic studies, physical maps and an analysis of the exon-intron organization of genes from the SCAN and KRAB domain-containing subfamilies, this result suggests that the SCAN subfamily emerged first, followed by the SCAN-KRAB and finally by the KRAB subfamily.

          Conclusion

          Our results are in agreement with the "birth and death hypothesis" for the evolution of C2H2-ZNF genes, but also show that this hypothesis alone cannot explain the considerable evolutionary variation within the subfamilies of these genes in mammals. We, therefore, propose a new model involving the interdependent evolution of C2H2-ZNF gene subfamilies.
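
          The birth-and-death process referred to here (new genes arise by duplication, while some copies are later lost or become nonfunctional) can be illustrated with a toy simulation. This is only a sketch for intuition, not the authors' analysis: the function name `simulate_family`, the per-generation probabilities `p_dup` and `p_loss`, and the starting family size are all hypothetical values chosen for illustration.

```python
import random


def simulate_family(n_genes, generations, p_dup, p_loss, rng):
    """Toy birth-and-death process for one gene family in one lineage.

    In each generation every gene is independently lost with probability
    p_loss, duplicated with probability p_dup, or left unchanged.
    All parameters are illustrative assumptions, not estimates from data.
    Returns the family size after each generation (index 0 = start).
    """
    if p_dup + p_loss > 1:
        raise ValueError("p_dup + p_loss must not exceed 1")
    sizes = [n_genes]
    for _ in range(generations):
        next_size = 0
        for _ in range(n_genes):
            r = rng.random()          # uniform in [0, 1)
            if r < p_loss:
                continue              # gene death: the copy is lost
            next_size += 1            # the gene persists
            if r >= 1 - p_dup:
                next_size += 1        # gene birth: one extra duplicate
        n_genes = next_size
        sizes.append(n_genes)
    return sizes


if __name__ == "__main__":
    # Two hypothetical lineages start from the same ancestral family size
    # and use the same rates; only the random events differ.
    for seed in (1, 2):
        history = simulate_family(20, 50, 0.05, 0.04, random.Random(seed))
        print(f"lineage {seed}: final family size = {history[-1]}")
```

          Running the two lineages with identical rates but different random seeds typically yields different final family sizes, which is the kind of lineage-specific variation in gene number that duplication and loss alone can produce.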

          Most cited references (47)


          Evolution by the birth-and-death process in multigene families of the vertebrate immune system.

          Concerted evolution is often invoked to explain the diversity and evolution of the multigene families of major histocompatibility complex (MHC) genes and immunoglobulin (Ig) genes. However, this hypothesis has been controversial because the member genes of these families from the same species are not necessarily more closely related to one another than to the genes from different species. To resolve this controversy, we conducted phylogenetic analyses of several multigene families of the MHC and Ig systems. The results show that the evolutionary pattern of these families is quite different from that of concerted evolution but is in agreement with the birth-and-death model of evolution in which new genes are created by repeated gene duplication and some duplicate genes are maintained in the genome for a long time but others are deleted or become nonfunctional by deleterious mutations. We found little evidence that interlocus gene conversion plays an important role in the evolution of MHC and Ig multigene families.

            Repetitive zinc-binding domains in the protein transcription factor IIIA from Xenopus oocytes.

            The 7S particle of Xenopus laevis oocytes contains 5S RNA and a 40-K protein which is required for 5S RNA transcription in vitro. Proteolytic digestion of the protein in the particle yields periodic intermediates spaced at 3-K intervals and a limit digest containing 3-K fragments. The native particle is shown to contain 7-11 zinc atoms. These data suggest that the protein contains repetitive zinc-binding domains. Analysis of the amino acid sequence reveals nine tandem similar units, each consisting of approximately 30 residues and containing two invariant pairs of cysteines and histidines, the most common ligands for zinc. The linear arrangement of these repeated, independently folding domains, each centred on a zinc ion, comprises the major part of the protein. Such a structure explains how this small protein can bind to the long internal control region of the 5S RNA gene, and stay bound during the passage of an RNA polymerase molecule.

              KRAB-containing zinc-finger repressor proteins

              Gene organization and evolutionary history

              Zinc-finger proteins containing the Krüppel-associated box (KRAB-containing proteins) were discovered in 1991 by Bellefroid et al. [1]. They make up approximately one third (290) of the 799 different zinc-finger proteins present in the human genome, and as a result, this group of proteins is the largest single family of transcriptional regulators in mammals. Many genes encoding KRAB-containing proteins are arranged in clusters, but others occur individually throughout the genome. The best characterized cluster is on 19q, containing 148 genes (51% of the family) within a region close to 19q13 [2]; other clusters are in centromeric and telomeric regions of other chromosomes. In particular, members of the family containing SCAN domains (see below) are clustered on 3p21-22, 6p21-22, 16p13.3, and 17p12-13. Non-clustered genes encoding KRAB-containing proteins are scattered over the other chromosomes, with about half on autosomes and half on sex chromosomes. Although the expression of genes of other clustered families, such as homeobox genes, is coregulated, it remains to be determined whether a comparable mechanism operates for genes encoding KRAB-containing proteins, and more studies are needed to show how chromosome organization influences the expression patterns of this family. As shown in Figure 1, KRAB-containing proteins are characterized by the presence of a DNA-binding domain made up of between 4 and over 30 zinc-finger motifs and a KRAB domain. The KRAB domain, located near the amino terminus of the protein, consists of one or both of the KRAB A box and the KRAB B box (see below). Other domains, such as the SCAN domain, are found in a small subset of members of the family [2,3] (Table 1). The two boxes of the KRAB domain are always encoded by individual exons separated by introns of variable sizes. This exon-intron composition allows the generation of different products by alternative splicing.
In fact, zinc-finger proteins that contain only a KRAB A domain, for instance, can originate either from a gene that lacks the KRAB B domain or from one with both KRAB A and B that generates a 'KRAB A-only' transcript by alternative splicing. In contrast, the zinc-finger domain (including all the zinc-finger motifs) is often encoded by a single exon. This is remarkable given that other families of zinc-finger proteins containing fewer zinc fingers (such as the Sp1-like proteins, which have three) have more than one exon to encode the DNA-binding domain. Multi-zinc-finger proteins of the KRAB-containing protein family may have been subjected to different selective pressures from proteins with fewer zinc fingers; this idea is supported by other evolutionary features, discussed below. Perhaps the most remarkable feature of the KRAB-containing proteins is the fact that they are present only in tetrapod vertebrate genomes. The KRAB domain is absent from the sequences of zinc-finger proteins from fish, Drosophila, plants, yeast, and other fungi, but it has been identified in the human, mouse, rat, chicken and frog genomes [3]. Although the name 'Krüppel-associated box' implies that the KRAB domain is present in proteins that have zinc fingers similar to the ones found in Drosophila Krüppel, Krüppel itself does not have a KRAB box. This distribution suggests that the emergence of the KRAB domain is a relatively recent event in evolution, even though a large part of each KRAB-containing protein is composed of zinc-finger motifs, which are present in organisms ranging from unicellular eukaryotes to humans. Currently, the reason for the expansion of the family in tetrapods remains unknown, although clues may come from a better understanding of their transcriptional-regulatory functions. It is likely, however, that they evolved to provide vertebrates with a key function that underlies their development, such as aspects of the immune system or the nervous system. 
Characteristic structural features

Members of the KRAB-containing protein family bind DNA through their C2H2 zinc-finger domains [3], and the KRAB domain functions as a strong transcriptional repressor domain [4]. Some members of the family also have SCAN domains. No crystal structures of KRAB-containing proteins have yet been solved.

Zinc fingers

The C2H2 zinc-finger motifs found in the KRAB-containing proteins and other zinc-finger proteins are defined by the presence of the consensus sequence φ-X-Cys-X(2-4)-Cys-X3-φ-X5-φ-X2-His-X(3,4)-His, where X represents any amino acid and φ represents a hydrophobic residue. The two cysteine and two histidine residues coordinate a zinc ion and fold the domain into a finger-like projection that can interact with DNA. Previous studies strongly suggest that each of these motifs can contact three to four nucleotides [5]. KRAB-containing proteins often contain 10 or more zinc fingers, and proteins with up to 34 are known. Until recently, it had not been investigated fully whether these zinc fingers bind DNA in a sequence-specific manner or function in transcriptional regulation outside of an artificial Gal4-based transcriptional assay. During the last two years, however, our laboratory and others have provided evidence that wild-type KRAB-containing proteins are indeed transcriptional repressors that use most of their collection of zinc fingers to bind to DNA [5]. In theory, proteins with 30 zinc-finger domains would bind a DNA sequence of more than 60 nucleotides. A sequence of this length would be rarely found by chance in the relatively small genomes of lower eukaryotes, consistent with the fact that KRAB-containing proteins are found only in tetrapods.
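
The consensus sequence and the per-finger contact estimate given above can be made concrete with a short sketch. It translates the consensus into a regular expression over one-letter amino-acid codes and computes the range of DNA footprint lengths implied by three to four nucleotides per finger. The set of residues counted as hydrophobic (φ), the function names, and the example finger sequence are assumptions made for illustration; the text does not specify them.

```python
import re

# Assumption: residues treated as "hydrophobic" (φ). The source text does
# not enumerate them, so this set is an illustrative choice.
HYDROPHOBIC = "AVILMFWY"

# φ-X-Cys-X(2-4)-Cys-X3-φ-X5-φ-X2-His-X(3,4)-His in one-letter codes.
C2H2_PATTERN = re.compile(
    rf"[{HYDROPHOBIC}]"   # φ
    r"."                  # X
    r"C"                  # Cys
    r".{2,4}"             # X(2-4)
    r"C"                  # Cys
    r".{3}"               # X3
    rf"[{HYDROPHOBIC}]"   # φ
    r".{5}"               # X5
    rf"[{HYDROPHOBIC}]"   # φ
    r".{2}"               # X2
    r"H"                  # His
    r".{3,4}"             # X(3,4)
    r"H"                  # His
)


def find_c2h2_motifs(protein: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of non-overlapping consensus matches."""
    return [m.span() for m in C2H2_PATTERN.finditer(protein.upper())]


def footprint_range(n_fingers: int) -> tuple[int, int]:
    """Nucleotides contacted if every finger binds 3 (min) or 4 (max)."""
    return 3 * n_fingers, 4 * n_fingers


if __name__ == "__main__":
    # Hypothetical two-finger sequence built to fit the consensus.
    finger = "YKCPECGKSFSRSDHLQRHQRTH"
    protein = finger + "TGEKP" + finger
    print(find_c2h2_motifs(protein))
    print(footprint_range(30))
```

Under these assumptions, 30 fingers that all contact DNA would span 90 to 120 nucleotides, which is consistent with the "more than 60 nucleotides" stated in the text. A pattern match only flags candidate motifs; it does not show that a finger folds or binds DNA.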
One should be cautious, however, in assuming that KRAB-containing proteins always bind such long sequences, as post-translational modifications and hetero-dimerization with other proteins could potentially modify their binding capabilities so as to enable them to recognize shorter sequences. As studies describing DNA binding by these proteins are scant, the final answers to these provocative hypotheses will rely on further studies.

The KRAB domain

The KRAB domain spans approximately 50-75 amino acids and is divided into the A and B boxes (Figure 2a); the A box plays a key role in repression by binding to corepressors, and the B box enhances the repression mediated by the A box through as-yet unknown mechanisms [6]. Whether or not the amino-terminal domain contains the A box, the B box, or both, it is always known as the KRAB domain (Figure 2a). The mammalian KRAB-containing zinc-finger proteins can be divided into three subfamilies on the basis of the primary structure of this amino-terminal repressor domain [3]: those that contain an A box alone (the KRAB A subfamily), those with a combination of the A and B boxes (KRAB A + B), and those with an A box combined with a divergent B box, sometimes called the b box (KRAB A + b). Further analysis of the family may reveal other subfamilies. A conserved motif in another family of mammalian proteins, the SSX proteins, has a low degree of similarity with the KRAB domain. Proteins containing the 'SSX KRAB domain' sequence do not have zinc fingers and are not grouped into the KRAB-containing protein family [7]. Functional analyses have been important in dissecting the functional differences between the SSX and KRAB domains, which are 39 to 49% similar to each other [7]: SSX-KRAB-related domains poorly repress heterologous promoters and do not interact with Kap1 (see below).
The SCAN domain

A defined subset of KRAB-containing zinc-finger transcription factors contains a SCAN domain, which is named after the first letters of the proteins in which it was originally described (SRE-ZBP, CTfin51, AW-1, and Number 18 cDNA) [8]; it is also known as LeR because of its leucine-rich primary structure. The SCAN domain is at least 87 amino acids in length (Figure 2b); it is vertebrate-specific, and it is never repeated within a protein. It is not associated with transcriptional regulation but instead allows homo- and hetero-dimerization with other SCAN-containing zinc-finger proteins [9]; the mechanisms involved in these dimerization phenomena remain poorly understood. Taken together, the reduced number of genes encoding these proteins in mammals, their clustered genomic organization, and their ability to form dimers suggest that KRAB-containing zinc-finger proteins with SCAN domains may either all participate in similar functional processes or all be regulated in a similar manner.

Localization and function

The functions currently known for members of the KRAB-containing protein family include transcriptional repression of RNA polymerase I, II, and III promoters, binding and splicing of RNA, and control of nucleolus function. The functions of most of the family have not been well studied, but a few examples are as follows. The human Kid1 protein can bind to heteroduplex DNA structures and is localized to the nucleolus [10]. Once in the nucleolus, Kid1 induces nucleolar disintegration and greatly reduces the synthesis of ribosomal RNA by RNA polymerase I, which takes place in this sub-nuclear compartment. Moreover, the KRAB domain of Kid1 is necessary for both of these phenomena, suggesting that the protein may repress transcription by RNA polymerase I. Because the number and size of nucleoli correlate with the activity level of RNA polymerase I, its repression may contribute to the disintegration of the nucleolus.
Interestingly, however, the KRAB domain of Kox1, which has the same domain structure as Kid1 and therefore belongs to the same subfamily, cannot repress transcription by RNA polymerase I in Gal4-based assays [11]. Thus, it is likely that the KRAB domain functions differently in the full-length Kid1 protein than in a chimeric fusion protein (as used in the Gal4 assay) or that the KRAB domains of Kox1 and Kid1 behave differently at RNA polymerase I promoters. More studies are needed to differentiate between these possibilities. In contrast to Kid1, human Znf74 is found in discrete granular structures in the nucleus, is tightly associated with the nuclear matrix, binds to RNA, and interacts with RNA polymerase II [12]. This KRAB-containing protein contains a truncated KRAB A domain and 12 different C2H2 zinc-finger motifs that are sufficient for targeting the protein to the nuclear matrix as well as for RNA binding. In addition, Znf74 interacts with the hyperphosphorylated form of RNA polymerase II and colocalizes with it in nuclear domains that are enriched in splicing factors. These findings suggest that Znf74 may regulate gene expression through both transcriptional and post-transcriptional mechanisms. KS1, which has ten zinc-finger domains and both KRAB A and B boxes, is a strong repressor of RNA polymerase activity by the Kap1-mediated mechanism described below [5]. KS1 is also a suppressor of the neoplastic transformation that is mediated by several oncogenes [13]. The biochemical functions of KRAB-containing proteins described above are thought to be critical to their cellular roles, which include cell differentiation, cell proliferation, apoptosis, and neoplastic transformation. Krim-1B, a KRAB-containing protein with nine zinc-finger motifs, antagonizes the growth regulatory properties of the oncogene product c-Myc by binding to it via the second zinc finger [14].
The interaction between Krim-1B and c-Myc decreases the transcriptional transactivation of c-Myc that is dependent on c-Myc binding to the E-box in the promoters of its target genes. Other KRAB-containing proteins are involved in the regulation of cell proliferation. The leucine zipper and sterile-alpha motif protein kinase (ZAK) has been implicated in the regulation of cell-cycle arrest by decreasing cyclin-E expression, and a KRAB-containing protein has been shown to be associated with ZAK, playing a role in this phenomenon [15]. The expression of the KRAB-containing protein AJ8, for instance, is developmentally regulated in embryonic tibiae and calvariae, suggesting a role in the maturation of bone cells, and the overexpression of AJ8 in osteoblastic cells represses known markers of osteoblast differentiation [16]. Some KRAB proteins also appear to be involved in the regulation of apoptosis. Myeloid cells transfected with the cDNA of the KRAB-containing protein ZK1 are more sensitive to cell death induced by ionizing radiation than non-transfected cells [17]. Together, these examples support a role for KRAB-containing proteins in the regulation of morphogenesis. Consequently, several laboratories, including mine, have been investigating the functional association of these proteins with pathophysiological processes. Although there has not been any definitive proof on the causal role of KRAB-containing proteins in human diseases, using gene-mapping techniques, some KRAB-containing proteins have been proposed to be candidate genes for developmental and neoplastic disorders, as well as for schizophrenia [18,19]. The lack of functional evidence at this point makes this association tenuous, however. A better understanding of the molecular mechanisms underlying the functions of KRAB-containing proteins will have important biological implications. 
Mechanism of function

Studies by three laboratories have identified a 100 kDa corepressor protein for KRAB domains, known as Kap1, Tif1β, or Krip1 [20-22]. Binding to a RING-B-box coiled-coil (RBCC) motif of Kap1 is an absolute requirement for KRAB-containing proteins to mediate transcriptional repression. These elegant studies [20-22] demonstrated that Kap1 binds to KRAB domains as an oligomer, functioning as a scaffold to recruit heterochromatin protein 1 isoforms (HP1α, HP1β, and HP1γ), histone deacetylases (HDACs), and Setdb1, a novel SET-domain protein that methylates lysine 9 of histone H3. Interestingly, HP1 proteins bind to Lys9-methylated histone H3 in order to condense chromatin [23-28]. Together, these findings have recently led to the proposal of the model shown in Figure 3 [27]. The model predicts that KRAB-containing proteins bind to their corresponding DNA sequence, triggering the recruitment of Kap1; subsequently, Kap1 forms a scaffold containing HP1, Setdb1, and an HDAC, and silences gene expression by forming a facultative heterochromatin environment on a target promoter. This model would suggest a KRAB-mediated stepwise assembly of a powerful corepressor complex. Further examination is needed, however, of whether the complex is instead preformed and then recruited by a KRAB domain on a particular promoter. Also, as these proteins can all be regulated by post-translational modifications, it is not clear whether the corepressor complexes predicted by the model always contain Kap1, HP1, and Setdb1. Despite these questions, the building of this model is one of the most significant steps forward in this field of research.

Frontiers

KRAB-containing proteins were discovered in 1991. Today, a significant amount of information is known on both the structural and the basic biochemical properties of these proteins.
Many questions remain to be addressed, however, including why there are so many proteins in the family although they are found only in tetrapods; the origin and function of their clustered genomic organization; the distinct cellular functions of each member of the family; how the domains within the proteins cooperate to achieve a specific cellular function; and how the proteins are regulated by post-translational modification. We anticipate that future studies in this field will be exciting and illuminating.

                Author and article information

                Journal
                BMC Evol Biol
                BMC Evolutionary Biology
                BioMed Central
                1471-2148
                2008
                18 June 2008
                Volume: 8
                Article number: 176
                Affiliations
                [1] Department of Biochemistry, Université de Montreal, C.P. 6128, Succ. Centre-Ville, Montreal, QC, H3C 3J7, Canada
                Article
                1471-2148-8-176
                10.1186/1471-2148-8-176
                2443715
                18559114
                879041b4-3c52-445a-8240-e2a0683c8e44
                Copyright © 2008 Tadepally et al; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 17 March 2008
                : 18 June 2008
                Categories
                Research Article

                Evolutionary Biology
