      H3K4me3, H3K9ac, H3K27ac, H3K27me3 and H3K9me3 Histone Tags Suggest Distinct Regulatory Evolution of Open and Condensed Chromatin Landmarks


          Abstract

          Background: Transposons are selfish genetic elements that self-reproduce in host DNA. They were active during evolutionary history and now occupy almost half of mammalian genomes. Close insertions of transposons reshaped structure and regulation of many genes considerably. Co-evolution of transposons and host DNA frequently results in the formation of new regulatory regions. Previously we published a concept that the proportion of functional features held by transposons positively correlates with the rate of regulatory evolution of the respective genes.

          Methods: We ranked human genes and molecular pathways according to their regulatory evolution rates based on high throughput genome-wide data on five histone modifications (H3K4me3, H3K9ac, H3K27ac, H3K27me3, H3K9me3) linked with transposons for five human cell lines.

          Results: Based on the total of approximately 1.5 million histone tags, we ranked regulatory evolution rates for 25075 human genes and 3121 molecular pathways and identified groups of molecular processes that showed signs of either fast or slow regulatory evolution. However, histone tags showed different regulatory patterns and formed two distinct clusters: promoter/active chromatin tags (H3K4me3, H3K9ac, H3K27ac) vs. heterochromatin tags (H3K27me3, H3K9me3).

          Conclusion: In humans, transposon-linked histone marks evolved in a coordinated way depending on their functional roles.

          Most cited references 33

          Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

            Histone acetyltransferases.

            Transcriptional regulation in eukaryotes occurs within a chromatin setting and is strongly influenced by nucleosomal barriers imposed by histone proteins. Among the well-known covalent modifications of histones, the reversible acetylation of internal lysine residues in histone amino-terminal domains has long been positively linked to transcriptional activation. Recent biochemical and genetic studies have identified several large, multisubunit enzyme complexes responsible for bringing about the targeted acetylation of histones and other factors. This review discusses our current understanding of histone acetyltransferases (HATs) or acetyltransferases (ATs): their discovery, substrate specificity, catalytic mechanism, regulation, and functional links to transcription, as well as to other chromatin-modifying activities. Recent studies underscore unexpected connections to both cellular regulatory processes underlying normal development and differentiation, as well as abnormal processes that lead to oncogenesis. Although the functions of HATs and the mechanisms by which they are regulated are only beginning to be understood, these fundamental processes are likely to have far-reaching implications for human biology and disease.
              The transposable elements of the Drosophila melanogaster euchromatin: a genomics perspective

              Background Transposable element sequences are abundant yet poorly understood components of almost all eukaryotic genomes [1]. As a result, many biologists have an interest in the description of transposable elements in completely sequenced eukaryotic genomes. The evolutionary biologist wants to understand the origin of transposable elements, how they are lost and gained by a species and the role they play in the processes of genome evolution; the population geneticist wants to know the factors that determine the frequency and distribution of elements within and between populations; the developmental geneticist wants to know what roles these elements may play in either normal developmental processes or in the response of the organism to external conditions; finally, the molecular geneticist wants to know the mechanisms that regulate the transposition cycle of these elements and how they interact with the cellular machinery of the host. It is for all of these reasons and more that a description of the transposable elements in the recently completed Release 3 genomic sequence of D. melanogaster is desirable. Our understanding of transposable elements owes much to research on Drosophila. Over 75 years ago, Milislav Demerec discovered highly mutable alleles of two genes in D. virilis, miniature and magenta ([2,3,4], reviewed in [5]). Both genes were mutable in soma and germline and, for the miniature-3α alleles, dominant enhancers of mutability were also isolated by Demerec. In retrospect, it seems clear that the mutability of these alleles was the result of transposition of mobile elements. The dominant enhancers may have been particularly active elements or mutations in host genes that affect transposability (see below). There matters stood until McClintock's analysis of the Ac and Ds factors in maize, which led to the discovery of transposition [6] and the discovery of insertion elements in the gal operon of Escherichia coli (see [7]). 
Green [8] synthesized the available evidence to make a strong case for insertion as a mechanism of mutagenesis in Drosophila. Concurrently, Hogness' group had begun a molecular characterization of two elements in D. melanogaster, 412 and copia [9,10] and provided evidence that they were transposable [11,12,13]. Glover [14] unknowingly characterized the first eukaryotic transposable element at the molecular level, the insertion sequences of 28 S rRNA genes. The discovery of male recombination [15], and two systems of hybrid dysgenesis in D. melanogaster (see [16,17]) bridged the gap between genetic and molecular analyses. The discovery of the transposable elements that cause hybrid dysgenesis, the P element [18] and the I element [19], led to the first genomic analyses of transposable elements in a eukaryote. The publication of the Release 1 genomic sequence in March 2000 [20] and the Release 2 genomic sequence in October 2000 encouraged several studies on the genomic distribution and abundance of transposable elements in D. melanogaster [21,22,23,24,25]. Unfortunately, neither release was suitable for rigorous analysis of its transposable elements. In the whole-genome shotgun assembly process, repetitive sequences (including transposable elements) were masked by the SCREENER algorithm and remained as gaps between unitigs [26]. During the repeat-resolution phase of the whole-genome assembly, an attempt was made to fill these gaps. However, comparisons of small regions sequenced by the clone-by-clone approach versus the whole-genome shotgun method show that this process did not produce accurate sequences for transposable elements [26,27]. These results demonstrate that rigorous analyses of the transposable elements, or any other repetitive sequence, require a sequence of higher quality, now publicly available as Release 3 [28]. For the first time, the nature, number and location of the transposable elements can reliably be analyzed in the euchromatin of D. melanogaster.
Results and discussion
Identification of known and novel transposable elements
Eukaryotic transposable elements are divided into those that transpose via an RNA intermediate, the retrotransposons (class I elements), and those that transpose by DNA excision and repair, the transposons (class II elements [1]). Within the retrotransposons, the major division is between those that possess long terminal repeats (LTR elements) and those that do not (LINE and SINE elements [29]). Among the transposons, the majority transpose via a DNA intermediate, encode their own transposase and are flanked by relatively short terminally inverted repeat structures (TIR elements). Foldback (FB) elements, which are characterized by their property of reannealing after denaturation with zero-order kinetics, are quite distinct from prototypical class I or II elements, and have been included in our analyses [30]. Other classes of repetitive elements, such as DINE-1 [31,32,33], which are structurally distinct from all other classes, have not been included in this study. We used a criterion of greater than 90% identity over more than 50 base-pairs (bp) of sequence to assign individual elements to families (see Materials and methods for details; a classification is shown in the additional data available (see Additional data files)). Subsequently, in order to ensure proper inclusion of elements in appropriate families, we generated multiple alignments for all families of transposable element represented by multiple copies. This allowed us to identify and remove spurious hits to highly repetitive regions of the genome, and it also enabled us to distinguish sequences of closely related families that share extensive regions of similarity. A summary by class of the total number of complete and partial transposable elements in the Release 3 Drosophila euchromatic sequence is presented in Table 1, and detailed results for individual families of transposable element are listed in Table 2. 
Including those described here, there are 96 known families of transposable elements in D. melanogaster: 49 LTR families, 27 LINE-like families, 19 TIR families and the FB family. We have identified 1,572 full or partial elements from 93 of these 96 families (Table 1). In total, 3.86% (4.5 Mb) of the Release 3 sequence is composed of transposable elements. Previous analysis of both the euchromatic and heterochromatic sequences has suggested that 9% of the Drosophila genome is composed of repetitive elements [34]. One reason for this difference may be that the proportion of transposable element sequences in heterochromatic regions is higher than the genomic average [22,35]. As shown in Table 1 and Figure 1, the different classes vary in their contribution to the Drosophila euchromatin both in amount of sequence and number of elements. LTR elements make up the largest proportion of the euchromatin (2.65%), more sequence than the sum of all other classes of element (LINE-like elements 0.87%, TIR elements 0.31%, and FB elements 0.04%). LTR elements are also the most numerous class of transposable element in the euchromatic sequences (682) followed by LINE-like (486), TIR (372), and FB (32) elements. The largest family representing each of the three major classes is roo (146 copies; LTR), jockey (69 copies; LINE-like), and 1360 (105 copies; TIR) (Table 2). The average size of all transposable elements in our study is 2.9 kilobases (kb), smaller than the 5.6 kb average length of middle repetitive DNA, estimated from reassociation kinetics [36]. Three of the 96 families are not described in this paper because they have not been found in the euchromatic portion of this sequence; these are the P element, R2 and ZAM. It is not surprising that we did not find any P elements, as the sequenced strain was selected to be free of them. 
While we did not find R2 and ZAM elements in the euchromatin, both of these elements were identified in unmapped scaffolds that derive from the heterochromatin [37]. The R2 element has previously been found only within the 28 S rDNA locus and in heterochromatin [38]. Strains of D. melanogaster are known to exist in which ZAM elements occur in low copy number in heterochromatic sequences [39]. The absence of the telomere-associated HeT-A and TART from the euchromatic portions of all chromosomes except chromosome 4 is not unexpected; the tandem arrays of these two elements are flanked by Taq microsatellite sequences [40,41] which are difficult to assemble and are under-represented in the current version of this sequence. We discovered eight new families of transposable element within the Release 3 sequences. Two are members of the TIR class: Bari2 (EMBL: AF541951) and hopper2 (EMBL: AF541950). Six are members of the LTR class: frogger (EMBL: AF492763), rover (EMBL: AF492764), cruiser (a.k.a. Quasimodo) (EMBL: AF364550), McClintock (EMBL:AF541948), qbert (EMBL: AF541947), and Stalker4 (EMBL: AF541949). We identified Bari2 (four copies) by querying the D. melanogaster genome using a Bari1-like element isolated from D. erecta (EMBL: Y13853). The Bari2 element shares 52% amino acid identity with the Bari1 element; overall these elements share less than 50% nucleotide identity throughout their sequence. The hopper2 (five copies) and Stalker4 (two copies) families were identified by an analysis of the multiple alignment of the hopper and Stalker families, respectively. These alignments indicate distinct subfamilies on the basis of both nucleotide divergence and structural rearrangements over large regions of their alignment. The hopper2 and hopper elements share 70% amino-acid identity throughout their predicted open reading frames (ORFs). 
However, outside their ORFs, these elements are quite divergent and do not share significant nucleotide identity ( family_name,FBgn_id,FBti_id,chromosome_arm:Release 3_coordinates FBgn_id is the FlyBase record for the family, FBti_id is the unique identifier of each occurrence of an element and the coordinates are from the Release 3 data. In addition to the sequence of each element, each record includes 500 bp of 5' and 3' flanking sequence. These data will be regularly updated, in step with each new Release of the assembled sequence. Each release will be archived. File 4. The alignments of elements within a family used for the current analysis. This file is in MASE format [135] with each element identified by its FBti number. This is a frozen dataset that will not be updated by the BDGP. File 5. The nested transposable elements and element complexes are available as an independent dataset. Included within each sequence is 500 bp of flanking sequence on each side of the element complex. Each nest or complex has a unique FBti identifier number in FlyBase; in addition each component of a nest or complex has its own FBti identifier number. In the FASTA header line for each sequence in this file the data included are: >FBti_of_nest_or_complex,FBti_of_component,chromosome_arm: coordinates Comparison with other datasets To support our claim that the Release 1 sequence is an inadequate substrate for rigorous analysis we have compared the sequences of transposable elements in that release with those of Release 3. We determined the identity of elements in the two releases by a comparison of the 500 bp on their 5' flanks. Our results suggest that many, if not most, of the sequences from Release 1 are artifacts of that assembly. Of the 1,572 elements characterized in Release 3 only 381 (24%) were correctly determined in Release 1. 
Of the 1,191 (76%) sequences that were not correctly sequenced in Release 1, 483 contained Ns, 45 were completely absent, and 663 contained an average of 34 incorrectly identified nucleotides per element. The complete data are available from [134].
Analytical methods
Identification of known transposable elements
WU-BLASTN 2.0 [136] was used to search all chromosome arms for regions of similarity to each element in the Release 3 dataset. The parameters for the BLAST search were M = 3, N = 3, Q = 3, R = 3, X = 3 and S = 3. BLAST searches were done on a 32-node dual PIII Linux-based compute farm supplied by Linux Network. Distribution of BLAST jobs to the cluster was managed by the Portable Batch System (PBS [137]). Individual BLAST jobs were submitted via pbsrsh, an rsh-like program (E.F., unpublished work). In addition, PBS was optimized and modified for the BDGP to handle a large number of queued jobs (E.F., unpublished work). BLAST reports were generated by searching a single chromosome arm with each individual element. The results were then parsed to generate a list of the coordinates of all high-scoring pairs (HSPs) that were at least 50 bp long and whose query and subject sequences had a pairwise identity of at least 90%. All HSPs on this list that were within 10 kb of each other and summed to greater than 100 bp were pooled into a 'span'. Each span was bounded by two coordinates - a start coordinate that corresponds to the lowest coordinate of any HSP in a particular span, and an end coordinate that corresponds to the highest coordinate of any HSP in the same span. A master list was then generated that contained all spans for all elements on a particular arm. Any spans (for the same or different elements) that had overlapping coordinates were examined further by an analysis of the sequences of the HSPs. 
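The HSP filtering and span pooling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' parser: the `HSP` record and the `pool_spans` function are hypothetical names, "within 10 kb of each other" is read as a gap of at most 10 kb between an HSP's start and the furthest end already in the group, and "summed to greater than 100 bp" is read as the sum of the individual HSP lengths.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class HSP:
    """Hypothetical record for one high-scoring pair on a chromosome arm."""
    start: int       # lowest subject coordinate (inclusive)
    end: int         # highest subject coordinate (inclusive)
    identity: float  # pairwise identity in percent


def pool_spans(hsps: Iterable[HSP],
               min_len: int = 50,
               min_identity: float = 90.0,
               max_gap: int = 10_000,
               min_total: int = 100) -> List[Tuple[int, int]]:
    """Filter HSPs, then pool neighbouring ones into (start, end) spans."""
    # Keep HSPs that are at least min_len long with at least min_identity.
    kept = sorted(
        (h for h in hsps
         if h.end - h.start + 1 >= min_len and h.identity >= min_identity),
        key=lambda h: h.start,
    )

    # Group HSPs whose start lies within max_gap of the current group's end
    # (an assumed reading of "within 10 kb of each other").
    groups: List[List[HSP]] = []
    for h in kept:
        if groups and h.start - max(g.end for g in groups[-1]) <= max_gap:
            groups[-1].append(h)
        else:
            groups.append([h])

    # A group becomes a span only if its HSP lengths sum to more than min_total.
    spans = []
    for group in groups:
        total = sum(g.end - g.start + 1 for g in group)
        if total > min_total:
            spans.append((min(g.start for g in group),
                          max(g.end for g in group)))
    return spans
```

Each returned tuple holds the lowest and highest coordinate of any HSP in the pooled group, matching the span bounds defined in the text.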
While this identified a small number of spurious spans that did not correspond to real elements, the majority of these instances correspond to the nested elements discussed below. Start and end coordinates for all spans belonging to each element were used to extract genomic sequences for multiple sequence alignment (see below). In some rare instances where it was not possible to differentiate the element to which the HSP belonged, overlapping coordinates were recorded. Spurious sequences that did not align with other family members were removed from both the list of spans and the multiple alignments. Other attempts to define transposable element families on the basis of sequence identity have used a 90% cutoff with reference to the protein sequence of the reverse transcriptase motif of LTR-elements [25,138]. For LINE-like transposons, Berezikov et al. [24] used a 70% nucleic acid sequence identity criterion over 200 bp.
Identification of new transposable elements through genome-genome comparison
The first approach to discovering new transposable elements was by an all-by-all BLAST using chromosome arms 2L, 2R, 3R, 4 and the proximal half of the X. The chromosome arms were divided into 20-kb segments, each segment overlapping the previous by 10 kb. We used the NCBI-BLAST 2.0 to compare each 20-kb section against the others. Hits with greater than 95% identity and 1,000 bp long were parsed and used as query sequences in a BLAST against the canonical element sequence dataset. Redundant results were removed. The coordinates of the repeats were parsed and known repeats were tagged. New repeats were reviewed in CONSED [139] for the presence of ORFs and repeat structure.
Identification of new transposable elements through isolation of LTR sequences
A second approach was taken to identify single-copy elements containing LTRs. Each chromosome arm was divided into 1,000-bp pieces with neighboring pieces overlapping each other by 500 bp. 
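Both tiling schemes mentioned above (20-kb segments overlapping by 10 kb, and 1,000-bp pieces overlapping by 500 bp) follow the same sliding-window pattern. The sketch below shows one way to generate such windows; `tile_windows` is a hypothetical helper, and the use of 0-based half-open coordinates and a shortened final window are assumptions, since the text does not say how the ends of each arm were handled.

```python
from typing import Iterator, Tuple


def tile_windows(length: int,
                 window: int = 1000,
                 step: int = 500) -> Iterator[Tuple[int, int]]:
    """Yield 0-based half-open (start, end) windows covering [0, length).

    Consecutive windows overlap by (window - step) bases. The last window
    is truncated at the end of the sequence (an assumption of this sketch).
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if length <= 0:
        return
    start = 0
    while True:
        end = min(start + window, length)
        yield (start, end)
        if end == length:
            break
        start += step
```

Calling `tile_windows(arm_length)` gives the 1,000-bp pieces with 500-bp overlap, and `tile_windows(arm_length, window=20_000, step=10_000)` gives the 20-kb segments used in the genome-genome comparison, where `arm_length` stands for the length of a chromosome arm.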
WU-BLASTN 2.0 was used to search each chromosome arm for all regions of similarity to each 1,000-bp piece (parameters: M = 3, N = 3, Q = 3, R = 3, X = 3 and S = 3). The BLAST report from such a search was parsed to generate a list of all HSPs that were at least 100 bp long and whose query and subject sequences had a pairwise identity of at least 95%. Then, all HSPs on this list greater than 500 bp apart and less than 15 kb apart were pooled into a span. As above, each span was bounded by a start coordinate which corresponds to the lowest coordinate of any HSP in a particular pool and an end coordinate which corresponds to the highest coordinate of any HSP in the same pool. Each set of coordinates was compared to the list of coordinates of transposable elements identified in the screen for known elements and these were eliminated from this list. Then, the coordinates of the remaining spans were used to extract genomic sequence from the finished chromosome arms. Each piece of genomic sequence was then compared to the coding sequence of the known transposable elements using WU-TBLASTX 2.0 (with default parameters). Any span that produced a hit with E < 10⁻⁸ was analyzed by searching through the non-redundant protein database at the NCBI using NCBI-BLASTX [126].
Alignment and calculation of evolutionary distances
Preliminary multiple alignments of elements within families were made using the default settings of DIALIGN v2-1 [140]. The resulting multiple alignments were visualized in the SEAVIEW alignment editor [135]. Subsequent realignment was done using the CLUSTALW (1.7.4) [141] implementation internal to SEAVIEW with manual refinement. Multiple alignments were used to calculate average pairwise distance within families using Kimura's 2-parameter substitution model (transition:transversion ratio = 2:1) [142] as implemented in the DNADIST program of the PHYLIP package [143]. 
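To make the distance calculation concrete, the following sketch computes the Kimura 2-parameter distance for one pair of aligned sequences using the classic closed-form estimator d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the observed proportions of transitions and transversions. This is not the DNADIST implementation, which works with a fixed transition:transversion ratio; the function name and the choice to skip columns containing gaps or ambiguous bases are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Skip gaps and ambiguous bases (an assumption of this sketch).
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if sites == 0:
        raise ValueError("no comparable sites")

    p = transitions / sites
    q = transversions / sites
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P estimator")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)
```

An average within-family distance of the kind reported in the text could then be obtained by averaging `k2p_distance` over all pairs of aligned family members.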
Physical characteristics of element insertion sites
To analyze the physical properties of the insertion sites of transposable elements we used the programs developed by Liao et al. [144]. The flanking sequences of elements with canonical ends were aligned, centered on a single copy of the element's target site sequence (that duplicated on insertion). The sequences were then analyzed for A-philicity, propeller twist, duplex stability and denaturation temperature, as described in [144]. As a baseline we used a randomly generated 500-bp sequence set of the same base composition as the overall genome of D. melanogaster (G. Liao, personal communication). These analyses were performed with 49 roo element sequences, 12 jockey sequences and 28 pogo sequences. Additional analyses were carried out using elements from the following families: copia, blood, 412 and Doc.
Additional data files
A table showing the classification of transposable elements in the genus Drosophila is available.

                Author and article information

                Journal: Cells
                Publisher: MDPI
                ISSN: 2073-4409
                Published: 05 September 2019
                Volume 8, Issue 9
                Affiliations
                [1] Mathematical Biology & Bioinformatics Laboratory, Institute of Applied Mathematics and Mechanics, Peter the Great St. Petersburg Polytechnic University, Polytechnicheskaya 29, St. Petersburg 195251, Russia
                [2] Laboratory of Microbiological Monitoring and Bioremediation of Soil, All-Russia Research Institute for Agricultural Microbiology, Podbel’skogo, 3, St. Petersburg 196608, Russia
                [3] Lomonosov Moscow State University, Vorobiovy Gory 1, Moscow 119991, Russia
                [4] Research Centre for Medical Genetics, Moskvorechie Street 1, Moscow 115478, Russia
                [5] Omicsway Corp., Walnut, CA 91789, USA
                [6] Vavilov Institute of General Genetics Russian Academy of Sciences, Gubkina 3, Moscow 119991, Russia
                [7] Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow 117997, Russia
                [8] I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
                Author notes
                [*] Correspondence: igolkinaanna11@gmail.com (A.A.I.); buzdin@oncobox.com (A.B.); Tel.: +7-9119136738 (A.A.I.)
                Article
                Publisher ID: cells-08-01034
                DOI: 10.3390/cells8091034
                PMC ID: 6770625
                PubMed ID: 31491936
                © 2019 by the authors.

                Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).