15
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: not found

      The biochemical basis of microRNA targeting efficacy

      Read this article at

      ScienceOpenPublisherPMC
      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          MicroRNAs (miRNAs) act within Argonaute proteins to guide repression of mRNA targets. Although various approaches have provided insight into target recognition, the sparsity of miRNA–target affinity measurements has limited understanding and prediction of targeting efficacy. Here, we adapted RNA bind-n-seq to enable measurement of relative binding affinities between Argonaute–miRNA complexes and all ≤12-nucleotide sequences. This approach revealed noncanonical target sites unique to each miRNA, miRNA-specific differences in canonical target-site affinities, and a 100-fold impact of dinucleotides flanking each site. These data enabled construction of a biochemical model of miRNA-mediated repression, which was extended to all miRNA sequences using a convolutional neural network. This model substantially improved prediction of cellular repression, thereby providing a biochemical basis for quantitatively integrating miRNAs into gene-regulatory networks.

          Related collections

          Most cited references27

          • Record: found
          • Abstract: found
          • Article: not found

          Principles of MicroRNA–Target Recognition

          Introduction MicroRNAs (miRNAs) are small non-coding RNAs that serve as post-transcriptional regulators of gene expression in plants and animals. They act by binding to complementary sites on target mRNAs to induce cleavage or repression of productive translation (reviewed in [1,2,3,4]). The importance of miRNAs for development is highlighted by the fact that they comprise approximately 1% of genes in animals, and are often highly conserved across a wide range of species (e.g., [5,6,7]). Further, mutations in proteins required for miRNA function or biogenesis impair animal development [8,9,10,11,12,13,14,15]. To date, functions have been assigned to only a few of the hundreds of animal miRNA genes. Mutant phenotypes in nematodes and flies led to the discovery that the lin-4 and let-7 miRNAs control developmental timing [16,17], that lsy-6 miRNA regulates left–right asymmetry in the nervous system [18], that bantam miRNA controls tissue growth [19], and that bantam and miR-14 control apoptosis [19,20]. Mouse miR-181 is preferentially expressed in bone marrow and was shown to be involved in hematopoietic differentiation [21]. Recently, mouse miR-375 was found to be a pancreatic-islet-specific miRNA that regulates insulin secretion [22]. Prediction of miRNA targets provides an alternative approach to assign biological functions. This has been very effective in plants, where miRNA and target mRNA are often nearly perfectly complementary [23,24,25]. In animals, functional duplexes can be more variable in structure: they contain only short complementary sequence stretches, interrupted by gaps and mismatches. To date, specific rules for functional miRNA–target pairing that capture all known functional targets have not been devised. This has created problems for search strategies, which apply different assumptions about how to best identify functional sites. As a result, the number of predicted targets varies considerably with only limited overlap in the top-ranking targets, indicating that these approaches might only capture subsets of real targets and/or may include a high number of background matches ([19,26,27,28,29,30]; reviewed by [31]). Nonetheless, a number of predicted targets have proven to be functional when subjected to experimental tests [19,26,27,29]. A better understanding of the pairing requirements between miRNA and target would clearly improve predictions of miRNA targets in animals. It is known that defined cis-regulatory elements in Drosophila 3′ UTRs are complementary to the 5′ ends of certain miRNAs [32]. The importance of the miRNA 5′ end has also emerged from the pairing characteristics and evolutionary conservation of known target sites [26], and from the observation of a non-random statistical signal specific to the 5′ end in genome-wide target predictions [27]. Tissue culture experiments have also underscored the importance of 5′ pairing and have provided some specific insights into the general structural requirements [29,33,34], though different studies have conflicted to some degree with each other, and with known target sites (reviewed in [31]). To date, no specific role has been ascribed to the 3′ end of miRNAs, despite the fact that miRNAs tend to be conserved over their full length. Here, we systematically evaluate the minimal requirements for a functional miRNA–target duplex in vivo. These experiments have allowed us to identify two broad categories of miRNA target sites. Targets in the first category, “5′ dominant” sites, base-pair well to the 5′ end of the miRNA. Although there is a continuum of 3′ pairing quality within this class, it is useful to distinguish two subtypes: “canonical” sites, which pair well at both the 5′ and 3′ ends, and “seed” sites, which require little or no 3′ pairing support. Targets in the second category, “3′ compensatory” sites, have weak 5′ base-pairing and depend on strong compensatory pairing to the 3′ end of the miRNA. We present evidence that all of these site types are used to mediate regulation by miRNAs and show that the 3′ compensatory class of target sites is used to discriminate among individual members of miRNA families in vivo. A genome-wide statistical analysis allows us to estimate that an average miRNA has approximately 100 evolutionarily conserved target sites, indicating that miRNAs regulate a large fraction of protein-coding genes. Evaluation of 3′ pairing quality suggests that seed sites are the largest group. Sites of this type have been largely overlooked in previous target prediction methods. Results The Minimal miRNA Target Site To improve our understanding of the minimal requirements for a functional miRNA target site, we made use of a simple in vivo assay in the Drosophila wing imaginal disc. We expressed a miRNA in a stripe of cells in the central region of the disc and assessed its ability to repress the expression of a ubiquitously transcribed enhanced green fluorescent protein (EGFP) transgene containing a single target site in its 3′ UTR. The degree of repression was evaluated by comparing EGFP levels in miRNA-expressing and adjacent non-expressing cells. Expression of the miRNA strongly reduced EGFP expression from transgenes containing a single functional target site (Figure 1A). In a first series of experiments we asked which part of the RNA duplex is most important for target regulation. A set of transgenic flies was prepared, each of which contained a different target site for miR-7 in the 3′ UTR of the EGFP reporter construct. The starting site resembled the strongest bantam miRNA site in its biological target hid [19] and conferred strong regulation when present in a single copy in the 3′ UTR of the reporter gene (Figure 1B). We tested the effects of introducing single nucleotide changes in the target site to produce mismatches at different positions in the duplex with the miRNA (note that the target site mismatches were the only variable in these experiments). The efficient repression mediated by the starting site was not affected by a mismatch at positions 1, 9, or 10, but any mismatch in positions 2 to 8 strongly reduced the magnitude of target regulation. Two simultaneous mismatches introduced into the 3′ region had only a small effect on target repression, increasing reporter activity from 10% to 30%. To exclude the possibility that these findings were specific for the tested miRNA sequence or duplex structure, we repeated the experiment with miR-278 and a different duplex structure. The results were similar, except that pairing of position 8 was not important for regulation in this case (Figure 1C). Moreover, some of the mismatches in positions 2–7 still allowed repression of EGFP expression up to 50%. Taken together, these observations support previous suggestions that extensive base-pairing to the 5′ end of the miRNA is important for target site function [26,27,29,32,34]. We next determined the minimal 5′ sequence complementarity necessary to confer target regulation. We refer to the core of 5′ sequence complementarity essential for target site recognition as the “seed” (Lewis et al. [27]). All possible 6mer, 5mer, and 4mer seeds complementary to the first eight nucleotides of the miRNA were tested in the context of a site that allowed strong base-pairing to the 3′ end of the miRNA (Figure 2A). The seed was separated from a region of complete 3′ end pairing by a constant central bulge. 5mer and 6mer seeds beginning at positions 1 or 2 were functional. Surprisingly, as few as four base-pairs in positions 2–5 conferred efficient target regulation under these conditions, whereas bases 1–4 were completely ineffective. 4mer, 5mer, or 6mer seeds beginning at position 3 were less effective. These results suggest that a functional seed requires a continuous helix of at least 4 or 5 nucleotides and that there is some position dependence to the pairing, since sites that produce comparable pairing energies differ in their ability to function. For example, the first two duplexes in Figure 2A (4mer, top row) have identical 5′ pairing energies (ΔG for the first 8 nt was −8.9 kcal/mol), but only one is functional. Similarly, the third 4mer duplex and fourth 5mer duplex (middle row) have the same energy (−8.7 kcal/mol), but only one is functional. We thus do not find a clear correlation between 5′ pairing energy and function, as reported in [34]. These experiments also indicate that extensive 3′ pairing of up to 17 nucleotides in the absence of the minimal 5′ element is not sufficient to confer regulation. Consequently, target searches based primarily on optimizing the extent of base-pairing or the total free energy of duplex formation will include many non-functional target sites [28,30,35], and ranking miRNA target sites according to overall complementarity or free energy of duplex formation might not reflect their biological activity [26,27,28,30,35]. To determine the minimal lengths of 5′ seed matches that are sufficient to confer regulation alone, we tested single sites that pair with eight, seven, or six consecutive bases to the miRNA's 5′ end, but that do not pair to its 3′ end (Figure 2B). Surprisingly, a single 8mer seed (miRNA positions 1–8) was sufficient to confer strong regulation by the miRNA. A single 7mer seed (positions 2–8) was also functional, although less effective. The magnitude of regulation for 8mer and 7mer seeds was strongly increased when two copies of the site were introduced in the UTR. In contrast, 6mer seeds showed no regulation, even when present in two copies. Comparable results were recently reported for two copies of an 8mer site with limited 3′ pairing capacity in a cell-based assay [34]. These results do not support a requirement for a central bulge, as suggested previously [29]. We took care in designing the miRNA 3′ ends to exclude any 3′ pairing to nearby sequence according to RNA secondary structure prediction. However, we cannot rule out the possibility that extensive looping of the UTR sequence might allow the 3′ end to pair to sequences further downstream in our reporter constructs. Note, however, that even if remote 3′ pairing was occurring and required for function of 8- and 7mer seeds, it is not sufficient for 5′ matches with less than seven complementary bases (all test sites are in the same sequence context; Figure 2B). In addition, pairing at a random level will occur in any sequence if long enough loops are allowed. However, whether the ribonucleoprotein complexes involved in translational repression require 3′ pairing, and whether they are able to allow extensive looping to achieve this, remains an open question. Computationally, remote 3′ pairing cannot be distinguished from random matches if loops of any length are allowed. On this basis any site with a 7- or 8mer seed has to be taken seriously—especially when evolutionarily conserved. From these experiments we conclude that (1) complementarity of seven or more bases to the 5′ end miRNA is sufficient to confer regulation, even if the target 3′ UTR contains only a single site; (2) sites with weaker 5′ complementarity require compensatory pairing to the 3′ end of the miRNA in order to confer regulation; and (3) extensive pairing to the 3′ end of the miRNA is not sufficient to confer regulation on its own without a minimal element of 5′ complementarity. The Effect of G:U Base-Pairs and Bulges in the Seed Several confirmed miRNA target genes contain predicted binding sites with seeds that are interrupted by G:U base-pairs or single nucleotide bulges [17,19,26,36,37,38,39]. In most cases these mRNAs contain multiple predicted target sites and the contributions of individual sites have not been tested. In vitro tests have shown that sites containing G:U base-pairs can function [29,34], but that G:U base-pairs contribute less to target site function than would be expected from their contribution to the predicted base-pairing energy [34]. We tested the ability of single sites with seeds containing G:U base-pairs and bulges to function in vivo. One, two, or three G:U base-pairs were introduced into single target sites with 8mer, 7mer, or 6mer seeds (Figure 3A). A single G:U base-pair caused a clear reduction in the efficiency of regulation by an 8mer seed site and by a 7mer seed site. The site with a 6mer seed lost its activity almost completely. Having more than one G:U base-pair compromised the activity of all the sites. As the target sites were designed to allow optimal 3′ pairing, we conclude that G:U base-pairs in the seed region are always detrimental. Single nucleotide bulges in the seed are found in the let-7 target lin-41 and in the lin-4 target lin-14 [17,36,37]. Recent tissue culture experiments have led to the proposal that such bulges are tolerated if positioned symmetrically in the seed region [29]. We tested a series of sites with single nucleotide bulges in the target or the miRNA (Figure 3B). Only some of these sites conferred good regulation of the reporter gene. Our results do not support the idea that such sites depend on a symmetrical arrangement of base-pairs flanking the bulge. We also note that the identity of the bulged nucleotide seems to matter. While it is clear that some target sites with one nucleotide bulge or a single mismatch can be functional if supported by extensive complementarity to the miRNA 3′ end, it is not possible to generalize about their potential function. Functional Categories of Target Sites While recognizing that there is a continuum of base-pairing quality between miRNAs and target sites, the experiments presented above suggest that sites that depend critically on pairing to the miRNA 5′ end (5′ dominant sites) can be distinguished from those that cannot function without strong pairing to the miRNA 3′ end (3′ compensatory sites). The 3′ compensatory group includes seed matches of four to six base-pairs and seeds of seven or eight bases that contain G:U base-pairs, single nucleotide bulges, or mismatches. We consider it useful to distinguish two subgroups of 5′ dominant sites: those with good pairing to both 5′ and 3′ ends of the miRNA (canonical sites) and those with good 5′ pairing but with little or no 3′ pairing (seed sites). We consider seed sites to be those where there is no evidence for pairing of the miRNA 3′ end to nearby sequences that is better than would be expected at random. We cannot exclude the possibility that some sites that we identify as seed sites might be supported by additional long-range 3′ pairing. Computationally, this is always possible if long enough loops in the UTR sequence are allowed. Whether long loops are functional in vivo remains to be determined. Canonical sites have strong seed matches supported by strong base-pairing to the 3′ end of the miRNA. Canonical sites can thus be seen as an extension of the seed type (with enhanced 3′ pairing in addition to a sufficient 5′ seed) or as an extension of the 3′compensatory type (with improved 5′ seed quality in addition to sufficient 3′ pairing). Individually, canonical sites are likely to be more effective than other site types because of their higher pairing energy, and may function in one copy. Due to their lower pairing energies, seed sites are expected to be more effective when present in more than one copy. Figure 4 presents examples of the different site types in biologically relevant miRNA targets and illustrates their evolutionary conservation in multiple drosophilid genomes. Most currently identified miRNA target sites are canonical. For example, the hairy 3′ UTR contains a single site for miR-7, with a 9mer seed and a stretch of 3′ complementarity. This site has been shown to be functional in vivo [26], and it is strikingly conserved in the seed match and in the extent of complementarity to the 3′ end of miR-7 in all six orthologous 3′ UTRs. Although seed sites have not been previously identified as functional miRNA target sites, there is some evidence that they exist in vivo. For example, the Bearded (Brd) 3′ UTR contains three sequence elements, known as Brd boxes, that are complementary to the 5′ region of miR-4 and miR-79 [32,40]. Brd boxes have been shown to repress expression of a reporter gene in vivo, presumably via miRNAs, as expression of a Brd 3′ UTR reporter is elevated in dicer-1 mutant cells, which are unable to produce any miRNAs [14]. All three Brd box target sites consist of 7mer seeds with little or no base-pairing to the 3′ end of either miR-4 or miR-79 (see below). The alignment of Brd 3′ UTRs shows that there is little conservation in the miR-4 or miR-79 target sites outside the seed sequence, nor is there conservation of pairing to either miRNA 3′ end. This suggests that the sequences that could pair to the 3′ end of the miRNAs are not important for regulation as they do not appear to be under selective pressure. This makes it unlikely that a yet unidentified Brd box miRNA could form a canonical site complex. The 3′ UTR of the HOX gene Sex combs reduced (Scr) provides a good example of a 3′ compensatory site. Scr contains a single site for miR-10 with a 5mer seed and a continuous 11-base-pair complementarity to the miRNA 3′ end [28]. The miR-10 transcript is encoded within the same HOX cluster downstream of Scr, a situation that resembles the relationship between miR-iab-5p and Ultrabithorax in flies [26] and miR-196/HoxB8 in mice [41]. The predicted pairing between miR-10 and Scr is perfectly conserved in all six drosophilid genomes, with the only sequence differences occurring in the unpaired loop region. The site is also conserved in the 3′ UTR of the Scr genes in the mosquito, Anopheles gambiae, the flour beetle, Tribolium castaneum, and the silk moth,Bombyx mori. Conservation of such a high degree of 3′ complementarity over hundreds of millions of years of evolution suggests that this is likely to be a functional miR-10 target site. Extensive 5′ and 3′ sequence conservation is also seen for other 3′ compensatory sites, e.g., the two let-7 sites in lin-41 or the miR-2 sites in grim and sickle [17,26,36]. The miRNA 3′ End Determines Target Specificity within miRNA Families Several families of miRNAs have been identified whose members have common 5′ sequences but differ in their 3′ ends. In view of the evidence that 5′ ends of miRNA are functionally important [26,27,29,42], and in some cases sufficient (present study), it can be expected that members of miRNA families may have redundant or partially redundant functions. According to our model, 5′ dominant canonical and seed sites should respond to all members of a given miRNA family, whereas 3′ compensatory sites should differ in their sensitivity to different miRNA family members depending on the degree of 3′ complementarity. We tested this using the wing disc assay with 3′ UTR reporter transgenes and overexpression constructs for various miRNA family members. miR-4 and miR-79 share a common 5′ sequence that is complementary to a single 8mer seed site in the bagpipe 3′ UTR (Figure 5A and 5B). The 3′ ends of the miRNAs differ. miR-4 is predicted to have 3′ pairing at approximately 50% of the maximally possible level (−10.8 kcal/mol), whereas the level of 3′ pairing for miR-79 is approximately 25% maximum (−6.1 kcal/mol), which is below the average level expected for random matches (see below). Both miRNAs repressed expression of the bagpipe 3′ UTR reporter, regardless of the 3′ complementarity (Figure 5B). This indicates that both types of site are functional in vivo and suggests that bagpipe is a target for both miRNAs in this family. To test whether miRNA family members can also have non-overlapping targets, we used 3′ UTR reporters of the pro-apoptotic genes grim and sickle, two recently identified miRNA targets [26]. Both genes contain K boxes in their 3′ UTRs that are complementary to the 5′ ends of the miR-2, miR-6, and miR-11 miRNA family [26,32]. These miRNAs share residues 2–8 but differ considerably in their 3′ regions (Figure 5A). The site in the grim 3′ UTR is predicted to form a 6mer seed match with all three miRNAs (Figure 5C, left), but only miR-2 shows the extensive 3′ complementarity that we predict would be needed for a 3′ compensatory site with a 6mer seed to function (−19.1 kcal/mol, 63% maximum 3′ pairing, versus −10.9 kcal/mol, 46% maximum, for miR-11 and −8.7 kcal/mol, 37% maximum, for miR-6). Indeed, only miR-2 was able to regulate the grim 3′ UTR reporter, whereas miR-6 and miR-11 were non-functional. The sickle 3′ UTR contains two K boxes and provides an opportunity to test whether weak sites can function synergistically. The first site is similar to the grim 3′ UTR in that it contains a 6mer seed for all three miRNAs but extensive 3′ complementarity only to miR-2. The second site contains a 7mer seed for miR-2 and miR-6 but only a 6mer seed for miR-11 (Figure 5C, right). miR-2 strongly downregulated the sickle reporter, miR-6 had moderate activity (presumably via the 7mer seed site), and miR-11 had nearly no activity, even though the miRNAs were overexpressed. The fact that a site is targeted by at least one miRNA argues that it is accessible (e.g., miR-2 is able to regulate both UTR reporters), and that the absence of regulation for other family members is due to the duplex structure. These results are in line with what we would expect based on the predicted functionality of the individual sites, and indicate that our model of target site functionality can be extended to UTRs with multiple sites. Weak sites that do not function alone also do not function when they are combined. To show that endogenous miRNA levels regulate all three 3′ UTR reporters, we compared EGFP expression in wild-type cells and dicer-1 mutant cells, which are unable to produce miRNAs [14]. dicer-1 clones did not affect a control reporter lacking miRNA binding sites, but showed elevated expression of a reporter containing the 3′ UTR of the previously identified bantam miRNA target hid (Figure 5D). Similarly, all 3′ UTR reporters above were upregulated in dicer-1 mutant cells, indicating that bagpipe, sickle, and grim are subject to repression by miRNAs expressed in the wing disc. Taken together, these experiments indicate that transcripts with 5′ dominant canonical and seed sites are likely to be regulated by all members of a miRNA family. However, transcripts with 3′ compensatory sites can discriminate between miRNA family members. Genome-Wide Occurrence of Target Sites Experimental tests such as those presented above and the observed evolutionary conservation suggest that all three types of target sites are likely to be used in vivo. To gain additional evidence we examined the occurrence of each site type in all Drosophila melanogaster 3′ UTRs. We made use of the D. pseudoobscura genome, the second assembled drosophilid genome, to determine the degree of site conservation for the three different site classes in an alignment of orthologous 3′ UTRs. From the 78 known Drosophila miRNAs, we selected a set of 49 miRNAs with non-redundant 5′ sequences. We first investigated whether sequences complementary to the miRNA 5′ ends were better conserved than would be expected for random sequences. For each miRNA, we constructed a cohort of ten randomly shuffled variants. To avoid a bias for the number of possible target matches, the shuffled variants were required to produce a number of sequence matches comparable (±15%) to the original miRNAs for D. melanogaster 3′ UTRs. 7mer and 8mer seeds complementary to real miRNA 5′ ends were significantly better conserved than those complementary to the shuffled variants. This is consistent with the findings of Lewis et al. [27] but was obtained without the need to use a rank and energy cutoff applied to the full-length miRNA target duplex, as was the case for vertebrate miRNAs. Conserved 8mer seeds for real miRNAs occur on average 2.8 times as often as seeds complementary to the shuffled miRNAs (Figure 6A). For 7mer seeds this signal was 2:1, whereas 6mer, 5mer, and 4mer seeds did not show better conservation than expected for random sequences. To assess the validity of these signals and to control for the random shuffling of miRNAs, we repeated this procedure with “mutant” miRNAs in which two residues in the 5′ region were changed. There was no difference between the mutant test miRNAs and their shuffled variants (Figure 6A). This indicates that a substantial fraction of the conserved 7mer and 8mer seeds complementary to real miRNAs identify biologically relevant target sites. 3′ compensatory and canonical sites depend on substantial pairing to the miRNA 3′ end. For these sites, we expect UTR sequences adjacent to miRNA 5′ seed matches to pair better to the miRNA 3′ end than to random sequences. However, unlike 5′ complementarity, 3′ base-pairing preference was not detected in previous studies looking at sequence complementarity and nucleotide conservation because UTR sequences complementary to the miRNA 3′ end were not better conserved than would be expected at random [27]. On this basis, we decided to treat the 5′ and 3′ ends of the miRNA separately. For the 5′ end, seed matches were required to be fully conserved in an alignment of orthologous D. melanogaster and D. pseudoobscura 3′ UTRs (we expected one-half to two-thirds of these matches to be real miRNA sites). We first investigated the overall conservation of UTR sequences adjacent to the conserved seed matches and found that overall the sequences are not better conserved than a random control with shuffled miRNAs (Figure 6B). For both real and random matches, the number of sites increases with the degree of 3′ conservation (up to the 80% level), reflecting the increased probability that sequences adjacent to conserved seed matches will also lie in blocks of conserved sequence (Figure 6B). For real 7mers and 8mers we found a slightly higher percentage of sites between 30% and 80% identity than we did for the shuffled controls. In contrast, the ratio of sites with over 80% sequence identity was smaller for real 7- or 8mers than for random ones, meaning that in highly conserved 3′ UTR blocks (>80% identity) the ratio of random matches exceeds that of real miRNA target sites. This caused us to question whether the degree of conservation for sequences adjacent to seed matches correlates with miRNA 3′ pairing as would be expected if the conservation were due to a biologically relevant miRNA target site. Indeed, we found that the best conserved sites adjacent to seed matches (i.e., those with zero, one, or two mismatches in the 3′ UTR alignment) and the least conserved sites (i.e., those with only three, two, or one matching nucleotides) are not distinguishable in that both pair only randomly to the corresponding miRNA 3′ end (approximately 35% maximal 3′ pairing energy, data not shown). The observation that miRNA target sites do not seem to be fully conserved over their entire length is consistent with the examples shown in Figure 4 in which only the degree of 3′ pairing but not the nucleotide identity is conserved (miR-7/hairy), or at least the unpaired bulge is apparently not under evolutionary pressure (miR-10/Scr). Although this result obviously depends on the evolutionary distance of the species under consideration (see [43] for a comparison of mammalian sites), it shows that conclusions about the contribution of miRNA 3′ pairing to target site function cannot be drawn solely from the degree of sequence conservation. We therefore chose to evaluate the quality of 3′ pairing by the stability of the predicted RNA–RNA duplex. We assessed predicted pairing energy between the miRNA 3′ end and the adjacent UTR sequence for both Drosophila species and used the lower score. Use of the lower score measures conservation of the overall degree of pairing without requiring sequence identity. Figure 6C shows the distribution of the 3′ pairing energies for all conserved 3′ compensatory miR-7 sites identified by a 6mer seed match, compared to the distribution of 50 miR-7 sequences shuffled only in the 3′ part, leaving the 5′ unchanged. This means that real and shuffled miRNAs identify the same 5′ seed matches in the 3′ UTRs, which allows us to compare the 3′ pairing characteristics of the adjacent sequences. We also required 3′ shuffled sequences to have similar pairing energies (±15%) to their complementary sequences and to 10,000 randomly selected sites to exclude generally altered pairing characteristics. The distributions for real and shuffled miRNAs were highly similar, with a mean of approximately 35% of maximal 3′ pairing energy and few sites above 55%. However, a small number of sites paired exceptionally well to miR-7 at energies that were far above the shuffled averages and not reached by any of the 50 shuffled controls. This example illustrates that there is a significant difference between real and shuffled miRNAs for the sites with the highest 3′ complementarity, which are likely to be biologically relevant. Sites with weaker 3′ pairing might also be functional, but cannot be distinguished from random matches and can only be validated by experiments (see Figure 5). To provide a global analysis of 3′ pairing comprising all miRNAs and to investigate how many miRNAs show significantly non-random 3′ pairing, we considered only the sites within the highest 1% of 3′ pairing energies. The average of the highest 1% of 3′ pairing energies of each of 58 3′ non-redundant miRNAs was divided by that of its 50 3′ shuffled controls. This ratio is one if the averages are the same, and increases if the real miRNA has better 3′ pairing than the shuffled miRNAs. To test whether a signal was specific for real miRNAs, we repeated the same protocol with a mutant version of each miRNA. The altered 5′ sequence in the mutant miRNA selects different seed matches than the real miRNA and permits a comparison of sequences that have not been under selection for complementarity to miRNA 3′ ends with those that may have been. Figure 6D shows the distribution of the energy ratios for canonical (left) and 3′ compensatory sites (right) for all 58 real and mutated 3′ non-redundant miRNAs. Most real miRNAs had ratios close to one, comparable to the mutants. But several had ratios well above those observed for mutant miRNAs, indicating significant conserved 3′ pairing. A small fraction of sites show exceptionally good 3′ pairing. If we use 3′ pairing energy cutoffs to examine site quality for all miRNAs, we expect sites of this type to be distinguishable from random matches. The ratio of the number of sites above the cutoff for real versus 3′ shuffled miRNAs was plotted as a function of the 3′ pairing cutoff (Figure 6E). For low cutoffs the ratio is one, as the number of sites corresponds to the number of seed matches (which is identical for real and 3′ shuffled miRNAs). For increasing cutoffs, the ratios increase once a certain threshold is reached, reflecting overrepresentation of sites that pair favorably to the real miRNA 3′ end but not the 3′ shuffled miRNAs. The maximal ratio obtained for mutated miRNAs never exceeded five, which we used as the threshold level to define where significant overrepresentation begins. For 8mer seed sites overrepresentation began at 55% maximal 3′ pairing; for 7mer seed sites, at 65%; for 6mer seed sites, at 68%; and for 5mer seed sites, at 78%. There was no statistical evidence for sites with 4mer seeds. We also tested whether sequences forming 7mer or 8mer seeds containing G:U base-pairs, mismatches, or bulges were better conserved if complementary to real miRNAs. We did not find any statistical evidence for these seed types. Analysis of 3′ pairing also failed to show any non-random signal for these sites. This suggests that such sites are few in number genome-wide and are not readily distinguished from random matches. Nonetheless, our experiments do show that sites of this type can function in vivo. The let-7 sites in lin-41 provide a natural example. Most Sites Lack Substantial 3′ Pairing The experimental and computational results presented above provide information about 5′ and 3′ pairing that allows us to estimate the number of target sites of each type in Drosophila. The number of 3′ compensatory sites cannot be estimated on the basis of 5′ pairing, because seed matches of four, five, or six bases cannot be distinguished from random matches, reflecting that a large number of randomly conserved and non-functional matches predominate (Figure 6A). Significant 3′ pairing can be distinguished from random matches for 6mer sites above 68% maximal 3′ pairing energy, and above 78% for 5mers (Figure 6E). Using these pairing levels gives an estimate of one 3′ compensatory site on average per miRNA. The experiments in Figure 5 provide an opportunity to assess the contribution of 3′ pairing to the ability of sites with 6mer seeds to function. The 6mer K box site in the grim 3′ UTR was regulated by miR-2 (63% maximal 3′ pairing energy), but not by miR-11, which has a predicted 3′ pairing energy of 46%. Similarly, the 6mer seed sites for miR-11 in the sickle 3′ UTR had 3′ pairing energies of approximately 35% and were non-functional. We can use the 63% and 46% levels to provide upper and lower estimates of one and 20 3′ compensatory 6mer sites on average per miRNA. For 5mer sites, the examples in Figure 1 show that sites with 76% and 83% maximal 3′ pairing do not function. At the 80% threshold level, we expect less than one additional site on average per miRNA, suggesting that 3′ compensatory sites with 5mer seeds are rare. The predicted miR-10 site in Scr (see Figure 4) is one of the few sites with a 5mer seed that reaches this threshold (100% maximum 3′ pairing energy; −20 kcal/mol). It is likely that other sites in this group will also prove to be functionally important. The overrepresentation of conserved 5′ seed matches (see Figure 6A) suggests that approximately two-thirds of sites with 8mer seeds and approximately one-half of the sites with 7mer seeds are biologically relevant. This corresponds to an average of 28 8mers and 53 7mers, for a total of 81 sites per miRNA. We define canonical sites as those with meaningful contributions from both 5′ and 3′ pairing. Given that 7- and 8mer seed matches can function without significant 3′ pairing, it is difficult to assess at what level 3′ pairing contributes meaningfully to their function. The range of 3′ pairing energies that were minimally sufficient to support a weak seed match was between 46% and 63% of maximum pairing energy (see Figure 5C). If we take the 46% level as the lower limit for meaningful 3′ pairing, over 95% of sites would be considered seed sites. This changes to 99% for pairing energies that can be statistically distinguished from noise (55% maximal; see Figure 6E) and remains over 50% even for pairing energies at the average level achieved by random matches (30% maximal). It is clear from this analysis that the majority of miRNA target sites lack substantial pairing in the 3′ end in nearby sequences. Indeed the 3′ pairing level for the three seed sites for miR-4 in Brd are all less than 25% (i.e., below the average for random matches) and Brd was thus not predicted as a miR-4 target previously [26,28,35]. Again, we note the caveat that some of sites that we identify as seed could in principle be supported by 3′ pairing to more distant upstream sequences, but also that such sites would be difficult to distinguish from background computationally and that it is unclear whether large loops are functional. If there were statistical evidence for 3′ pairing that is lower than would be expected at random for some sites, this would be one line of argument for a discrete functional class that does not use 3′ pairing and would therefore suggest selection against 3′ pairing. Although the overall distribution of 3′ pairing energies for real miRNA 3′ ends adjacent to 8mer seed matches is very similar to the random control with 3′ shuffled sequences (Figure 7; R 2 = 0.98), we observed a small but significant overrepresentation of real sites on both sides of the random distribution, which leads to a slightly wider distribution of real sites at the expense of the peak values around 30% pairing. Bearing in mind that one-third of 8mer seed matches are false positives (see Figure 6A), we can account for the noise by subtracting one-third of the random distribution. We then see two peaks at around 20% and 35% maximum pairing energy, separated by a dip. Subtracting more (e.g., one-half or two-thirds) of the random distribution increases the separation of the two peaks, suggesting that the underlying distribution of 3′ pairing for real 8mer seed sites might indeed be bimodal. This effect is still present, though less pronounced, if 7mer seed matches are included. No such effect is seen for the combined 5- and 6mer seed matches. In addition, we see no difference between a random (noise) model that evaluates 3′ pairing of 3′ shuffled miRNAs to UTR sites identified by real miRNA seed matches and a random model that pairs the real (i.e., non-shuffled) miRNA 3′ end to randomly chosen UTR sequences, thus excluding bias due to shuffling. Overall, these results suggest that there might indeed be a bimodal distribution due to an enrichment of sites with both better and worse 3′ pairing than would be expected at random. We take this as evidence that seed sites are a biologically meaningful subgroup within the 5′ dominant site category. Overall, these estimates suggest that there are over 80 5′ dominant sites and 20 or fewer 3′ compensatory sites per miRNA in the Drosophila genome. As estimates of the number of miRNAs in Drosophila range from 96 to 124 [44], this translates to 8,000–12,000 miRNA target sites genome-wide, which is close to the number of protein-coding genes. Even allowing for the fact that some genes have multiple miRNA target sites, these findings suggest that a large fraction of genes are regulated by miRNAs. Discussion We have provided experimental and computational evidence for different types of miRNA target sites. One key finding is that sites with as little as seven base-pairs of complementarity to the miRNA 5′ end are sufficient to confer regulation in vivo and are used in biologically relevant targets. Genome-wide, 5′ dominant sites occur 2- to 3-fold more often in conserved 3′ UTR sequences than would be expected at random. The majority of these sites have been overlooked by previous miRNA target prediction methods because their limited capacity to base-pair to the miRNA 3′ end cannot be distinguished from random noise. Such sites rank low in search methods designed to optimize overall pairing energy [16,17,26,27,28,30,35]. Indeed, we find that few seed sites scored high enough to be considered seriously in these earlier predictions, even when 5′ complementarity was given an additional weighting (e.g., [28,43]. We thus suspect that methods with pairing cutoffs would exclude many, if not all, such sites. In a scenario in which protein-coding genes acquire miRNA target sites in the course of evolution [4], it is likely that seed sites with only seven or eight bases complementary to a miRNA would be the first functional sites to be acquired. Once present, a site would be retained if it conferred an advantage, and sites with extended complementarity could also be selected to confer stronger repression. In this scenario, the number of sites might grow over the course of evolution so that ancient miRNAs would tend to have more targets than those more recently evolved. Likewise, genes that should not be repressed by the miRNA milieu in a given cell type would tend to avoid seed matches to miRNA 5′ ends (“anti-targets” [4]). Although a 7- to 8mer seed is sufficient for a site to function, additional 3′ pairing increases miRNA functionality. The activity of a single 7mer canonical site is expected to be greater than an equivalent seed site. Likewise, the magnitude of miRNA-induced repression is reduced by introducing 3′ mismatches into a canonical site. Genome-wide, there are many sites that appear to show selection for conserved 3′ pairing and, interestingly, many sites that appear to show selection against 3′ pairing. In vivo, canonical sites might function at lower miRNA concentrations and might repress translation more effectively, particularly when multiple sites are present in one UTR (e.g., [42]). Efficient repression is likely to be necessary for genes whose expression would be detrimental, as illustrated by the genetically identified miRNAs, which produce clear mutant phenotypes when their targets are not normally repressed (“switch targets” [4]). Prolonged expression of the lin-14 and lin-41 genes in Caenorhabditis elegans mutant for lin-4 or let-7 causes developmental defects, and their regulation involves multiple sites [17,36,37]. Similarly, multiple target sites allow robust regulation of the pro-apoptotic gene hid by bantam miRNA in Drosophila [19]. More subtle modulation of expression levels could be accomplished by weaker sites, such as those lacking 3′ pairing. Sites that cannot function efficiently alone are in fact a prerequisite for combinatorial regulation by multiple miRNAs. Seed sites might thus be useful for situations in which the combined input of several miRNAs is used to regulate target expression. Depending on the nature of the target sites, any single miRNA might not have a strong effect on its own, while being required in the context of others. 3′ Complementarity Distinguishes miRNA Family Members 3′ compensatory sites have weak 5′ pairing and need substantial 3′ pairing to function. We find genome-wide statistical support for 3′ compensatory sites with 5mer and 6mer seeds and show that they are used in vivo. Furthermore, these sites can be differentially regulated by different miRNA family members depending on the quality of their 3′ pairing (e.g., regulation of the pro-apoptotic genes grim and sickle by miR-2, miR-6, and miR-11). Thus, members of a miRNA family may have common targets as well as distinct targets. They may be functionally redundant in regulation of some targets but not others, and so we can expect some overlapping phenotypes as well as differences in their mutant phenotypes. Following this reasoning, it is likely that the let-7 miRNA family members differentially regulate lin-41 in C. elegans [17,45]. The seed matches in lin-41 to let-7 and the related miRNAs miR-48, miR-84, and miR-241 are weak, and only let-7 has strong 3′ pairing. On this basis, it seems likely that lin-41 is regulated only by let-7. In contrast, hbl-1 has four sites with strong seed matches [38,39], and we expect it to be regulated by all four let-7 family members. As all four let-7-related miRNAs are expressed similarly during development [6], their role as regulators of hbl-1 may be redundant. let-7 must also have targets not shared by the other family members, as its function is essential. lin-41 is likely to be one such target. The idea that the 3′ end of miRNAs serves as a specificity factor provides an attractive explanation for the observation that many miRNAs are conserved over their full length across species separated by several hundreds of millions of years of evolution. 3′ compensatory sites may have evolved from canonical sites by mutations that reduce the quality of the seed match. This could confer an advantage by allowing a site to become differentially regulated by miRNA family members. In addition, sites could retain specificity and overall pairing energy, but with reduced activity, perhaps permitting discrimination between high and low levels of miRNA expression. This might also allow a target gene to acquire a dependence on inputs from multiple miRNAs. These scenarios illustrate a few ways in which more complex regulatory roles for miRNAs might arise during evolution. A Large Fraction of the Genome Is Regulated by miRNAs Another intriguing outcome of this study is evidence for a surprisingly large number of miRNA target sites genome-wide. Even our conservative estimate is far above the numbers of sites in recent predictions, e.g., seven or fewer per miRNA [27,28,29]. Our estimate of the total number of targets approaches the number of protein-coding genes, suggesting that regulation of gene expression by miRNAs plays a greater role in biology than previously anticipated. Indeed, Bartel and Chen [46] have suggested in a recent review that the earlier estimates were likely to be low, and a recent study by John et al. [43], published while this manuscript was under review, predicts that approximately 10% of human genes are regulated by miRNAs. We agree with these authors' suggestion that this is likely an underestimate, because their method identifies an average of only 7.1 target genes per miRNA, with few that we would classify as seed sites lacking substantial 3′ pairing. A large number of target sites per miRNA is also consistent with combinatorial gene regulation by miRNAs, analogous to that by transcription factors, leading to cell-type-specific gene expression [47]. Sites for multiple miRNAs allow for the possibility of cell-type-specific miRNA combinations to confer robust and specific gene regulation. Our results provide an improved understanding of some of the important parameters that define how miRNAs bind to their target genes. We anticipate that these will be of use in understanding known miRNA–target relationships and in improving methods to predict miRNA targets. We have limited our evaluation to target sites in 3′ UTRs. miRNAs directed at other types of targets or with dramatically different functions (e.g., in regulation of chromatin structure) might well use different rules. Accordingly, there may prove to be more targets than we can currently estimate. Further, there may be additional features, such as overall UTR context, that either enhance or limit the accessibility of predicted sites and hence their ability to function. For example, the rules about target site structure cannot explain the apparent requirement for the linker sequence observed in the let-7/lin-41 regulation [48]. Further efforts toward experimental target site validation and systematic examination of UTR features can be expected to provide new insight into the function of miRNA target sites. Materials and Methods Fly strains ptcGal4; EP miR278 was provided by Aurelio Teleman. The control, hid, grim, and sickle 3′ UTR reporter transgenes, and UAS-miR-2b are described in [19,26]. For UAS constructs for miRNA overexpression, genomic fragments including miR-4 (together with miR-286 and miR-5) and miR-11 were amplified by PCR and cloned into UAS-DSred as described for UAS-miR-7 [26]. Details are available on request. UAS-miR-79 (also contains miR-9b and miR-9c) and UAS-miR-6 (miR-6–1, miR-6–2, and miR-6–3) were kindly provided by Eric Lai. dcr-1 Q1147X is described in [14]. Clonal analysis Clones mutant for dcr-1 Q1147X were induced in HS-Flp;dcr-1 FRT82/armadillo-lacZ FRT82 larvae by heat shock for 1 h at 38 °C at 50–60 h of development. Wandering third-instar larvae were dissected and labeled with rabbit anti-GFP (Torrey Pines Biolabs, Houston, Texas, United States; 1:400) and anti-β-Gal (rat polyclonal, 1:500). Reporter constructs The bagpipe 3′ UTR was PCR amplified from genomic DNA (using the following primers [enzyme sites in lower case]: AAtctaga AGGTTGGGAGTGACCATGTCTC and AActcgag TATTTAGCTCTCGGGTAGATACG) and cloned downstream of the tubulin promoter and EGFP (Clontech, Palo Alto, California, United States) in Casper4 as in [26]. Single target site constructs Oligonucleotides containing the target site sequences shown in the figures were annealed and cloned downstream of tub>EGFP and upstream of SV40polyA (XbaI/XhoI). Clones were verified by DNA sequencing. Details are available on request. EGFP intensity measurements NIH image 1.63 was used to quantify intensity levels in miRNA-expressing and non-expressing cells from confocal images. Depending on the variation, between three and five individual discs were analyzed. 3′ UTR alignments For each D. melanogaster gene, we identified the D. pseudoobscura ortholog using TBlastn as described in [26]. We then aligned the D. melanogaster 3′ UTR obtained from the Berkeley Drosophila Genome Project to the D. pseudoobscura 3′ adjacent sequence (Human Genome Sequencing Center at Baylor College of Medicine) using AVID [49]. For individual examples, we manually mapped the D. melanogaster coding region to genomic sequence traces (National Center for Biotechnology Information trace archive) of D. ananassae, D. virilis, D. simulans, and D. yakuba by TBlastn and extended the sequences by Blastn-walking. These 3′ UTR sequences were then aligned to the D. melanogaster and D. pseudoobscura 3′ UTRs using AVID. miRNA-sequences Drosophila miRNA sequences were from [44,50,51] downloaded from Rfam (http://www.sanger.ac.uk/Software/Rfam/mirna/index.shtml). The 5′ non-redundant set (49 miRNAs) comprised bantam, let-7, miR-1, miR-10, miR-11, miR-100, miR-124, miR-125, miR-12, miR-133, miR-13a, miR-14, miR-184, miR-210, miR-219, miR-263b, miR-275, miR-276b, miR-277, miR-278, miR-279, miR-281, miR-283, miR-285, miR-287, miR-288, miR-303, miR-304, miR-305, miR-307, miR-309, miR-310, miR-314, miR-315, miR-316, miR-317, miR-31a, miR-33, miR-34, miR-3, miR-4, miR-5, miR-79, miR-7, miR-87, miR-8, miR-92a, miR-9a, and miR-iab-4–5p. Additional miRNAs in the 3′ non-redundant set were miR-2b, miR-286, miR-306, miR-308, miR-311, miR-312, miR-313, miR-318, and miR-6. miRNA shuffles and mutants For the completely shuffled miRNAs, we shuffled the miRNA sequence over the entire length and required all possible 8mer and 7mer seeds within the first nine bases to have an equal frequency (±15%) to the D. melanogaster 3′ UTRs (i.e., same single genome count). For the 3′ shuffled miRNAs, we shuffled the 3′ end starting at base 10 and required the shuffles to have equal (±15%) pairing energy to a perfect complement and to 10,000 randomly chosen sites. For each miRNA we created all possible 2-nt mutants (exchanging A to T or C, C to A or G, G to C or T, and T to A or G) within the seed (nucleotides 3–6) and chose the one with the closest alignment frequencies to the real miRNA in D. melanogaster 3′ UTRs and in the conserved sequences in D. melanogaster and D. pseudoobscura 3′ UTRs. Seed matching and site evaluation For each miRNA and seed type we found the 5′ match in the D. melanogaster 3′ UTRs and required it to be 100% conserved in an alignment to the D. pseudoobscura ortholog allowing for positional alignment errors of ±2 nt. When searching 7mer to 4mer seeds we masked all longer seeds to avoid identifying the same site more than once. For each matching site we extracted the 3′ adjacent sequence for both genomes, aligned it to the miRNA 3′ end starting at nucleotide 10 using RNAhybrid [35], and took the worse energy. Supporting Information Accession Numbers The miRNA sequences discussed in this paper can be found in the miRNA Registry (http://www.sanger.ac.uk/Software/Rfam/mirna/index.shtml). NCBI RefSeq (http://www.ncbi.nlm.nih.gov/RefSeq/) accession numbers: bagpipe (NM_169958), Brd (NM_057541), grim (NM_079413), hairy (NM_079253), hid (NM_079412), lin-14 (NM_077516), lin-41 (NM_060087), and Scr (NM_206443). GenBank (http://www.ncbi.nlm.nih.gov/Genbank/) accession numbers: sickle (AF460844) and D. simulans hairy (AY055843).
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Mapping the Human miRNA Interactome by CLASH Reveals Frequent Noncanonical Binding

            Introduction MicroRNAs (miRNAs) play a key role in the posttranscriptional regulation of gene expression by guiding the association between the RNA-induced silencing complex (RISC) and target RNAs (reviewed in Fabian et al., 2010). Human cells express more than 1,000 miRNAs, each potentially binding to hundreds of messenger RNAs (mRNAs) (Lewis et al., 2005), but only a small fraction of these interactions has been validated experimentally. Experiments conducted throughout the last decade have established a set of canonical rules of miRNA-target interactions (reviewed in Bartel, 2009): (1) interactions are mediated by the “seed” region, a 6- to 8-nt-long fragment at the 5′ end of the miRNA that forms Watson-Crick pairs with the target; (2) nucleotides paired outside the seed region stabilize interactions but are reported not to influence miRNA efficacy (Garcia et al., 2011; Grimson et al., 2007); and (3) functional miRNA targets are localized close to the extremes of the 3′ UTRs of protein-coding genes in relatively unstructured regions (Grimson et al., 2007). Recently, RISC-binding sites on mRNAs have been mapped transcriptome wide by crosslinking, immunoprecipitation, and high-throughput sequencing (CLIP-seq), allowing prediction of many miRNA-mRNA interactions (Chi et al., 2009; Hafner et al., 2010a; Zhang and Darnell, 2011) and yielding data consistent with the canonical rules. However, there is substantial evidence for exceptions to these rules. As examples, in C. elegans, the well-studied lin-4::lin-14 interaction involves bulged nucleotides (Ha et al., 1996), whereas the let-7::lin-41 interaction involves wobble G·U pairing (Vella et al., 2004). Human miR-24 targets important cell-cycle genes using interaction sites that are spread over almost the whole miRNA. These interactions lack obvious seed pairing and contain multiple mismatches, bulges, and wobbles (Lal et al., 2009). Analysis of the miR-124 targets recovered by HITS-CLIP revealed a mode of miRNA-mRNA binding that involves a G bulge in the target, opposite miRNA nucleotides 5 and 6. It has been estimated that about 15% of miR-124 targets in mice brain are recognized by this mode of binding (Chi et al., 2012). Another, apparently rare, base-pairing pattern called “centered site” (Shin et al., 2010) involves 11 consecutive Watson-Crick base pairs between the target and positions 4–14 or 5–15 of miRNA. There are also multiple exceptions regarding the requirement for miRNA-binding sites to be located in the 3′ UTR. Functional miRNA-binding sites have occasionally been reported in 5′ UTRs (Grey et al., 2010) and, more frequently, within mRNA coding sequences (Hafner et al., 2010a; Reczko et al., 2012). Moreover, recent reports show that miRNA targets are not limited to protein-coding transcripts and can be found in noncoding RNAs (ncRNAs) that arise from pseudogenes (Poliseno et al., 2010). Together, these data indicate that miRNAs can bind to a wide variety of targets, with both canonical and noncanonical base pairing, and indicate that miRNA targeting rules may be complex and flexible. To allow direct, high-throughput mapping of RNA-RNA interactions, we previously developed crosslinking, ligation, and sequencing of hybrids (CLASH) (Kudla et al., 2011). High-throughput methods have been developed to map protein-DNA interactions, protein-RNA interactions, and DNA-DNA interactions, so CLASH completes the toolkit necessary to study nucleic acid interactomes. Here, we adapted CLASH to allow direct observation of miRNA-target pairs as chimeric reads in deep-sequencing data. Our transcriptome-wide data set reveals the prevalence of seed and nonseed interactions and the diversity of in vivo targets for miRNAs. Results CLASH Directly Maps miRNA-Binding Sites To recover RNA species bound to the human RISC complex, we created an N-terminal fusion of hAGO1 with a protein A-TEV cleavage site-His6 tripartite tag (PTH-AGO1). N-terminally tagged AGO proteins were used previously in many studies and were shown to be functional (Chatterjee and Grosshans, 2009; Lian et al., 2009). Actively growing Flp-In T-REx 293 cells stably expressing PTH-AGO1 were UV irradiated (254 nm) to crosslink proteins to interacting RNAs. PTH-AGO1 was purified, and interacting RNA molecules were partially hydrolyzed, ligated, reverse transcribed, and subjected to Illumina sequencing. At the ligation step, RNA molecules present in AGO-associated miRNA-target duplexes can be joined together (Figure 1A). Following RT-PCR amplification, these generate “chimeric” complementary DNAs (cDNAs), which can be identified because they contain two regions that map to sites that are noncontiguous in the transcriptome sequence (Figure 1B). When AGO1-associated RNAs were analyzed, around 98% were “single reads” representing AGO1-binding sites on RNAs, similar to those obtained with HITS-CLIP and PAR-CLIP (Chi et al., 2009; Hafner et al., 2010a). However, ∼2% were chimeric reads reflecting intermolecular stem structures present in the AGO1-associated RNAs (Figures 1A and 1B). Supporting the significance of the chimeras, 94% of the sequences involved were also recovered as single reads in at least one experiment (Figure 1C). As a control experiment, the lysate obtained from UV-irradiated human cells was mixed with an equal quantity of yeast lysates prior to CLASH analysis (details in Extended Experimental Procedures and Table S2C). This revealed that the background arising from RNA-RNA interactions formed in vitro represents 98%) of the miRNA-target RNAs interactions identified by CLASH had formed in vivo in human cells. Although seed-mediated interactions constitute the largest class in our data, only around 37% of seed interactions involve uninterrupted Watson-Crick base pairing. This figure seemed surprisingly low but is consistent with the many observations of endogenous noncanonical miRNA targets. High-throughput studies found fewer noncanonical (or nonseed) interactions (Chi et al., 2009; Hafner et al., 2010b), but this may reflect an inherent bias in that seed binding was used to computationally identify interactions. Notably, many high-confidence AGO-binding sites identified in previous CLIP-seq data could not be assigned bioinformatically to any specific miRNA. Computational searches for miRNA-mRNA interactions have also been biased toward the identification of binding sites in 3′ UTR regions. In contrast, we observed substantial numbers of miRNA interactions in all the regions of mRNAs, with the greatest number of hits in coding sequences. Notably, different miRNAs vary in the relative proportions of targets in 5′ UTRs, coding sequences, and 3′ UTRs. As examples, miR-100 returned 4% 5′ UTR: 23% CDS: 73% 3′ UTR, whereas miR-149 returned 8% 5′ UTR: 72% CDS: 19% 3′ UTR (data not shown). To provide an overview of the key features of miRNA-mRNA interactions, we analyzed miRNA base-pairing patterns by cluster analysis. As expected, the most frequent miRNA interaction site with a target is the seed, and base pairing in this region is detected for more than half of the interactions. However, seed interactions alone are found in only a relatively small fraction of identified targets (class I, 19%). Defined classes II–III agree with previously described 3′ supplementary and compensatory sites (Grimson et al., 2007; Lian et al., 2009). Unexpectedly, we identified a substantial class of interactions (class IV, 16% of all interactions) that does not involve contacts within the seed region and resembles reported “seedless” interactions (Lal et al., 2009). The identification of miRNAs that predominately interact with target mRNAs using their 3′ regions helps explain the pattern of evolutionary conservation of these miRNAs. However, target mRNAs that fall into this class seem to be relatively poorly conserved in evolution, and high-throughput data show that, on average, these targets respond only weakly to miRNA binding. Our experimental data on the regulation of miR-92a targets agree with this analysis, showing a statistically significant but moderate effect of class IV interactions on mRNA stability and possibly translation in reporter constructs. The results further suggested that the 3′ motif might act cooperatively with seed interactions. It is, of course, possible that the nonseed, motif interactions have additional functions, e.g., in attracting regulatory factors or switching effector pathways. Overall, we show that noncanonical miRNA-mRNA targeting is much more widespread than anticipated. Moreover, the analysis of base-pairing patterns and of miRNA-binding site motifs indicates that individual miRNAs systematically differ in their target binding modes. Indeed, even members of the same miRNA family can manifest distinct base-pairing patterns. This was previously predicted by RepTar (Elefant et al., 2011) and was observed on a small scale in the analysis of enriched 6-mers in mRNAs recovered in AGO-immunoprecipitates following miRNA transfection (Nelson et al., 2011; Wang et al., 2010). The recently published human AGO2 crystal structure (Elkayam et al., 2012) does not exclude the possibility of noncanonical seed interactions. The trajectory of the miRNA seen in the structure leaves most base edges accessible to be read by potential target molecules. Biochemical studies show that the structure of hAGO2 is flexible, and miRNA binding stabilizes and spatially orients AGO2 domains. Differences in patterns of miRNA-target RNA base pairing can induce allosteric changes in the RISC complex, potentially leading to different AGO activities. This suggests that the various interaction classes and/or the specific motifs identified might have distinct functional roles. The integration of CLASH data with RNA-sequencing and proteomics should give a clearer indication of the range of miRNA functions and their relationship to miRNA-mRNA interaction patterns. Many interactions between Argonaute proteins and abundant, stable rRNA and tRNA species can be found in our data and in published high-throughput AGO-CLIP experiments (Chi et al., 2009; Hafner et al., 2010a). Evidence for miRNA-rRNA interactions has been reported, including the association of miR-206 with both nuclear preribosomes and mature cytoplasmic ribosomes (Politz et al., 2006). miR-206 is, however, specific for skeletal muscles and is not expressed in HEK293 cells. In addition, the involvement of AGO2 in pre-rRNA processing has been reported, although it is unclear whether this is dependent on the RISC pathway (Liang and Crooke, 2011). Specific, short tRNA fragments can be bound by AGO proteins and possibly function analogously to miRNAs (Burroughs et al., 2011), but there are no previous reports of tRNAs being targeted by miRNAs. It was recently proposed that “competing endogenous RNA” (ceRNA), generated from transcribed pseudogenes and long noncoding RNAs, participates in mRNA regulation by competing for miRNA binding (Salmena et al., 2011). We speculate that regulation by competition for miRNAs involves not only ncRNAs and other modestly expressed species but also the abundant stable RNAs. In some cases, the highly abundant tRNAs and rRNAs may also “buffer” miRNAs. They might potentially bind miRNAs that are in (perhaps temporary) excess over cognate targets, preventing inappropriate target binding and/or protecting unbound miRNAs against premature degradation. This model is supported by the observation that miRNA interactions with mRNAs have a lower average free energy than those with stable RNA species (data not shown), so authentic target mRNAs might readily recruit cognate miRNAs from the buffered pool. Interactions between pairs of distinct miRNAs were not very frequent (∼3%), but some were highly reproducible and apparently isoform specific—for example, miR-30::let-7. Two published reports of miRNA-miRNA interactions reveal different outcomes. Binding of miR-107 and let-7 mutually reduced miRNA stability and activity (Chen et al., 2011), whereas binding of miR-709 alters the biogenesis of miR-15a/16-1 (Tang et al., 2012). The application of the CLASH technique to miRNAs offers many possibilities for future research. As an example, analyses of miRNA association reveal comparable distributions of miRNAs associated with the four mammalian AGO homologs (Burroughs et al., 2011; Liu et al., 2004; Meister et al., 2004; Su et al., 2009), but it is less clear whether all miRNAs target the same mRNAs when bound to different AGOs. Similarly, closely related paralogs exist for many human miRNAs, but it has been difficult to determine their relative efficiencies in mRNA targeting. The distribution of nontemplated terminal U residues among miRNAs has also been determined (Kim et al., 2010), but not how this effects targeting in vivo. More generally, the spectrum of miRNA-mRNA interactions is expected to rapidly change during differentiation, and viral infection and following metabolic shifts or environmental insults. All of these can potentially be addressed using CLASH. Experimental Procedures CLASH Analyses The previously reported protocol (Kudla et al., 2011) was extensively modified to allow miRNA target identification in mammalian cells. The experimental protocol, variants tested, and bioinformatic analyses are described in detail in the Supplemental Information. Cell Lines A protein A-TEV protease cleavage site 6xHis (PTH) tag was fused to the N terminus of human AGO1 and stably transfected into Flp-In T-REx 293 cells. PTH-AGO1 expression was induced with Doxycycline and confirmed by western blotting. Experimental Validation of CLASH Targets Flp-In T-REx 293-hAGO1 cells were transfected with miR-92a inhibitor or universal negative control. 48 hr posttransfection RNA was isolated, and cDNA was quantified using primers listed in Table S6. Luciferase reporter vectors were prepared by cloning short oligonucleotides containing single miR-92a-binding sites or PCR-amplified long fragments of 3′ UTRs (sequences in Table S6) into the 3′ UTR of Renilla luciferase in the psiCHECK2 vector (Promega). HEK293 cells were transfected in 96-well plates with reporter vectors or nonmodified psiCHECK2 as control together with control or miR-92a inhibitors. Luminescence of Renilla and firefly (internal reference) luciferases was measured 48 hr posttransfection. Extended Experimental Procedures CLASH Protocol Cell Preparation and Crosslinking Flp-In T-REx 293–PTH-AGO1 and control Flp-In T-REx 293 cells (Life Technologies) were seeded onto 150 mm plates (Nunc, Thermo Scientific), 4 plates per sample. The next day cells were induced for hAGO1 production with 0.5 μg/ml Doxycycline (Sigma, D9891). 36 hr post induction growing cells were UV crosslinked on ice with λ = 254 nm in Stratalinker 1800 (Stratagene), power settings = 400 mJ/cm2. Cell Lysis Directly after crosslinking cells were lysed by addition of cooled Lysis Buffer (50 mM Tris pH 7.8, 300 mM NaCl, 1% NP-40, 5 mM EDTA, 10% glycerol, 5 mM β-mercaptoethanol, protease inhibitors (Roche, cOmplete, EDTA-free)). Lysates were centrifuged at 6,400 x g for 30 min at 4°C and supernatant was collected. Unless used directly lysates were stored at −80°C. PTH-AGO1-RNA Purification on IgG-Dynabeads Dynabeads (Life Technologies, M-270 Epoxy) were coated in advance with rabbit IgG (Sigma, I5006) according to a published protocol (Oeffinger et al., 2007). Cell lysates were incubated with IgG-Dynabeads for 40 min at 4°C using 20 mg beads/sample and then beads were washed with LS-IgG WB buffer (50 mM Tris-HCl pH 7.8, 0.3 M NaCl, 5 mM MgCl2, 0.5% NP-40, 2.5% glycerol, 5 mM β-mercaptoethanol), HS-IgG WB buffer (50 mM Tris-HCl pH 7.8, 0.8 M NaCl, 10 mM MgCl2, 0.5% NP-40, 2.5% glycerol, 5 mM β-mercaptoethanol) and PNK-Wash buffer (50 mM Tris-HCl pH 7.5, 10 mM MgCl2, 0.5% NP-40, 50 mM NaCl, 5 mM β-mercaptoethanol). RNase A+T1 Digestion and PTH-AGO1-RNA Elution RNP complexes bound to IgG-Dynabeads were trimmed with 0.5 unit RNaseA+T1 mix (RNace-IT, Stratagene) in 500 μl PNK buffer (50 mM Tris-HCl pH 7.5, 10 mM MgCl2, 0.5% NP-40, 50 mM NaCl, 10 mM β-mercaptoethanol) for 7 min at 20°C. Then RNase solution was removed and PTH-AGO1-RNA complexes were eluted with Ni-WB I buffer (50 mM Tris-HCl pH 7.8, 300 mM NaCl, 10 mM Imidazole pH 8.0, 6 M Guanidine-HCl, 0.1 M NP-40, 5 mM β-mercaptoethanol) for 10 min at 20°C. Purification of PTH-AGO1-RNA on Ni-NTA Agarose The eluate from IgG-Dynabeads was loaded on 40 μl of Ni-NTA Agarose (QIAGEN) equilibrated with Ni-WB I for 2 hr at 4°C and then Ni-NTA beads were washed with Ni-WB I buffer. Ni-NTA beads were transferred to the spin columns (Pierce, Thermo Scientific, 69725) and from that point on all the reactions and washes were performed on the columns. Beads were subsequently washed with Ni-WB II buffer (50 mM Tris-HCl pH 7.8, 300 mM NaCl, 10 mM Imidazole pH 8.0, 0.1 M NP-40, 5 mM β-mercaptoethanol) and extensively washed with PNK-Wash buffer. RNA 5′ End Phosphorylation and Intramolecular Ligation Ni-NTA beads with bound PTH-AGO1-RNA complexes were incubated with 40 units T4 Polynucleotide kinase (New England Biolabs, M0201), 1 mM ATP and RNase inhibitors (RNasin, Promega, N211B) in PNK buffer for 150 min at 20°C, and then washed with Ni-WB I, Ni-WB II and PNK-Wash buffer. PTH-AGO1 bound, interacting RNA molecules were ligated together overnight using 40 units of T4 RNA ligase 1 (New England Biolabs, M0204), 1 mM ATP and RNase inhibitors in PNK buffer at 16°C. On the next day, the ligation mixture was washed out with Ni-WB I, Ni-WB II and PNK-Wash buffer. RNA Dephosphorylation and 3′ miRCat-33 Linker Ligation Ni-NTA beads were resuspended in a dephosphorylation mixture containing 8 units Thermosensitive Alkaline Phosphatase (Promega, M9910) and RNase inhibitors in PNK buffer for 45 min at 20°C and subsequently washed with Ni-WB I, Ni-WB II and PNK-Wash buffer. 3′ miRCat-33 linker ligation was performed for 6 hr at 16°C in the reaction mixture containing 800 units T4 RNA ligase 2 truncated, K227Q (New England Biolabs, M0351), 1 μM miRCat-33 linker (IDT), 10% PEG 8000 and RNase inhibitors in PNK buffer. Beads were washed with Ni-WB I, Ni-WB II and PNK-Wash buffer. Radioactive Labeling of RNA and Elution of PTH-AGO1-RNA Complexes RNAs bound to PTH-AGO1 were radiolabelled with 32P-γ-ATP (Perkin Elmer, 6000 Ci/mmol) in a mixture containing 40 units T4 Polynucleotide kinase, RNase inhibitors in PNK buffer for 30 min at 37°C. Beads were washed with Ni-WB I, Ni-WB II and PNK-Wash buffer. AGO1-RNA complexes were eluted by incubation with Ni-EB for 5 min at room temperature. TCA Precipitation, SDS-PAGE, and Transfer to Nitrocellulose Protein-RNA complexes from the Ni-NTA eluate were precipitated with 2 μg BSA (Sigma) and 17% TCA. Pellets were washed twice with cold acetone, dried, resuspended in 10 μl water and NuPAGE LDS SB (Life Technologies, NP0007). Samples were incubated for 10 min at 65°C and then resolved on a 4%–12% Bis-Tris NuPAGE gel (Life Technologies, NP0335) in NuPAGE SDS MOPS running buffer (Life Technologies, NP0001). Protein-RNA complexes were transferred to nitrocellulose membrane (GE Healthcare, Amersham Hybond ECL) in the wet-transfer tank (Bio-Rad, Mini Trans Blot cell) with NuPage transfer buffer (Life Technologies, NP00061) and 10% methanol for 2 hr at constant voltage = 100V, on ice. Air-dried membrane was exposed on film (GE Healthcare, Amersham Hyperfilm MP) for about 1 hr at room temperature. Developed film was aligned with the membrane and the radioactive bands corresponding to the PTH-hAGO1 complexes were cut out. Proteinase K Treatment and RNA Isolation Cut out bands were incubated with 100 μg of Proteinase K (Roche) and proteinase K buffer (50 mM Tris-HCl pH 7.8, 50 mM NaCl, 10 mM imidazole pH 8.0, 0.1% NP-40, 1% SDS, 5 mM EDTA, 5 mM β-mercaptoethanol) for 2 hr at 55°C. The membrane was discarded. Released RNA was extracted with phenol-chloroform-isoamyl alcohol (PCI) mixture and ethanol precipitated overnight with 20 μg GlycoBlue (Ambion, Life Technologies). Pellets were washed twice with 70% ice-cold EtOH, then air-dried. 5′ Phosphorylation and 5′ Linker Ligation RNAs were phosphorylated with 10u of T4 Polynucleotide kinase in RNA ligase 1 buffer (New England Biolabs) for 30 min at 37°C. Then 10 units of RNA ligase 1 and barcoded 5′ linker (final conc. 5 μM; Table S6) were added and the reaction mixture was incubated for 10 hr at 16°C. RNA was subsequently PCI extracted end ethanol precipitated. Reverse Transcription and Library Amplification by PCR Whole isolated RNA was reverse transcribed with Superscript III Reverse Transcriptase (Life Technologies) according to manufacturer instructions, using miRCat-33 primer (IDT) at 50°C. RNA was then degraded by addition of 2 μl RNase H (New England Biolabs) for 30 min at 37°C. cDNA was amplified using primers P5 and primer PE_miRCat_PCR (Table S6) and TaKaRa LA Taq polymerase (Takara Bio, RR002M). Library Size Selection PCR products were separated on a 3% MetaPhor agarose (Lonza)/TBE gel with SYBRSafe (Life Technologies). The gel was run in 1 x TBE on ice. Then the gel was scanned on a Fujifilm scanner (Fla-5100) and two gel slices with different DNA fragments sizes were cut out: LB: 90 - 100 nt and HB: 110 - 180 nt. After purification with Gel Extraction Kit with MinElute columns (QIAGEN), libraries were stored at −20°C. For high-throughput sequencing, LB:HB fractions were mixed at a 1:3 ratio. Cell Lines Tagged human AGO1 was constructed by ligating the Protein A - TEV protease cleavage site - 6xHis (PTH) tag (amplified from the pRS415 plasmid, a generous gift from Markus T. Bohnsack, Frankfurt, Germany) to the N-terminus of human AGO1 amplified from cDNA, which was then cloned into the pcDNA5/FRT/TO vector (Life Technologies). This vector was further used for stable transfection of Flp-In T-REx 293 cells (Life Technologies) according to the manufacturer’s protocol. Before the CLASH experiments PTH-AGO1 expression was induced with Doxycycline (final concentration = 0.5 μg/ml) for 36 hr, with the aim of achieving expression close to that of endogenous AGO1. Expression of tagged AGO1 was confirmed by the Western Blotting, using peroxidase anti-peroxidase soluble complex (PAP; Sigma, P1291) recognizing the protein A tag. Determination of the Background Level of RNA-RNA Interactions To assess the frequency at which RNA-RNA interactions recovered by CLASH were formed in vitro following cell lysis, lysate obtained from the crosslinked HEK cells expressing PTH-AGO1 was mixed with an equal quantity (measured by RNA content) of yeast cell lysate, prior to CLASH analysis based on the optimized protocol (E4). Analysis of a sample in which no yeast lysate was present (E7) showed the background of human sequences that could be, incorrectly, matched to the yeast genome (0.37% of single reads and 0.11% of miRNA chimeras) (Table S4). Correcting for this, experiment E8, in which yeast lysate was included, reveals that 1.1% of single reads and 1.6% miRNA chimeras arose from yeast RNAs. This low level of background probably reflects the very stringent purification conditions used during CLASH. The same analysis was performed for data sets E9 and E10, prepared using an alternative protocol (E6), in which ligation was performed prior to protein denaturation. This gave a higher level of yeast-human chimeras (∼10%) (Table S4). This indicates that variations in the CLASH protocol can have substantial effects on the signal to noise ratio in the resulting cDNA library, and also confirms the conclusion that protocol E4 is optimal. Experimental Validation of CLASH Targets by qPCR Flp-In T-REx 293-hAGO1 cells were transfected with 25nM miR-92a inhibitor or universal negative control (miRIDIAN Hairpin inhibitors, ThermoScientific, IH-300510-06, IN-001005-01) and Lipofectamine 2000 (Life Technologies). 48 hr post-transfection RNA was isolated using TRI Reagent (Sigma, T9424), DNase treated (TURBO DNase, Ambion) and further purified by ethanol precipitation. cDNA was prepared using SuperScript III (Life Technologies) and random 6-mer oligonucleotides (Promega) at 45°C. cDNA was quantified using Roche LightCycler 480 Real-Time PCR System, Universal Probe Library System (Roche) and primers listed in Table S6. All the primers were designed using Roche UPL Assay Design Center. The expression level of the genes of interest (GIO) was internally normalized to the expression level of GAPDH. Luciferase Reporter Assays Luciferase reporter vectors were prepared by cloning short oligonucleotides containing single miR-92a-binding sites or PCR amplified long fragments of target 3′UTRs into the 3′ UTR of Renilla luciferase in the psiCHECK2 vector (Promega) using XhoI and NotI restriction sites. All the sequences of oligonucleotides used for creating reporters and reporter mutagenesis are listed in the Extended Experimental Procedures. HEK293 cells were transfected in 96-well plates with 50 or 10ng of reporter vectors or non-modified psiCHECK2 as control together with 25nM control or miR-92a inhibitors (miRIDIAN Hairpin inhibitors, Dharmacon, IN-001005-01 and IH-300510-06) or 6.25nM miR-92a controls and inhibitors (IDT, custom oligonucleotides, sequences listed in Extended Experimental Procedures) using Lipofectamine 2000 (Life Technologies). Luminescence of Renilla and firefly (internal reference) luciferases was measured 48 hr post-transfection using the Dual-Glo Luciferase Assay System (Promega, E2920) according to manufacturers’ instructions and SpectraMax M5 Multi-Mode Microplate Reader (Molecular Devices). Microarray Analysis Flp-In T-REx 293-hAGO1 cells were transfected with 6.25nM miR-92a inhibitor or negative control (IDT synthesized custom oligos, sequences in the list of oligonucleotides) and Lipofectamine 2000 (Life Technologies), each sample in 5 replicates. 48 hr post-transfection RNA was isolated using TRIZOL (Invitrogen) and concentrated by ethanol precipitation. Total RNA was processed and quantified on Affymetrix GeneChip Human Exon 1.0 ST by Source BioScience. Bioinformatic Analysis BLAST Transcriptome Database The components of our custom BLAST database come from the following sources: (1) BioMart (http://www.biomart.org), data set: GRCh37.p2: mRNA (cDNA sequence of protein coding genes limited to those with RefSeq protein ID), pseudogenes (miRNA_pseudogene, misc_RNA_pseudogene, Mt_tRNA_pseudogene, polymorphic_pseudogene, RNA_pseudogene, snoRNA_pseudogene, snRNA_pseudogene, tRNA_pseudogene), snoRNA, snRNA, processed transcripts, lincRNA, miscellaneous RNA, mitochondrial rRNA, mitochondrial tRNA, 5.8S rRNA, 5S rRNA; (2) genomic tRNA database http://gtrnadb.ucsc.edu/): human tRNAs; (3) National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov): rRNA sequence (NR_003287.2, NR_003286.2); (4) miRBase release 15 (http://www.mirbase.org): mature human miRNAs. Redundant sequences were removed from the database, as were pseudogenes, lincRNAs and processed transcripts that matched full-length mature miRNA sequences. The mRNA sequences and coordinates in the database match those of ENST entries from ENSEMBL Release 60 (http://useast.ensembl.org/info/website/archives/index.html). For the analysis of experiments E7-E10 we supplemented our database with sequences of S. cerevisiae chromosomes obtained from the Saccharomyces Genome Database. Mapping Reads by BLAST Illumina reads were assigned to the experimental and control samples using their 5′ barcode sequences, and barcodes were clipped. Illumina reads were then mapped by nucleotide BLAST (Altschul et al., 1990) version 2.2.24 to the human transcriptome database described above. BLAST was run with the expectation value set to 0.1, and other parameters set to default. We discarded all antisense matches. Approximately 70% of all mapped reads were mapped unambiguously. The remaining sequences mapped to more than one transcript, typically originating from a single gene. To uniquely assign those reads that were mapped with the same e-value to more than one transcript, we ranked all transcripts according to their total number of BLAST hits and assigned the read to the transcript with most hits. Identification and Clustering of Chimeras We identified as chimeras those reads where: (1) the read yielded at least two blast hits with e-value ≤ 0.1 against the human transcriptome database described above; (2) the hits were either directly adjacent in the read, or with up to 4 nucleotides gap or overlap between hits; and (3) the hits were in sense orientation with respect to the blast database. We rejected reads with more than ten hits in the database. For reads that could be assigned to chimeras in more than one way (for example if one of the fragments was mapped to two different transcripts from the same gene), we used the following criteria to decide which mapping to retain, in the following order: (1) we retained the mapping with the most significant e-value of both fragments; (2) we preferentially retained miRNA-mRNA chimeras; (3) we retained the mapping to transcripts with the highest number of non-chimeric reads. Criterion (2) was introduced to avoid assigning chimeras to pseudogenes, if an mRNA transcript with the same mapping quality was found. After mapping the chimeric reads, we clustered those chimeras for which the coordinates of both mapped fragments overlapped, independently of the order of fragments in the chimera. We first performed the clustering for each of the six experiments independently, and then clustered the six experiments together. Finally we adjusted the transcript regions found in clustered chimeras by extending the miRNA fragment to the full length mature miRNA, and by adding 25 nucleotides downstream sequence to each mRNA fragment. We refer to the mRNA fragments adjusted in this way as the “CLASH targets,” and to the miRNA-mRNA fragment pairs as “CLASH interactions.” Identification of AGO1-Binding Site Clusters from Nonchimeric Reads We identified clusters of non-chimeric reads using a method similar to that previously described (Chi et al., 2009). First, we randomly distributed all distinct reads mapped to each transcript along the same transcript using BEDTools (Quinlan and Hall, 2010), to calculate the maximum random cluster height for each transcript. We then repeated the procedure a hundred times, and retained the maximum cluster height across the repeats. This value was used as a transcript-specific cluster height threshold for the identification of actual clusters. High-confidence clusters were identified as regions of transcripts in which the coverage equaled or exceeded the threshold, the coverage was higher than twenty reads, and the region of high coverage was at least twenty-nucleotide wide. We then calculated high-confidence cluster peaks as the area of twenty nucleotides on each side of the position of highest coverage within the cluster. Overlap between CLASH Targets and Experimentally Validated miRNA Targets from Published Databases To check if our CLASH data set contains known, experimentally validated miRNA targets we compared it with two databases – miRTarBase (Hsu et al., 2011) and TarBase (Vergoulis et al., 2012). Information in these databases is limited: target and miRNA names are often ambiguous (ex. LIN-28B and Lin28 are both used to identify the same gene), and detailed information about the binding site is often missing. To make the comparison possible we downloaded both databases and modified them as follows: 1) we simplified all the miRNA names to “miR-number” (and let-7); 2) we filtered TarBase to only retain human genes with an ENSEMBL gene ID (ENSG) represented in our custom database; 3) we translated gene names in miRTarBase to ENSEMBL gene IDs using the “Gene ID Conversion” tool from Database for Annotation, Visualization and Integrated Discovery (DAVID) (Dennis et al., 2003) and only genes represented in our custom database were retained; 4) we removed redundant interactions within each database. As a consequence each miRNA-mRNA interaction was represented in a simple “miRNA – ENSG” fashion. Overlap between CLASH Targets, AGO-Binding Sites, and Bioinformatically Predicted miRNA Targets The number of CLASH targets that overlapped with AGO1-binding sites obtained in the present study was estimated using BEDTools (Quinlan and Hall, 2010) intersect, using either the positions of all non-chimeric reads mapped to RNA, or the positions of high-confidence cluster peaks calculated as described above. To calculate the overlap between CLASH targets and high-confidence AGO-binding sites obtained in the PAR-CLIP study we first mapped the sequences of the 17,319 clusters from Table S4 in (Hafner et al., 2010a) to our transcriptome database. 15823 clusters were successfully mapped. We then used BEDTools to intersect the positions of CLASH targets and AGO binding clusters. To calculate the degree of overlap between CLASH targets and AGO-binding clusters that would be expected by chance, we randomly placed AGO-binding clusters on: (1) the same transcript in which each cluster was found; (2) a set of random transcripts with a distribution of expression levels matching the transcripts with AGO clusters; or (3) a set of transcripts randomly selected from the human transcriptome. The expression levels of human transcripts in 293 cells were obtained from the microarray data in (Hafner et al., 2010a). We then calculated the number of CLASH targets that overlapped the randomly placed clusters using BEDTools. To calculate the overlap between CLASH targets and bioinformatically predicted miRNA targets by miRanda (John et al., 2004), TargetScan (Lewis et al., 2005), PITA (Kertesz et al., 2007), PicTar (Krek et al., 2005) and RNAhybrid (Rehmsmeier et al., 2004), we extracted the coordinates of predicted miRNA target sites in the hg18 genome from the Functional RNA Project website (www.ncrna.org). We converted CLASH targets from transcriptome to hg19 coordinates using a custom script based on the ENSEMBL Perl API (available on demand), then to hg18 coordinates using liftOver (Kent et al., 2002), and retained the targets mapped to a single genomic position that corresponded to a 3′UTR of a RefSeq gene, only considering those 3′UTR fragments that did not overlap with a CDS of a RefSeq gene on either strand. To generate the control interaction data set, we randomly distributed CLASH targets in the 3′UTR of RefSeq genes (excluding those fragments overlapping with a CDS). We then used BEDTools to overlap CLASH or control interactions with bioinformatically predicted targets. The enrichment was calculated as N_CLASH/N_control, where N_CLASH is the number of CLASH interactions that matched bioinformatic predictions, and N_control is the number of control interactions that matched bioinformatic predictions. To call a match we required that the CLASH interaction and the prediction share both (1) the miRNA name (2) the target position in the mRNA. To decide whether the CLASH targets and predictions refer to the same or different miRNA, we converted miRNA names to a simplified miR-number format, using the regular expression /[a-zA-Z]∗-[0-9]∗/. Analysis of Base Pairing in miRNA-Target Interactions To calculate free energies of binding and determine base pairing between miRNA and target fragments of chimeric reads we used programs from the UNAFold suite (Markham and Zuker, 2008). First, chimeric reads were divided into miRNA and target fragments based on the BLAST analysis. If the miRNA sequence was trimmed then it was replaced with the full-length miRNA sequence from miRBase (Griffiths-Jones et al., 2008). Target fragments were extended by 25 nucleotides at the 3′ end in case the length of the sequencing reads (which ranged from 50 to 100 nts) was too short to include the miRNA-binding site. Then we used the hybrid-min program with default parameters to calculate minimum hybridization energy of the two fragments of chimera that represented interacting RNA molecules. To convert the numeric representation into a Vienna-style dot-bracket format we used a modified version of M. Zuker’s Ct2b.pl script. Evolutionary Conservation of CLASH Targets To analyze evolutionary conservation of CLASH targets, we used per-nucleotide conservation scores among 46 vertebrate genomes calculated using the PhyloP algorithm (Pollard et al., 2010), and downloaded from the UCSC genome browser (Fujita et al., 2011). We selected CLASH targets that were located in 3′UTRs of RefSeq genes, with at least 100 nt of 3′UTR sequence on each side. The CLASH targets were centered at the 5′ end of the longest predicted miRNA-mRNA stem in each target. To calculate the average profile of evolutionary conservation around CLASH targets, the PhyloP score was calculated in the region from −100 to +100 nt from the center, averaged across all targets, and normalized to the number of targets for which the PhyloP score was available. The conservation scores for individual CLASH targets were defined as (C_CLASH - C_control), where C_CLASH is the mean PhyloP score within the longest predicted miRNA-mRNA stem in each target, and C_control is the mean PhyloP score in regions spanning nt −50 through −10 and +10 through +50, relative to the ends of the stem. We used a paired t test to compare the mean conservation within and outside the predicted stems. Analysis of CLASH Target Downregulation after Inhibition of 25 miRNAs To analyze the effect of miRNA inhibition on mRNA transcript levels in specific sets of transcripts, we used the miRNA depletion data from (Hafner et al., 2010a). In this study, 25 miRNAs were depleted using an antisense 2′-O-methyl oligonucleotide cocktail, and mRNA levels were measured by the Human Genome U133 Plus 2.0 Array from Affymetrix. We averaged the log2 GC-RMA signal for all probes matching each transcript, discarding probes matching multiple genes. Probes were assigned to transcripts using data downloaded from ENSEMBL. We then calculated the log2-enrichment of each transcript upon miRNA depletion, by subtracting the average signal in mock-transfected cells from the average signal in 2′-O-methyl oligonucleotide-transfected cells. To calculate the background distribution of mRNA enrichment upon depletion of 25 miRNAs, we randomly selected one transcript for each gene, and then selected a subset of transcripts with expression levels matching those of transcripts in which CLASH targets were identified. We plotted the cumulative distribution of log2-enrichment for these transcripts. As a positive control we used the 331 experimentally validated targets of the 25 miRNAs for which depletion data was available, downloaded from the miRTarBase on August 1, 2011. We then plotted cumulative distributions of log2-enrichment scores after miRNA depletion, for the 1,995 transcripts in which CLASH targets for at least one of the 25 miRNAs were found, or for subsets filtered by: seed complementarity; location of the CLASH target in 5′ UTR, coding sequence, or 3′ UTR; predicted basepairing energy between miRNA and its target; presence of non-chimeric read clusters overlapping with the mRNA fragment of each chimera. Analysis of CLASH Target Downregulation after miR-92a Inhibition Data acquisition and normalization from Affymetrix GeneChip Human Exon 1.0 ST arrays were done by Source BioScience. We averaged the log2 RMA signal for all probes matching each transcript, discarding probes matching multiple genes. We only accepted probes for which expression was detected at p < 0.05 in at least 4 experiments. Probes were assigned to transcripts using data downloaded from ENSEMBL. We then calculated the log2-enrichment of each transcript upon miRNA depletion, by subtracting the average signal in mock-transfected cells from the average signal in miR-92a inhibitor-transfected cells. Wilcoxon tests were used to check whether the log2-enrichment differs between sets of genes. Identification of Base-Pairing Classes by K-Means Clustering We extracted the miRNA basepairing pattern for each CLASH interaction using a modified version of the Ct2B.pl script written by M. Zuker. We then converted the dot and bracket strings into a numeric format by replacing dots with zeroes, and brackets with a number F that represented the minimum energy of each interaction, rescaled from 0 to 5. This was calculated as F = 5∗(dG-dGlow)/(dGhigh-dGlow), where dG is the minimum energy of the interaction as calculated by hybrid-min, dGlow was set to −11 kcal/mol, and dGhigh to −16 kcal/mol. F was capped at 0 at the bottom and 5 at the top. We next performed K-means clustering on the matrix of 18,514 interactions and 22 miRNA positions using Gene Cluster 3.0 (downloaded from http://bonsai.hgc.jp/∼mdehoon/software/cluster/software.htm), to group interactions with similar basepairing patterns. We performed 50 runs of K-means clustering with K set to 4, 5, or 6, using Euclidean distance, and we selected the grouping into 5 clusters for presentation, as it yielded a clear separation into well-defined classes of comparable size. We used two independent approaches to analyze the clustering patterns expected by chance. First, we randomly reassigned (shuffled) miRNA-mRNA pairs found in chimeras, before performing the folding prediction and clustering. This overestimated the number of interactions expected among random miRNA and mRNA fragments because many targets of abundant miRNAs were assigned by chance to the same miRNA. As a second approach, we randomly permuted (scrambled) the target sequence found in each interaction, while conserving the nucleotide content of the targets. Again, this approach is conservative because keeping the nucleotide content will act to increase apparent interaction strengths. Nevertheless, the basepairing patterns observed in the two randomized sets were very different from the one seen for CLASH interactions. The basepairing patterns were represented graphically as heatmaps using Java TreeView (Saldanha, 2004), using contrast = 5.0 and a linear color scale. To analyze basepairing patterns for subsets of CLASH interactions, the clustering was performed first for the entire data set and the relevant interactions were extracted next, to preserve the order of clusters. Clustering each subset independently yielded similar results. Identification of Enriched Sequence Motifs in CLASH Targets To find overrepresented motifs in CLASH targets, we first extracted all miRNAs with at least 10 targets found. For each miRNA separately we ran MEME (Bailey and Elkan, 1994), with the settings: -dna -mod zoops -maxw 7 -nmotifs 1, using the default 0-order Markov model based on nucleotide frequencies in the training set as the background model. Changing the maximum motif length to 8 or 9; the maximum number of motifs to 2, or the number of nucleotides by which CLASH targets were bioinformatically extended between 0 and 50 did not change the conclusions. The motifs identified by meme for targets of each miRNA were then aligned to the reverse-complemented miRNA sequence using FIMO (Grant et al., 2011), with the setting–output-pthresh 0.01. We then selected the motifs with FIMO q-value (FDR) < 0.05, and meme Bonferroni-corrected p-value < 0.05, yielding a set of 108 high-confidence motifs. For motifs that could be mapped to the reverse-complemented miRNA sequence in more than one way, the mapping with the most significant q-value was selected. (Details of the MEME analysis available upon request).
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Expanding the microRNA targeting code: functional sites with centered pairing.

              Most metazoan microRNA (miRNA) target sites have perfect pairing to the seed region, located near the miRNA 5' end. Although pairing to the 3' region sometimes supplements seed matches or compensates for mismatches, pairing to the central region has been known to function only at rare sites that impart Argonaute-catalyzed mRNA cleavage. Here, we present "centered sites," a class of miRNA target sites that lack both perfect seed pairing and 3'-compensatory pairing and instead have 11-12 contiguous Watson-Crick pairs to the center of the miRNA. Although centered sites can impart mRNA cleavage in vitro (in elevated Mg(2+)), in cells they repress protein output without consequential Argonaute-catalyzed cleavage. Our study also identified extensively paired sites that are cleavage substrates in cultured cells and human brain. This expanded repertoire of cleavage targets and the identification of the centered site type help explain why central regions of many miRNAs are evolutionarily conserved. Copyright (c) 2010 Elsevier Inc. All rights reserved.
                Bookmark

                Author and article information

                Journal
                Science
                Science
                American Association for the Advancement of Science (AAAS)
                0036-8075
                1095-9203
                December 05 2019
                : eaav1741
                Article
                10.1126/science.aav1741
                7051167
                31806698
                6635f513-45f4-4419-ae5b-0ff46bd7f883
                © 2019
                History

                Comments

                Comment on this article