Introduction MicroRNAs (miRNAs) play a key role in the posttranscriptional regulation of gene expression by guiding the association between the RNA-induced silencing complex (RISC) and target RNAs (reviewed in Fabian et al., 2010). Human cells express more than 1,000 miRNAs, each potentially binding to hundreds of messenger RNAs (mRNAs) (Lewis et al., 2005), but only a small fraction of these interactions has been validated experimentally. Experiments conducted throughout the last decade have established a set of canonical rules of miRNA-target interactions (reviewed in Bartel, 2009): (1) interactions are mediated by the “seed” region, a 6- to 8-nt-long fragment at the 5′ end of the miRNA that forms Watson-Crick pairs with the target; (2) nucleotides paired outside the seed region stabilize interactions but are reported not to influence miRNA efficacy (Garcia et al., 2011; Grimson et al., 2007); and (3) functional miRNA targets are localized close to the extremes of the 3′ UTRs of protein-coding genes in relatively unstructured regions (Grimson et al., 2007). Recently, RISC-binding sites on mRNAs have been mapped transcriptome wide by crosslinking, immunoprecipitation, and high-throughput sequencing (CLIP-seq), allowing prediction of many miRNA-mRNA interactions (Chi et al., 2009; Hafner et al., 2010a; Zhang and Darnell, 2011) and yielding data consistent with the canonical rules. However, there is substantial evidence for exceptions to these rules. As examples, in C. elegans, the well-studied lin-4::lin-14 interaction involves bulged nucleotides (Ha et al., 1996), whereas the let-7::lin-41 interaction involves wobble G·U pairing (Vella et al., 2004). Human miR-24 targets important cell-cycle genes using interaction sites that are spread over almost the whole miRNA. These interactions lack obvious seed pairing and contain multiple mismatches, bulges, and wobbles (Lal et al., 2009). Analysis of the miR-124 targets recovered by HITS-CLIP revealed a mode of miRNA-mRNA binding that involves a G bulge in the target, opposite miRNA nucleotides 5 and 6. It has been estimated that about 15% of miR-124 targets in mice brain are recognized by this mode of binding (Chi et al., 2012). Another, apparently rare, base-pairing pattern called “centered site” (Shin et al., 2010) involves 11 consecutive Watson-Crick base pairs between the target and positions 4–14 or 5–15 of miRNA. There are also multiple exceptions regarding the requirement for miRNA-binding sites to be located in the 3′ UTR. Functional miRNA-binding sites have occasionally been reported in 5′ UTRs (Grey et al., 2010) and, more frequently, within mRNA coding sequences (Hafner et al., 2010a; Reczko et al., 2012). Moreover, recent reports show that miRNA targets are not limited to protein-coding transcripts and can be found in noncoding RNAs (ncRNAs) that arise from pseudogenes (Poliseno et al., 2010). Together, these data indicate that miRNAs can bind to a wide variety of targets, with both canonical and noncanonical base pairing, and indicate that miRNA targeting rules may be complex and flexible. To allow direct, high-throughput mapping of RNA-RNA interactions, we previously developed crosslinking, ligation, and sequencing of hybrids (CLASH) (Kudla et al., 2011). High-throughput methods have been developed to map protein-DNA interactions, protein-RNA interactions, and DNA-DNA interactions, so CLASH completes the toolkit necessary to study nucleic acid interactomes. Here, we adapted CLASH to allow direct observation of miRNA-target pairs as chimeric reads in deep-sequencing data. Our transcriptome-wide data set reveals the prevalence of seed and nonseed interactions and the diversity of in vivo targets for miRNAs. Results CLASH Directly Maps miRNA-Binding Sites To recover RNA species bound to the human RISC complex, we created an N-terminal fusion of hAGO1 with a protein A-TEV cleavage site-His6 tripartite tag (PTH-AGO1). N-terminally tagged AGO proteins were used previously in many studies and were shown to be functional (Chatterjee and Grosshans, 2009; Lian et al., 2009). Actively growing Flp-In T-REx 293 cells stably expressing PTH-AGO1 were UV irradiated (254 nm) to crosslink proteins to interacting RNAs. PTH-AGO1 was purified, and interacting RNA molecules were partially hydrolyzed, ligated, reverse transcribed, and subjected to Illumina sequencing. At the ligation step, RNA molecules present in AGO-associated miRNA-target duplexes can be joined together (Figure 1A). Following RT-PCR amplification, these generate “chimeric” complementary DNAs (cDNAs), which can be identified because they contain two regions that map to sites that are noncontiguous in the transcriptome sequence (Figure 1B). When AGO1-associated RNAs were analyzed, around 98% were “single reads” representing AGO1-binding sites on RNAs, similar to those obtained with HITS-CLIP and PAR-CLIP (Chi et al., 2009; Hafner et al., 2010a). However, ∼2% were chimeric reads reflecting intermolecular stem structures present in the AGO1-associated RNAs (Figures 1A and 1B). Supporting the significance of the chimeras, 94% of the sequences involved were also recovered as single reads in at least one experiment (Figure 1C). As a control experiment, the lysate obtained from UV-irradiated human cells was mixed with an equal quantity of yeast lysates prior to CLASH analysis (details in Extended Experimental Procedures and Table S2C). This revealed that the background arising from RNA-RNA interactions formed in vitro represents 98%) of the miRNA-target RNAs interactions identified by CLASH had formed in vivo in human cells. Although seed-mediated interactions constitute the largest class in our data, only around 37% of seed interactions involve uninterrupted Watson-Crick base pairing. This figure seemed surprisingly low but is consistent with the many observations of endogenous noncanonical miRNA targets. High-throughput studies found fewer noncanonical (or nonseed) interactions (Chi et al., 2009; Hafner et al., 2010b), but this may reflect an inherent bias in that seed binding was used to computationally identify interactions. Notably, many high-confidence AGO-binding sites identified in previous CLIP-seq data could not be assigned bioinformatically to any specific miRNA. Computational searches for miRNA-mRNA interactions have also been biased toward the identification of binding sites in 3′ UTR regions. In contrast, we observed substantial numbers of miRNA interactions in all the regions of mRNAs, with the greatest number of hits in coding sequences. Notably, different miRNAs vary in the relative proportions of targets in 5′ UTRs, coding sequences, and 3′ UTRs. As examples, miR-100 returned 4% 5′ UTR: 23% CDS: 73% 3′ UTR, whereas miR-149 returned 8% 5′ UTR: 72% CDS: 19% 3′ UTR (data not shown). To provide an overview of the key features of miRNA-mRNA interactions, we analyzed miRNA base-pairing patterns by cluster analysis. As expected, the most frequent miRNA interaction site with a target is the seed, and base pairing in this region is detected for more than half of the interactions. However, seed interactions alone are found in only a relatively small fraction of identified targets (class I, 19%). Defined classes II–III agree with previously described 3′ supplementary and compensatory sites (Grimson et al., 2007; Lian et al., 2009). Unexpectedly, we identified a substantial class of interactions (class IV, 16% of all interactions) that does not involve contacts within the seed region and resembles reported “seedless” interactions (Lal et al., 2009). The identification of miRNAs that predominately interact with target mRNAs using their 3′ regions helps explain the pattern of evolutionary conservation of these miRNAs. However, target mRNAs that fall into this class seem to be relatively poorly conserved in evolution, and high-throughput data show that, on average, these targets respond only weakly to miRNA binding. Our experimental data on the regulation of miR-92a targets agree with this analysis, showing a statistically significant but moderate effect of class IV interactions on mRNA stability and possibly translation in reporter constructs. The results further suggested that the 3′ motif might act cooperatively with seed interactions. It is, of course, possible that the nonseed, motif interactions have additional functions, e.g., in attracting regulatory factors or switching effector pathways. Overall, we show that noncanonical miRNA-mRNA targeting is much more widespread than anticipated. Moreover, the analysis of base-pairing patterns and of miRNA-binding site motifs indicates that individual miRNAs systematically differ in their target binding modes. Indeed, even members of the same miRNA family can manifest distinct base-pairing patterns. This was previously predicted by RepTar (Elefant et al., 2011) and was observed on a small scale in the analysis of enriched 6-mers in mRNAs recovered in AGO-immunoprecipitates following miRNA transfection (Nelson et al., 2011; Wang et al., 2010). The recently published human AGO2 crystal structure (Elkayam et al., 2012) does not exclude the possibility of noncanonical seed interactions. The trajectory of the miRNA seen in the structure leaves most base edges accessible to be read by potential target molecules. Biochemical studies show that the structure of hAGO2 is flexible, and miRNA binding stabilizes and spatially orients AGO2 domains. Differences in patterns of miRNA-target RNA base pairing can induce allosteric changes in the RISC complex, potentially leading to different AGO activities. This suggests that the various interaction classes and/or the specific motifs identified might have distinct functional roles. The integration of CLASH data with RNA-sequencing and proteomics should give a clearer indication of the range of miRNA functions and their relationship to miRNA-mRNA interaction patterns. Many interactions between Argonaute proteins and abundant, stable rRNA and tRNA species can be found in our data and in published high-throughput AGO-CLIP experiments (Chi et al., 2009; Hafner et al., 2010a). Evidence for miRNA-rRNA interactions has been reported, including the association of miR-206 with both nuclear preribosomes and mature cytoplasmic ribosomes (Politz et al., 2006). miR-206 is, however, specific for skeletal muscles and is not expressed in HEK293 cells. In addition, the involvement of AGO2 in pre-rRNA processing has been reported, although it is unclear whether this is dependent on the RISC pathway (Liang and Crooke, 2011). Specific, short tRNA fragments can be bound by AGO proteins and possibly function analogously to miRNAs (Burroughs et al., 2011), but there are no previous reports of tRNAs being targeted by miRNAs. It was recently proposed that “competing endogenous RNA” (ceRNA), generated from transcribed pseudogenes and long noncoding RNAs, participates in mRNA regulation by competing for miRNA binding (Salmena et al., 2011). We speculate that regulation by competition for miRNAs involves not only ncRNAs and other modestly expressed species but also the abundant stable RNAs. In some cases, the highly abundant tRNAs and rRNAs may also “buffer” miRNAs. They might potentially bind miRNAs that are in (perhaps temporary) excess over cognate targets, preventing inappropriate target binding and/or protecting unbound miRNAs against premature degradation. This model is supported by the observation that miRNA interactions with mRNAs have a lower average free energy than those with stable RNA species (data not shown), so authentic target mRNAs might readily recruit cognate miRNAs from the buffered pool. Interactions between pairs of distinct miRNAs were not very frequent (∼3%), but some were highly reproducible and apparently isoform specific—for example, miR-30::let-7. Two published reports of miRNA-miRNA interactions reveal different outcomes. Binding of miR-107 and let-7 mutually reduced miRNA stability and activity (Chen et al., 2011), whereas binding of miR-709 alters the biogenesis of miR-15a/16-1 (Tang et al., 2012). The application of the CLASH technique to miRNAs offers many possibilities for future research. As an example, analyses of miRNA association reveal comparable distributions of miRNAs associated with the four mammalian AGO homologs (Burroughs et al., 2011; Liu et al., 2004; Meister et al., 2004; Su et al., 2009), but it is less clear whether all miRNAs target the same mRNAs when bound to different AGOs. Similarly, closely related paralogs exist for many human miRNAs, but it has been difficult to determine their relative efficiencies in mRNA targeting. The distribution of nontemplated terminal U residues among miRNAs has also been determined (Kim et al., 2010), but not how this effects targeting in vivo. More generally, the spectrum of miRNA-mRNA interactions is expected to rapidly change during differentiation, and viral infection and following metabolic shifts or environmental insults. All of these can potentially be addressed using CLASH. Experimental Procedures CLASH Analyses The previously reported protocol (Kudla et al., 2011) was extensively modified to allow miRNA target identification in mammalian cells. The experimental protocol, variants tested, and bioinformatic analyses are described in detail in the Supplemental Information. Cell Lines A protein A-TEV protease cleavage site 6xHis (PTH) tag was fused to the N terminus of human AGO1 and stably transfected into Flp-In T-REx 293 cells. PTH-AGO1 expression was induced with Doxycycline and confirmed by western blotting. Experimental Validation of CLASH Targets Flp-In T-REx 293-hAGO1 cells were transfected with miR-92a inhibitor or universal negative control. 48 hr posttransfection RNA was isolated, and cDNA was quantified using primers listed in Table S6. Luciferase reporter vectors were prepared by cloning short oligonucleotides containing single miR-92a-binding sites or PCR-amplified long fragments of 3′ UTRs (sequences in Table S6) into the 3′ UTR of Renilla luciferase in the psiCHECK2 vector (Promega). HEK293 cells were transfected in 96-well plates with reporter vectors or nonmodified psiCHECK2 as control together with control or miR-92a inhibitors. Luminescence of Renilla and firefly (internal reference) luciferases was measured 48 hr posttransfection. Extended Experimental Procedures CLASH Protocol Cell Preparation and Crosslinking Flp-In T-REx 293–PTH-AGO1 and control Flp-In T-REx 293 cells (Life Technologies) were seeded onto 150 mm plates (Nunc, Thermo Scientific), 4 plates per sample. The next day cells were induced for hAGO1 production with 0.5 μg/ml Doxycycline (Sigma, D9891). 36 hr post induction growing cells were UV crosslinked on ice with λ = 254 nm in Stratalinker 1800 (Stratagene), power settings = 400 mJ/cm2. Cell Lysis Directly after crosslinking cells were lysed by addition of cooled Lysis Buffer (50 mM Tris pH 7.8, 300 mM NaCl, 1% NP-40, 5 mM EDTA, 10% glycerol, 5 mM β-mercaptoethanol, protease inhibitors (Roche, cOmplete, EDTA-free)). Lysates were centrifuged at 6,400 x g for 30 min at 4°C and supernatant was collected. Unless used directly lysates were stored at −80°C. PTH-AGO1-RNA Purification on IgG-Dynabeads Dynabeads (Life Technologies, M-270 Epoxy) were coated in advance with rabbit IgG (Sigma, I5006) according to a published protocol (Oeffinger et al., 2007). Cell lysates were incubated with IgG-Dynabeads for 40 min at 4°C using 20 mg beads/sample and then beads were washed with LS-IgG WB buffer (50 mM Tris-HCl pH 7.8, 0.3 M NaCl, 5 mM MgCl2, 0.5% NP-40, 2.5% glycerol, 5 mM β-mercaptoethanol), HS-IgG WB buffer (50 mM Tris-HCl pH 7.8, 0.8 M NaCl, 10 mM MgCl2, 0.5% NP-40, 2.5% glycerol, 5 mM β-mercaptoethanol) and PNK-Wash buffer (50 mM Tris-HCl pH 7.5, 10 mM MgCl2, 0.5% NP-40, 50 mM NaCl, 5 mM β-mercaptoethanol). RNase A+T1 Digestion and PTH-AGO1-RNA Elution RNP complexes bound to IgG-Dynabeads were trimmed with 0.5 unit RNaseA+T1 mix (RNace-IT, Stratagene) in 500 μl PNK buffer (50 mM Tris-HCl pH 7.5, 10 mM MgCl2, 0.5% NP-40, 50 mM NaCl, 10 mM β-mercaptoethanol) for 7 min at 20°C. Then RNase solution was removed and PTH-AGO1-RNA complexes were eluted with Ni-WB I buffer (50 mM Tris-HCl pH 7.8, 300 mM NaCl, 10 mM Imidazole pH 8.0, 6 M Guanidine-HCl, 0.1 M NP-40, 5 mM β-mercaptoethanol) for 10 min at 20°C. Purification of PTH-AGO1-RNA on Ni-NTA Agarose The eluate from IgG-Dynabeads was loaded on 40 μl of Ni-NTA Agarose (QIAGEN) equilibrated with Ni-WB I for 2 hr at 4°C and then Ni-NTA beads were washed with Ni-WB I buffer. Ni-NTA beads were transferred to the spin columns (Pierce, Thermo Scientific, 69725) and from that point on all the reactions and washes were performed on the columns. Beads were subsequently washed with Ni-WB II buffer (50 mM Tris-HCl pH 7.8, 300 mM NaCl, 10 mM Imidazole pH 8.0, 0.1 M NP-40, 5 mM β-mercaptoethanol) and extensively washed with PNK-Wash buffer. RNA 5′ End Phosphorylation and Intramolecular Ligation Ni-NTA beads with bound PTH-AGO1-RNA complexes were incubated with 40 units T4 Polynucleotide kinase (New England Biolabs, M0201), 1 mM ATP and RNase inhibitors (RNasin, Promega, N211B) in PNK buffer for 150 min at 20°C, and then washed with Ni-WB I, Ni-WB II and PNK-Wash buffer. PTH-AGO1 bound, interacting RNA molecules were ligated together overnight using 40 units of T4 RNA ligase 1 (New England Biolabs, M0204), 1 mM ATP and RNase inhibitors in PNK buffer at 16°C. On the next day, the ligation mixture was washed out with Ni-WB I, Ni-WB II and PNK-Wash buffer. RNA Dephosphorylation and 3′ miRCat-33 Linker Ligation Ni-NTA beads were resuspended in a dephosphorylation mixture containing 8 units Thermosensitive Alkaline Phosphatase (Promega, M9910) and RNase inhibitors in PNK buffer for 45 min at 20°C and subsequently washed with Ni-WB I, Ni-WB II and PNK-Wash buffer. 3′ miRCat-33 linker ligation was performed for 6 hr at 16°C in the reaction mixture containing 800 units T4 RNA ligase 2 truncated, K227Q (New England Biolabs, M0351), 1 μM miRCat-33 linker (IDT), 10% PEG 8000 and RNase inhibitors in PNK buffer. Beads were washed with Ni-WB I, Ni-WB II and PNK-Wash buffer. Radioactive Labeling of RNA and Elution of PTH-AGO1-RNA Complexes RNAs bound to PTH-AGO1 were radiolabelled with 32P-γ-ATP (Perkin Elmer, 6000 Ci/mmol) in a mixture containing 40 units T4 Polynucleotide kinase, RNase inhibitors in PNK buffer for 30 min at 37°C. Beads were washed with Ni-WB I, Ni-WB II and PNK-Wash buffer. AGO1-RNA complexes were eluted by incubation with Ni-EB for 5 min at room temperature. TCA Precipitation, SDS-PAGE, and Transfer to Nitrocellulose Protein-RNA complexes from the Ni-NTA eluate were precipitated with 2 μg BSA (Sigma) and 17% TCA. Pellets were washed twice with cold acetone, dried, resuspended in 10 μl water and NuPAGE LDS SB (Life Technologies, NP0007). Samples were incubated for 10 min at 65°C and then resolved on a 4%–12% Bis-Tris NuPAGE gel (Life Technologies, NP0335) in NuPAGE SDS MOPS running buffer (Life Technologies, NP0001). Protein-RNA complexes were transferred to nitrocellulose membrane (GE Healthcare, Amersham Hybond ECL) in the wet-transfer tank (Bio-Rad, Mini Trans Blot cell) with NuPage transfer buffer (Life Technologies, NP00061) and 10% methanol for 2 hr at constant voltage = 100V, on ice. Air-dried membrane was exposed on film (GE Healthcare, Amersham Hyperfilm MP) for about 1 hr at room temperature. Developed film was aligned with the membrane and the radioactive bands corresponding to the PTH-hAGO1 complexes were cut out. Proteinase K Treatment and RNA Isolation Cut out bands were incubated with 100 μg of Proteinase K (Roche) and proteinase K buffer (50 mM Tris-HCl pH 7.8, 50 mM NaCl, 10 mM imidazole pH 8.0, 0.1% NP-40, 1% SDS, 5 mM EDTA, 5 mM β-mercaptoethanol) for 2 hr at 55°C. The membrane was discarded. Released RNA was extracted with phenol-chloroform-isoamyl alcohol (PCI) mixture and ethanol precipitated overnight with 20 μg GlycoBlue (Ambion, Life Technologies). Pellets were washed twice with 70% ice-cold EtOH, then air-dried. 5′ Phosphorylation and 5′ Linker Ligation RNAs were phosphorylated with 10u of T4 Polynucleotide kinase in RNA ligase 1 buffer (New England Biolabs) for 30 min at 37°C. Then 10 units of RNA ligase 1 and barcoded 5′ linker (final conc. 5 μM; Table S6) were added and the reaction mixture was incubated for 10 hr at 16°C. RNA was subsequently PCI extracted end ethanol precipitated. Reverse Transcription and Library Amplification by PCR Whole isolated RNA was reverse transcribed with Superscript III Reverse Transcriptase (Life Technologies) according to manufacturer instructions, using miRCat-33 primer (IDT) at 50°C. RNA was then degraded by addition of 2 μl RNase H (New England Biolabs) for 30 min at 37°C. cDNA was amplified using primers P5 and primer PE_miRCat_PCR (Table S6) and TaKaRa LA Taq polymerase (Takara Bio, RR002M). Library Size Selection PCR products were separated on a 3% MetaPhor agarose (Lonza)/TBE gel with SYBRSafe (Life Technologies). The gel was run in 1 x TBE on ice. Then the gel was scanned on a Fujifilm scanner (Fla-5100) and two gel slices with different DNA fragments sizes were cut out: LB: 90 - 100 nt and HB: 110 - 180 nt. After purification with Gel Extraction Kit with MinElute columns (QIAGEN), libraries were stored at −20°C. For high-throughput sequencing, LB:HB fractions were mixed at a 1:3 ratio. Cell Lines Tagged human AGO1 was constructed by ligating the Protein A - TEV protease cleavage site - 6xHis (PTH) tag (amplified from the pRS415 plasmid, a generous gift from Markus T. Bohnsack, Frankfurt, Germany) to the N-terminus of human AGO1 amplified from cDNA, which was then cloned into the pcDNA5/FRT/TO vector (Life Technologies). This vector was further used for stable transfection of Flp-In T-REx 293 cells (Life Technologies) according to the manufacturer’s protocol. Before the CLASH experiments PTH-AGO1 expression was induced with Doxycycline (final concentration = 0.5 μg/ml) for 36 hr, with the aim of achieving expression close to that of endogenous AGO1. Expression of tagged AGO1 was confirmed by the Western Blotting, using peroxidase anti-peroxidase soluble complex (PAP; Sigma, P1291) recognizing the protein A tag. Determination of the Background Level of RNA-RNA Interactions To assess the frequency at which RNA-RNA interactions recovered by CLASH were formed in vitro following cell lysis, lysate obtained from the crosslinked HEK cells expressing PTH-AGO1 was mixed with an equal quantity (measured by RNA content) of yeast cell lysate, prior to CLASH analysis based on the optimized protocol (E4). Analysis of a sample in which no yeast lysate was present (E7) showed the background of human sequences that could be, incorrectly, matched to the yeast genome (0.37% of single reads and 0.11% of miRNA chimeras) (Table S4). Correcting for this, experiment E8, in which yeast lysate was included, reveals that 1.1% of single reads and 1.6% miRNA chimeras arose from yeast RNAs. This low level of background probably reflects the very stringent purification conditions used during CLASH. The same analysis was performed for data sets E9 and E10, prepared using an alternative protocol (E6), in which ligation was performed prior to protein denaturation. This gave a higher level of yeast-human chimeras (∼10%) (Table S4). This indicates that variations in the CLASH protocol can have substantial effects on the signal to noise ratio in the resulting cDNA library, and also confirms the conclusion that protocol E4 is optimal. Experimental Validation of CLASH Targets by qPCR Flp-In T-REx 293-hAGO1 cells were transfected with 25nM miR-92a inhibitor or universal negative control (miRIDIAN Hairpin inhibitors, ThermoScientific, IH-300510-06, IN-001005-01) and Lipofectamine 2000 (Life Technologies). 48 hr post-transfection RNA was isolated using TRI Reagent (Sigma, T9424), DNase treated (TURBO DNase, Ambion) and further purified by ethanol precipitation. cDNA was prepared using SuperScript III (Life Technologies) and random 6-mer oligonucleotides (Promega) at 45°C. cDNA was quantified using Roche LightCycler 480 Real-Time PCR System, Universal Probe Library System (Roche) and primers listed in Table S6. All the primers were designed using Roche UPL Assay Design Center. The expression level of the genes of interest (GIO) was internally normalized to the expression level of GAPDH. Luciferase Reporter Assays Luciferase reporter vectors were prepared by cloning short oligonucleotides containing single miR-92a-binding sites or PCR amplified long fragments of target 3′UTRs into the 3′ UTR of Renilla luciferase in the psiCHECK2 vector (Promega) using XhoI and NotI restriction sites. All the sequences of oligonucleotides used for creating reporters and reporter mutagenesis are listed in the Extended Experimental Procedures. HEK293 cells were transfected in 96-well plates with 50 or 10ng of reporter vectors or non-modified psiCHECK2 as control together with 25nM control or miR-92a inhibitors (miRIDIAN Hairpin inhibitors, Dharmacon, IN-001005-01 and IH-300510-06) or 6.25nM miR-92a controls and inhibitors (IDT, custom oligonucleotides, sequences listed in Extended Experimental Procedures) using Lipofectamine 2000 (Life Technologies). Luminescence of Renilla and firefly (internal reference) luciferases was measured 48 hr post-transfection using the Dual-Glo Luciferase Assay System (Promega, E2920) according to manufacturers’ instructions and SpectraMax M5 Multi-Mode Microplate Reader (Molecular Devices). Microarray Analysis Flp-In T-REx 293-hAGO1 cells were transfected with 6.25nM miR-92a inhibitor or negative control (IDT synthesized custom oligos, sequences in the list of oligonucleotides) and Lipofectamine 2000 (Life Technologies), each sample in 5 replicates. 48 hr post-transfection RNA was isolated using TRIZOL (Invitrogen) and concentrated by ethanol precipitation. Total RNA was processed and quantified on Affymetrix GeneChip Human Exon 1.0 ST by Source BioScience. Bioinformatic Analysis BLAST Transcriptome Database The components of our custom BLAST database come from the following sources: (1) BioMart (http://www.biomart.org), data set: GRCh37.p2: mRNA (cDNA sequence of protein coding genes limited to those with RefSeq protein ID), pseudogenes (miRNA_pseudogene, misc_RNA_pseudogene, Mt_tRNA_pseudogene, polymorphic_pseudogene, RNA_pseudogene, snoRNA_pseudogene, snRNA_pseudogene, tRNA_pseudogene), snoRNA, snRNA, processed transcripts, lincRNA, miscellaneous RNA, mitochondrial rRNA, mitochondrial tRNA, 5.8S rRNA, 5S rRNA; (2) genomic tRNA database http://gtrnadb.ucsc.edu/): human tRNAs; (3) National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov): rRNA sequence (NR_003287.2, NR_003286.2); (4) miRBase release 15 (http://www.mirbase.org): mature human miRNAs. Redundant sequences were removed from the database, as were pseudogenes, lincRNAs and processed transcripts that matched full-length mature miRNA sequences. The mRNA sequences and coordinates in the database match those of ENST entries from ENSEMBL Release 60 (http://useast.ensembl.org/info/website/archives/index.html). For the analysis of experiments E7-E10 we supplemented our database with sequences of S. cerevisiae chromosomes obtained from the Saccharomyces Genome Database. Mapping Reads by BLAST Illumina reads were assigned to the experimental and control samples using their 5′ barcode sequences, and barcodes were clipped. Illumina reads were then mapped by nucleotide BLAST (Altschul et al., 1990) version 2.2.24 to the human transcriptome database described above. BLAST was run with the expectation value set to 0.1, and other parameters set to default. We discarded all antisense matches. Approximately 70% of all mapped reads were mapped unambiguously. The remaining sequences mapped to more than one transcript, typically originating from a single gene. To uniquely assign those reads that were mapped with the same e-value to more than one transcript, we ranked all transcripts according to their total number of BLAST hits and assigned the read to the transcript with most hits. Identification and Clustering of Chimeras We identified as chimeras those reads where: (1) the read yielded at least two blast hits with e-value ≤ 0.1 against the human transcriptome database described above; (2) the hits were either directly adjacent in the read, or with up to 4 nucleotides gap or overlap between hits; and (3) the hits were in sense orientation with respect to the blast database. We rejected reads with more than ten hits in the database. For reads that could be assigned to chimeras in more than one way (for example if one of the fragments was mapped to two different transcripts from the same gene), we used the following criteria to decide which mapping to retain, in the following order: (1) we retained the mapping with the most significant e-value of both fragments; (2) we preferentially retained miRNA-mRNA chimeras; (3) we retained the mapping to transcripts with the highest number of non-chimeric reads. Criterion (2) was introduced to avoid assigning chimeras to pseudogenes, if an mRNA transcript with the same mapping quality was found. After mapping the chimeric reads, we clustered those chimeras for which the coordinates of both mapped fragments overlapped, independently of the order of fragments in the chimera. We first performed the clustering for each of the six experiments independently, and then clustered the six experiments together. Finally we adjusted the transcript regions found in clustered chimeras by extending the miRNA fragment to the full length mature miRNA, and by adding 25 nucleotides downstream sequence to each mRNA fragment. We refer to the mRNA fragments adjusted in this way as the “CLASH targets,” and to the miRNA-mRNA fragment pairs as “CLASH interactions.” Identification of AGO1-Binding Site Clusters from Nonchimeric Reads We identified clusters of non-chimeric reads using a method similar to that previously described (Chi et al., 2009). First, we randomly distributed all distinct reads mapped to each transcript along the same transcript using BEDTools (Quinlan and Hall, 2010), to calculate the maximum random cluster height for each transcript. We then repeated the procedure a hundred times, and retained the maximum cluster height across the repeats. This value was used as a transcript-specific cluster height threshold for the identification of actual clusters. High-confidence clusters were identified as regions of transcripts in which the coverage equaled or exceeded the threshold, the coverage was higher than twenty reads, and the region of high coverage was at least twenty-nucleotide wide. We then calculated high-confidence cluster peaks as the area of twenty nucleotides on each side of the position of highest coverage within the cluster. Overlap between CLASH Targets and Experimentally Validated miRNA Targets from Published Databases To check if our CLASH data set contains known, experimentally validated miRNA targets we compared it with two databases – miRTarBase (Hsu et al., 2011) and TarBase (Vergoulis et al., 2012). Information in these databases is limited: target and miRNA names are often ambiguous (ex. LIN-28B and Lin28 are both used to identify the same gene), and detailed information about the binding site is often missing. To make the comparison possible we downloaded both databases and modified them as follows: 1) we simplified all the miRNA names to “miR-number” (and let-7); 2) we filtered TarBase to only retain human genes with an ENSEMBL gene ID (ENSG) represented in our custom database; 3) we translated gene names in miRTarBase to ENSEMBL gene IDs using the “Gene ID Conversion” tool from Database for Annotation, Visualization and Integrated Discovery (DAVID) (Dennis et al., 2003) and only genes represented in our custom database were retained; 4) we removed redundant interactions within each database. As a consequence each miRNA-mRNA interaction was represented in a simple “miRNA – ENSG” fashion. Overlap between CLASH Targets, AGO-Binding Sites, and Bioinformatically Predicted miRNA Targets The number of CLASH targets that overlapped with AGO1-binding sites obtained in the present study was estimated using BEDTools (Quinlan and Hall, 2010) intersect, using either the positions of all non-chimeric reads mapped to RNA, or the positions of high-confidence cluster peaks calculated as described above. To calculate the overlap between CLASH targets and high-confidence AGO-binding sites obtained in the PAR-CLIP study we first mapped the sequences of the 17,319 clusters from Table S4 in (Hafner et al., 2010a) to our transcriptome database. 15823 clusters were successfully mapped. We then used BEDTools to intersect the positions of CLASH targets and AGO binding clusters. To calculate the degree of overlap between CLASH targets and AGO-binding clusters that would be expected by chance, we randomly placed AGO-binding clusters on: (1) the same transcript in which each cluster was found; (2) a set of random transcripts with a distribution of expression levels matching the transcripts with AGO clusters; or (3) a set of transcripts randomly selected from the human transcriptome. The expression levels of human transcripts in 293 cells were obtained from the microarray data in (Hafner et al., 2010a). We then calculated the number of CLASH targets that overlapped the randomly placed clusters using BEDTools. To calculate the overlap between CLASH targets and bioinformatically predicted miRNA targets by miRanda (John et al., 2004), TargetScan (Lewis et al., 2005), PITA (Kertesz et al., 2007), PicTar (Krek et al., 2005) and RNAhybrid (Rehmsmeier et al., 2004), we extracted the coordinates of predicted miRNA target sites in the hg18 genome from the Functional RNA Project website (www.ncrna.org). We converted CLASH targets from transcriptome to hg19 coordinates using a custom script based on the ENSEMBL Perl API (available on demand), then to hg18 coordinates using liftOver (Kent et al., 2002), and retained the targets mapped to a single genomic position that corresponded to a 3′UTR of a RefSeq gene, only considering those 3′UTR fragments that did not overlap with a CDS of a RefSeq gene on either strand. To generate the control interaction data set, we randomly distributed CLASH targets in the 3′UTR of RefSeq genes (excluding those fragments overlapping with a CDS). We then used BEDTools to overlap CLASH or control interactions with bioinformatically predicted targets. The enrichment was calculated as N_CLASH/N_control, where N_CLASH is the number of CLASH interactions that matched bioinformatic predictions, and N_control is the number of control interactions that matched bioinformatic predictions. To call a match we required that the CLASH interaction and the prediction share both (1) the miRNA name (2) the target position in the mRNA. To decide whether the CLASH targets and predictions refer to the same or different miRNA, we converted miRNA names to a simplified miR-number format, using the regular expression /[a-zA-Z]∗-[0-9]∗/. Analysis of Base Pairing in miRNA-Target Interactions To calculate free energies of binding and determine base pairing between miRNA and target fragments of chimeric reads we used programs from the UNAFold suite (Markham and Zuker, 2008). First, chimeric reads were divided into miRNA and target fragments based on the BLAST analysis. If the miRNA sequence was trimmed then it was replaced with the full-length miRNA sequence from miRBase (Griffiths-Jones et al., 2008). Target fragments were extended by 25 nucleotides at the 3′ end in case the length of the sequencing reads (which ranged from 50 to 100 nts) was too short to include the miRNA-binding site. Then we used the hybrid-min program with default parameters to calculate minimum hybridization energy of the two fragments of chimera that represented interacting RNA molecules. To convert the numeric representation into a Vienna-style dot-bracket format we used a modified version of M. Zuker’s Ct2b.pl script. Evolutionary Conservation of CLASH Targets To analyze evolutionary conservation of CLASH targets, we used per-nucleotide conservation scores among 46 vertebrate genomes calculated using the PhyloP algorithm (Pollard et al., 2010), and downloaded from the UCSC genome browser (Fujita et al., 2011). We selected CLASH targets that were located in 3′UTRs of RefSeq genes, with at least 100 nt of 3′UTR sequence on each side. The CLASH targets were centered at the 5′ end of the longest predicted miRNA-mRNA stem in each target. To calculate the average profile of evolutionary conservation around CLASH targets, the PhyloP score was calculated in the region from −100 to +100 nt from the center, averaged across all targets, and normalized to the number of targets for which the PhyloP score was available. The conservation scores for individual CLASH targets were defined as (C_CLASH - C_control), where C_CLASH is the mean PhyloP score within the longest predicted miRNA-mRNA stem in each target, and C_control is the mean PhyloP score in regions spanning nt −50 through −10 and +10 through +50, relative to the ends of the stem. We used a paired t test to compare the mean conservation within and outside the predicted stems. Analysis of CLASH Target Downregulation after Inhibition of 25 miRNAs To analyze the effect of miRNA inhibition on mRNA transcript levels in specific sets of transcripts, we used the miRNA depletion data from (Hafner et al., 2010a). In this study, 25 miRNAs were depleted using an antisense 2′-O-methyl oligonucleotide cocktail, and mRNA levels were measured by the Human Genome U133 Plus 2.0 Array from Affymetrix. We averaged the log2 GC-RMA signal for all probes matching each transcript, discarding probes matching multiple genes. Probes were assigned to transcripts using data downloaded from ENSEMBL. We then calculated the log2-enrichment of each transcript upon miRNA depletion, by subtracting the average signal in mock-transfected cells from the average signal in 2′-O-methyl oligonucleotide-transfected cells. To calculate the background distribution of mRNA enrichment upon depletion of 25 miRNAs, we randomly selected one transcript for each gene, and then selected a subset of transcripts with expression levels matching those of transcripts in which CLASH targets were identified. We plotted the cumulative distribution of log2-enrichment for these transcripts. As a positive control we used the 331 experimentally validated targets of the 25 miRNAs for which depletion data was available, downloaded from the miRTarBase on August 1, 2011. We then plotted cumulative distributions of log2-enrichment scores after miRNA depletion, for the 1,995 transcripts in which CLASH targets for at least one of the 25 miRNAs were found, or for subsets filtered by: seed complementarity; location of the CLASH target in 5′ UTR, coding sequence, or 3′ UTR; predicted basepairing energy between miRNA and its target; presence of non-chimeric read clusters overlapping with the mRNA fragment of each chimera. Analysis of CLASH Target Downregulation after miR-92a Inhibition Data acquisition and normalization from Affymetrix GeneChip Human Exon 1.0 ST arrays were done by Source BioScience. We averaged the log2 RMA signal for all probes matching each transcript, discarding probes matching multiple genes. We only accepted probes for which expression was detected at p < 0.05 in at least 4 experiments. Probes were assigned to transcripts using data downloaded from ENSEMBL. We then calculated the log2-enrichment of each transcript upon miRNA depletion, by subtracting the average signal in mock-transfected cells from the average signal in miR-92a inhibitor-transfected cells. Wilcoxon tests were used to check whether the log2-enrichment differs between sets of genes. Identification of Base-Pairing Classes by K-Means Clustering We extracted the miRNA basepairing pattern for each CLASH interaction using a modified version of the Ct2B.pl script written by M. Zuker. We then converted the dot and bracket strings into a numeric format by replacing dots with zeroes, and brackets with a number F that represented the minimum energy of each interaction, rescaled from 0 to 5. This was calculated as F = 5∗(dG-dGlow)/(dGhigh-dGlow), where dG is the minimum energy of the interaction as calculated by hybrid-min, dGlow was set to −11 kcal/mol, and dGhigh to −16 kcal/mol. F was capped at 0 at the bottom and 5 at the top. We next performed K-means clustering on the matrix of 18,514 interactions and 22 miRNA positions using Gene Cluster 3.0 (downloaded from http://bonsai.hgc.jp/∼mdehoon/software/cluster/software.htm), to group interactions with similar basepairing patterns. We performed 50 runs of K-means clustering with K set to 4, 5, or 6, using Euclidean distance, and we selected the grouping into 5 clusters for presentation, as it yielded a clear separation into well-defined classes of comparable size. We used two independent approaches to analyze the clustering patterns expected by chance. First, we randomly reassigned (shuffled) miRNA-mRNA pairs found in chimeras, before performing the folding prediction and clustering. This overestimated the number of interactions expected among random miRNA and mRNA fragments because many targets of abundant miRNAs were assigned by chance to the same miRNA. As a second approach, we randomly permuted (scrambled) the target sequence found in each interaction, while conserving the nucleotide content of the targets. Again, this approach is conservative because keeping the nucleotide content will act to increase apparent interaction strengths. Nevertheless, the basepairing patterns observed in the two randomized sets were very different from the one seen for CLASH interactions. The basepairing patterns were represented graphically as heatmaps using Java TreeView (Saldanha, 2004), using contrast = 5.0 and a linear color scale. To analyze basepairing patterns for subsets of CLASH interactions, the clustering was performed first for the entire data set and the relevant interactions were extracted next, to preserve the order of clusters. Clustering each subset independently yielded similar results. Identification of Enriched Sequence Motifs in CLASH Targets To find overrepresented motifs in CLASH targets, we first extracted all miRNAs with at least 10 targets found. For each miRNA separately we ran MEME (Bailey and Elkan, 1994), with the settings: -dna -mod zoops -maxw 7 -nmotifs 1, using the default 0-order Markov model based on nucleotide frequencies in the training set as the background model. Changing the maximum motif length to 8 or 9; the maximum number of motifs to 2, or the number of nucleotides by which CLASH targets were bioinformatically extended between 0 and 50 did not change the conclusions. The motifs identified by meme for targets of each miRNA were then aligned to the reverse-complemented miRNA sequence using FIMO (Grant et al., 2011), with the setting–output-pthresh 0.01. We then selected the motifs with FIMO q-value (FDR) < 0.05, and meme Bonferroni-corrected p-value < 0.05, yielding a set of 108 high-confidence motifs. For motifs that could be mapped to the reverse-complemented miRNA sequence in more than one way, the mapping with the most significant q-value was selected. (Details of the MEME analysis available upon request).