Introduction

Liver cancer accounts for 9% of all cancer deaths worldwide and 12% in developing countries (Jemal et al., 2011). Pathological inspection indicates hepatocellular carcinoma (HCC) in ∼80% of liver tumors, with infection by hepatitis B virus (HBV) and hepatitis C virus (HCV) being the most prevalent risk factors, followed by chronic alcoholism (Jemal et al., 2011; Perz et al., 2006; Tateishi and Omata, 2012). Although early detection and monitoring of patients with liver cirrhosis can substantially improve 5-year survival rates, progression to advanced HCC reduces average life expectancy to less than 8 months (Llovet et al., 2008). As for other cancers, genome and exome resequencing have elucidated molecular pathways frequently perturbed in HCC (Guichard et al., 2012; Tateishi and Omata, 2012; Totoki et al., 2011), potentially enabling therapeutic intervention informed by the mutational signature of a given tumor. The capacity to catalog the full spectrum of genetic aberrations occurring in HCC is therefore of critical importance.

LINE-1 (L1) retrotransposons are a major source of endogenous mutagenesis in humans (Burns and Boeke, 2012; Levin and Moran, 2011). These mobile genetic elements utilize a “copy-and-paste” mechanism to retrotranspose to new genomic loci, with such success in germ cells that 500,000 L1 copies comprise ∼17% of the genome (Lander et al., 2001). Of these copies, only 80–100 are transposition competent, with distinct subsets of frequently active (or “hot”) L1s driving insertional mutagenesis in each individual genome (Beck et al., 2010; Brouha et al., 2003). Retrotransposon insertions can profoundly alter gene structure and expression (Cordaux and Batzer, 2009; Faulkner et al., 2009; Han et al., 2004; Levin and Moran, 2011) and have been found in nearly 100 cases of disease (Faulkner, 2011; Hancks and Kazazian, 2012).
L1 activity is consequently suppressed in most somatic cells by methylation of a CpG island in the internal L1 promoter (Coufal et al., 2009; Swergold, 1990). By contrast, L1 is often hypomethylated in tumor cells, removing a key obstacle to retrotransposition (Levin and Moran, 2011). Despite this failure to repress L1 transcription, only a handful of L1 insertions had been found in human tumors until very recently (Liu et al., 1997; Miki et al., 1992). High-throughput L1 integration site sequencing has since revealed 9 and 69 de novo L1 insertions, respectively, in lung and colorectal tumors (Iskow et al., 2010; Solyom et al., 2012), whereas cancer genome resequencing elucidated a further 183 tumor-specific L1 insertions in colorectal, ovarian, and prostate cancer (Lee et al., 2012). In this latter study, more than half of all insertions were found in a single colorectal tumor; the other individuals presented fewer than five tumor-specific L1 insertions on average. These data suggest L1 mobilization may be common in epithelial tumors, though the reasons for possible cell-of-origin restriction are currently unknown. Tumor-specific L1 retrotransposition has not previously been observed in HCC. For several reasons it is, however, a logical cancer in which to expect L1 mobilization. First, HCC is epithelial in origin. Second, HBV and HCV infection are common in HCC; viruses can suppress host defense factors, such as APOBEC proteins, that control retrotransposon activation. APOBEC3G has been shown, for instance, to inhibit both HBV replication and endogenous retrotransposition (Esnault et al., 2005; Turelli et al., 2004). Third, liver inflammation precedes HCC and may, via cellular stress, stimulate retrotransposition (Fornace and Mitchell, 1986). Given these facts, we aimed to map L1 integration sites in HCC using retrotransposon capture sequencing (RC-seq) and assess their impact upon oncogenic and tumor suppressor pathways. 
Results

Enhanced Retrotransposon Capture Sequencing

To test the hypothesis that L1 mobilizes in HCC, we applied an updated RC-seq protocol to 19 HCC tumors and matched adjacent liver tissue that were confirmed positive for HBV or HCV infection (Table 1). An earlier RC-seq design (Baillie et al., 2011) was modified to incorporate multiplex liquid-phase sequence capture (Figure 1A) using a refined probe pool (Table S1 available online) and a reduced insert size of ∼220 nt, which enabled high-confidence assembly of overlapping paired-end 150 nt reads (Figure 1B). This change simplified genomic alignment and, more importantly, enabled single-nucleotide resolution of retrotransposon integration sites (Figure 1C). After stringent filtering and mapping, an average of ∼2 million reads were retained per library with >95% identity to active L1, Alu, and SVA families, as well as the most recently active human LTR endogenous retroviruses (Table S2). Optimized sequence capture led to a 4-fold increase in reads aligned to nonreference genome L1s per library compared to previous RC-seq based on solid-phase arrays and similar sequencing depth (Baillie et al., 2011). The improved resolution of RC-seq also allowed us to discriminate a required minimum of two unique amplicons in support of any nonreference genome insertion (see Extended Experimental Procedures).

Frequent Retrotransposition in the Human Germline

A total of 7,689 nonreference genome insertions were detected in 19 tumor (T) samples and 19 matched nontumor (NT) liver samples. Of these, we annotated 7,644 as putatively germline (Table S3) because of their presence in (1) databases of retrotransposon-induced polymorphisms (Beck et al., 2010; Ewing and Kazazian, 2010; Iskow et al., 2010; Wang et al., 2006), (2) pre-existing insertions annotated by pooled blood RC-seq (Baillie et al., 2011), (3) multiple individuals, or (4) nontumor liver.
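The support filter and the four germline criteria listed above can be read as a simple decision rule. The following is a minimal sketch of that rule, not the RC-seq pipeline itself: the `Insertion` record, its field names, and the label strings are all hypothetical and were chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insertion:
    """One candidate nonreference insertion (all field names are hypothetical)."""
    site: str                  # genomic position label, e.g. "chr1:100"
    unique_amplicons: int      # distinct amplicons supporting the call
    in_polymorphism_db: bool   # criterion 1: listed in a polymorphism database
    in_pooled_blood: bool      # criterion 2: seen in pooled blood RC-seq
    donors: frozenset          # donor IDs in which the insertion was detected
    in_nontumor: bool          # criterion 4: detected in nontumor liver


def passes_support_filter(ins, min_amplicons=2):
    """Require at least two unique amplicons, as stated in the text."""
    return ins.unique_amplicons >= min_amplicons


def is_putative_germline(ins):
    """True if any one of the four germline criteria is met."""
    return (
        ins.in_polymorphism_db
        or ins.in_pooled_blood
        or len(ins.donors) > 1      # criterion 3: multiple individuals
        or ins.in_nontumor
    )


def classify(ins):
    """Return 'rejected', 'germline', or 'tumor-specific candidate'."""
    if not passes_support_filter(ins):
        return "rejected"
    if is_putative_germline(ins):
        return "germline"
    return "tumor-specific candidate"
```

Under this rule, an insertion is treated as a tumor-specific candidate only when it has sufficient read support and meets none of the four germline criteria.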
L1, Alu, SVA, and LTR-flanked retrotransposons comprised 13.5%, 81.8%, 4.3%, and 0.4% of germline insertions, respectively. As expected, L1-Ta and L1-pre-Ta (99.3%) and AluY (99.7%) were the main L1 and Alu subfamilies active in germ cells (Mills et al., 2007). A total of 2,241 germline insertions were found in only one individual each (Table 1 and Table S3) and were not annotated by the aforementioned retrotransposon polymorphism databases, suggesting that these were private or rare mutations or, alternatively, had occurred in early development (Garcia-Perez et al., 2007; Kano et al., 2009). RC-seq detected 1,489 (66.4%) insertions at both their 5′ and 3′ ends, enabling us to model the characteristic sequence features of L1-mediated retrotransposition. Without any additional sequencing, we were able to analyze insertions for the presence of target site duplications (TSDs), an L1-endonuclease recognition motif (Jurka, 1997), and a polyA tail (Figures 2A and 2B). These features consistently resembled target-primed reverse transcription (TPRT) for L1, Alu, and SVA, again illustrating the primary retrotransposition mechanism in germ cells (Cost et al., 2002; Jurka, 1997). We also identified 160 previously undetected full-length (>99.9%) L1 copies, including 115 with paired 5′/3′ detection (Figure 2C; Table S4) and 82 each found in a single donor only. All were annotated as L1-Ta or pre-Ta. These potentially “hot” L1s added to a recent cohort of full-length L1 insertions found in six geographically diverse individuals via fosmid screening and sequencing (Beck et al., 2010). Of 68 L1 insertions reported by Beck et al. (2010), we detected 49 (72.1%), including 15/18 (83.3%) with an allelic frequency >5%. Of the 49 insertions common to both studies, 46 (93.9%) were base-pair identical in genomic position. These results confirm strong agreement between RC-seq and the conservative fosmid-based approach of Beck et al. (2010). 
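The comparison with the fosmid-based call set can be illustrated with a short sketch that matches insertion positions between two call sets and reports the concordance percentages. The function names, the 100 bp matching window, and the toy coordinates are assumptions made for illustration; only the counts passed to `percent` in the final lines come from the text above.

```python
def match_calls(reference_calls, test_calls, window=100):
    """Count reference insertions recovered in a test call set.

    Both inputs are iterables of (chromosome, position) tuples. A reference
    call counts as detected if a test call lies within `window` bp on the same
    chromosome, and as exact if the nearest test call has the same position.
    Returns (n_detected, n_exact).
    """
    positions_by_chrom = {}
    for chrom, pos in test_calls:
        positions_by_chrom.setdefault(chrom, []).append(pos)

    detected = exact = 0
    for chrom, pos in reference_calls:
        distances = [abs(pos - p) for p in positions_by_chrom.get(chrom, [])]
        if distances and min(distances) <= window:
            detected += 1
            if min(distances) == 0:
                exact += 1
    return detected, exact


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Toy coordinates (hypothetical): two of three reference calls are recovered,
# one of them at the identical base pair.
reference = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
test = [("chr1", 1000), ("chr1", 5040), ("chr3", 300)]
n_detected, n_exact = match_calls(reference, test)

# Counts reported in the text.
detected_pct = percent(49, 68)   # insertions from Beck et al. (2010) detected
common_pct = percent(15, 18)     # those with allelic frequency >5%
exact_pct = percent(46, 49)      # base-pair identical positions
```

Applied to the reported counts, `percent` reproduces the 72.1%, 83.3%, and 93.9% figures given in the text.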
Each individual genome contained on average 244 nonreference genome L1 insertions, a figure 60% and 80% higher, respectively, than recent L1 insertion site sequencing on cell lines (Ewing and Kazazian, 2010) and single cells (Evrony et al., 2012). Therefore, to assess the RC-seq false-positive rate, we randomly selected 200 germline insertions (173 Alu, 14 L1, 11 SVA, and 2 LTR) for site-specific PCR validation (Table S5). Of these, we confirmed 197 (98.5%). The remaining three insertions (2 SVA and 1 Alu) occurred in repetitive genomic regions and were detected by multiple unique reads in at least ten different samples each, indicating that these may have represented PCR false negatives. These comparisons and experiments together demonstrate the sensitive and accurate mapping of bona fide retrotransposition events by RC-seq and further highlight ongoing L1 retrotransposition in the global human population (Beck et al., 2010; Ewing and Kazazian, 2010; Huang et al., 2010; Iskow et al., 2010).

Activation of β-Catenin/Wnt Signaling via L1-Mediated Ablation of MCC

To assess the potential tumorigenic consequences of the identified nonreference genome insertions, we selected and validated, by insertion site PCR, 31 L1, Alu, and SVA insertions in genes generally implicated to play a causal role in cancer (Futreal et al., 2004) or specifically in HCC (Guichard et al., 2012), including L1 insertions in the proto-oncogene ALK and the tumor suppressor FHIT (Table S5). Quantitative RT-PCR indicated, however, that 28/31 of these germline insertions did not significantly perturb host gene expression in tumor or nontumor liver versus control liver from five unaffected individuals (data not shown). Strikingly, the three remaining insertions all coincided with strong inhibition of the tumor suppressor mutated in colorectal cancers (MCC) (Higgins et al., 2007).
MCC is expressed in liver (Senda et al., 1999) and regulates the oncogenic β-catenin/Wnt signaling pathway frequently activated in HCC (Fukuyama et al., 2008; Guichard et al., 2012; Totoki et al., 2011). In vitro experiments have established that siRNA knockdown of MCC mRNA dramatically increases β-catenin (CTNNB1) expression, whereas MCC overexpression inhibits cellular proliferation (Fukuyama et al., 2008; Matsumine et al., 1996). MCC is also an intriguing HCC candidate gene because of its genomic proximity to APC, a major tumor suppressor mutated in familial adenomatous polyposis preceding colorectal cancer (Groden et al., 1991; Kinzler et al., 1991). It is important to note that mutated APC occurs in 60% of colorectal carcinomas (Guichard et al., 2012; Powell et al., 1992). We therefore hypothesized that germline retrotransposition events specifically inhibited MCC tumor suppressor function in liver. To test this prediction, we assessed the impact of each MCC mutation upon MCC, APC, and CTNNB1 expression. Three germline retrotransposon insertions were found in MCC. The first of these, labeled MCC-L1-α, comprised a 5.3 kb L1-Ta oriented in sense to MCC in donors 70 and 95 (Figure 3A). Another L1-Ta, labeled MCC-L1-β, was full-length (6 kb), occurred at a different genomic position in donor 116, and was oriented antisense to MCC (Figure 3B). Finally, in donor 33, we found an AluY (MCC-Alu; Figure 3C) inserted in an ENCODE-delineated enhancer (Thurman et al., 2012). Insertion site PCR revealed that MCC-L1-α was heterozygous in donor 70 and homozygous (or possibly hemizygous) in donor 95, whereas MCC-L1-β and MCC-Alu were heterozygous in donor 116 and donor 33, respectively (Figure 3D). An immunoblot indicated that MCC was dramatically less abundant in tumor and nontumor samples from all four donors compared with control liver tissue (Figure 4A). By contrast, CTNNB1 was expressed much more strongly in the affected donors than in controls (Figure 4A). 
This inverse relationship was consistent with MCC suppression of CTNNB1 through protein-protein interactions, as reported elsewhere (Fukuyama et al., 2008). As a corroborating example, immunohistochemistry performed on tumor and nontumor tissue from donor 116 confirmed cytoplasmic CTNNB1 accumulation (Figure S1), a strong indicator that the factors controlling CTNNB1 expression outside of the plasma membrane were absent and that many cells had entered a proliferative state (Nhieu et al., 1999). Quantitative RT-PCR indicated that MCC transcription was severely reduced (p G) in donor 33 MCC exon 5, producing an Arg > Lys substitution in the putative CTNNB1 binding domain of MCC (Fukuyama et al., 2008). Therefore, MCC-L1-α, MCC-L1-β, and MCC-Alu were the primary enactors of MCC transcriptional inhibition, potentially assisted by other modifications to MCC or its upstream regulatory pathway. Finally, we performed qRT-PCR to evaluate APC transcription coincident with mutated MCC. We found no significant differential transcription of APC in tumor or nontumor liver from the four affected donors versus normal liver controls (Figure S2). In donor 95, APC was downregulated significantly in tumor versus nontumor (p 1kb in length, or shorter insertions where a filled site was not detected using the standard assay, additional retrotransposon specific primers were designed and paired with the existing insertion site primers. PCR reactions contained 2U MyTaq hot-start DNA polymerase (Bioline #BIO-21112), 1X PCR buffer, 1μM of each primer and 10ng genomic DNA in a 25μL reaction. The following cycling conditions were used: 95°C for 2 min, then 35 cycles of 95°C for 15 s, 60°C for 15 s, 72°C for 1 min, followed by a single extension step at 72°C for 10 min. Optimization in some cases required adjusted annealing temperatures, cycle number, or changing polymerase enzymes. 
If multiple PCR products similar to the correct size were observed, capillary sequencing was used to clarify validation. Tumor-specific insertions detected by RC-seq were assessed using a strategy similar to that used for germline insertions except, in this case, additional primers were generated to characterize insertion 5′ and 3′ ends (Table S6). All products were capillary sequenced using an ABI3730 (GenePool, Edinburgh and AGRF, Brisbane). All primers were designed using custom Python scripts and Primer3 (Rozen and Skaletsky, 2000). Input DNA for all PCR validation reactions, for both germline and tumor-specific insertions, was stored and handled separately from postamplification Illumina libraries.

DNA Methylation Analyses

We followed previously described protocols to analyze the level of L1 promoter methylation (Coufal et al., 2009; Wissing et al., 2012). Briefly, 2 μg of genomic DNA was bisulfite converted using an EpiTect Kit (QIAGEN) following the manufacturer's instructions. After purification, 300-500 ng of bisulfite-converted genomic DNA was used as template in a PCR reaction with 10 U Taq polymerase (Roche Expand High Fidelity Taq), 0.2 mM dNTPs (Invitrogen), and 200 ng of L1_Bis-F and L1_Bis-R primers (see Table S8) in a 50 μL reaction. The following cycling conditions were used: 2 min at 95°C, then 35 cycles of 30 s at 94°C, 30 s at 54°C, 60 s at 72°C, followed by a single extension step at 72°C for 5 min. Negative controls were included at each step using RNA/DNA-free water (Invitrogen). PCR products were then resolved on agarose gels, and the ∼350 bp amplification band was excised, purified using a QIAquick gel extraction kit (QIAGEN), and cloned into the pGEM-T Easy Vector (Promega). More than 20 independent clones per sample were capillary sequenced using universal primers.
Sequences were aligned to a mock bisulfite-converted consensus L1-Ta sequence (L1.3, accession L19088.1) using ClustalX (Thompson et al., 1997), and the methylation status of CpG dinucleotides was scored by hand. The seven sequences with the highest identity to the consensus sequence were used to graphically represent the overall level of L1 promoter methylation, as shown in Figure S4C. Chi-square tests were used to calculate the significance of differences in the proportion of methylated and unmethylated CpG dinucleotides in each sample or group.

Cell Culture

Human hepatocellular cell lines Huh7, HepG2, PLC/PRF/5, and Hep3B were a kind gift from Dr. Bakary Sylla (International Agency for Research on Cancer, Lyon, France). Cells were cultured in DMEM-F12 (1:1) media supplemented with 10% fetal bovine serum, 2 mM Glutamax, 0.5 mM sodium pyruvate, and 1% nonessential amino acids at 37°C and 5% CO2.

Immunoblot

Tissues or cell line pellets were lysed in western lysis buffer containing 50 mM HEPES pH 7.1, 1% Triton X-100, 50 mM NaCl, protease inhibitor cocktail (Roche #11836153001), and phosphatase inhibitor cocktails (Sigma #P2850, #P5726). Protein concentration was estimated by the Bradford method, and 30 μg of protein extract was loaded on a 7.5% sodium dodecyl sulfate-polyacrylamide gel (SDS-PAGE). After electrophoresis, proteins were transferred to polyvinylidene difluoride membranes (Millipore #IPVH00010). Membranes were blocked with 5% milk and then immunoblotted with the required primary antibody (anti-ST18 [Abcam #ab127900, 1:1000], anti-MCC [Santa Cruz #sc-135982, 1:500], anti-CTNNB1 [Santa Cruz #sc-7199, 1:1000], or anti-GAPDH [Abcam #ab125247, 1:5000]), followed by peroxidase-conjugated secondary anti-rabbit (Cell Signaling #7074, 1:5000) or anti-mouse (Cell Signaling #7076, 1:5000) antibody, and visualized using an enhanced chemiluminescence detection system (GE Amersham #RPN2132).
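The chi-square comparison of methylated and unmethylated CpG counts described under DNA Methylation Analyses can be sketched as a test on a 2 × 2 contingency table. This is an illustration under stated assumptions rather than the authors' analysis: the text does not say whether a continuity correction was applied (none is used here), the function name is hypothetical, and any counts supplied by a reader would be their own scored data.

```python
import math


def chi_square_2x2(meth_a, unmeth_a, meth_b, unmeth_b):
    """Pearson chi-square test (no continuity correction) on a 2x2 table.

    Rows are two samples or groups; columns are methylated and unmethylated
    CpG counts. Returns (statistic, p_value) with one degree of freedom.
    """
    observed = [[meth_a, unmeth_a], [meth_b, unmeth_b]]
    row_totals = [meth_a + unmeth_a, meth_b + unmeth_b]
    col_totals = [meth_a + meth_b, unmeth_a + unmeth_b]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain at least one count")

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # For one degree of freedom, the chi-square upper tail probability
    # equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

The closed-form tail probability keeps the sketch free of external dependencies; a statistics library would normally be used for larger tables or corrected tests.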
Immunohistochemistry

Sections (4 μm thick) of formalin-fixed, paraffin-embedded liver samples were de-waxed in xylene prior to rehydration. Antigen retrieval was performed by boiling slides in 1 mM EDTA pH 8.0 for 20 min (for CTNNB1 staining) or in 10 mM citrate buffer pH 6.0 for 10 min (for ST18 staining). Sections were incubated for 1 hr at room temperature with mouse CTNNB1 monoclonal antibody (1:300 dilution, clone 17C2, Abcys) or rabbit ST18 polyclonal antibody (1:200 dilution, Abcam #ab86563). Incubation with primary antibody was followed by incubation with peroxidase-conjugated donkey anti-mouse or anti-rabbit antibody. Immunoreactive staining was detected using the Dako Envision System HRP (DAKO, CA, USA). Nuclei were counterstained with hematoxylin.

Chromatin Immunoprecipitation

Huh7 cells were fixed with 1% formaldehyde for 10 min at room temperature, fixation was quenched with 125 mM glycine, and cells were harvested using a cell scraper. A total of 10⁷ cells were used per group and were sonicated in a Covaris S220 at 4°C for 20 min. Anti-ST18 antibodies (10 μg; Abcam #ab86563 and #ab127900) were used for immunoprecipitation; rabbit IgG (Millipore #12-370) was used as a control. Coimmunoprecipitated chromatin fragments were reverse crosslinked and analyzed for target and nontarget regions using quantitative real-time PCR. Primers are given in Table S8.

Expression Analysis

Total RNA was treated with a Turbo DNA-free kit (Ambion #AM1906) and reverse transcribed with the SuperScript III first-strand synthesis system (Invitrogen #18080-044). For L1 analyses, cDNA synthesis required a specific sense L1 primer. For all other analyses, cDNA was prepared using random hexamers. qRT-PCR was performed using LightCycler 480 SYBR green mix (Roche #04707516001) or LightCycler probe master mix (Roche #04707494001) according to the manufacturer’s instructions (primer sequences are given in Table S8) and run on a LightCycler 480 (Roche).
TBP, GAPDH, and HPRT were assessed as normalization controls. TBP provided the most precise measurements across individuals and was selected as the control for all qRT-PCR experiments presented here.

HBV Detection

PCR reactions contained 0.5 μL MyTaq DNA polymerase (Bioline), 1X PCR buffer, 1 μM of each primer, and 100 ng genomic DNA in a 50 μL reaction volume. The following cycling conditions were used: 94°C for 2 min, then 35 cycles of 94°C for 15 s, 55°C for 15 s, 70°C for 10 s, followed by a single extension step at 70°C for 10 min. Primer sequences are given in Table S8.

ST18 Copy Number and Expression Analysis in Mdr2−/− Mice

The Mdr2−/− mouse is an established animal model of inflammation-driven HCC (Mauad et al., 1994). A total of 27 nodules representing different time points of tumor progression (Table S7) were collected from 10 Mdr2−/− mice (7 males and 3 females) sacrificed at 13-16 months, together with matched normal tissue (kidney). Stage of disease and tumor content were assessed through pathological inspection. Four nodules were found not to contain HCC cells. Genomic DNA was extracted from samples using a DNeasy Tissue kit (QIAGEN) according to the manufacturer’s protocols. Total DNA concentration and quantity were assessed by measuring absorbance at 260 nm with a NanoDrop 1000 Spectrophotometer (Thermo Fisher Scientific). ST18 copy number was assessed by quantitative real-time PCR using a TaqMan copy number assay (gene probe: Mm00040629_cn) on a 7900HT Fast Real-Time PCR System (Applied Biosystems) with Sequence Detection Systems software 2.2.2. TERT (Applied Biosystems, part number 4458373) was used as a reference. All samples were plated in quadruplicate with 20 ng DNA for each reaction. CNV calling was done with CopyCaller v2.0 (Applied Biosystems) and normalized to kidney. For qRT-PCR, total RNA was extracted from nodules and wild-type mouse liver samples using an RNeasy kit (QIAGEN) according to the manufacturer’s instructions.
500ng total RNA from each sample was then used for cDNA synthesis with ImProm-II Reverse Transcriptase (Promega). 1 μl cDNA from each reaction was used for qRT-PCR using the mouse ST18 primers listed in Table S8. qRT-PCR (SYBR-green) analysis was performed on an Applied Biosystems 7500 Real-time PCR system. Values were normalized to TBP.
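Normalization of qRT-PCR values to a reference gene such as TBP, and of copy number to a calibrator tissue such as kidney, is commonly done with the comparative Ct (2^−ΔΔCt) calculation. The sketch below illustrates that calculation only. The text does not state the exact formula used, so the choice of method, the assumption of 100% amplification efficiency, the two-copy calibrator, and all function and parameter names are assumptions.

```python
def relative_quantity(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Comparative Ct (2^-ddCt) estimate relative to a calibrator sample.

    ct_target / ct_reference: Ct of the gene of interest and of the reference
    gene (for example TBP or TERT) in the test sample.
    ct_target_cal / ct_reference_cal: the same two measurements in the
    calibrator sample (for example control liver or kidney).
    Assumes each PCR cycle doubles the product.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


def estimated_copy_number(ct_target, ct_reference, ct_target_cal,
                          ct_reference_cal, calibrator_copies=2):
    """Scale the relative quantity by the copy number assumed in the calibrator."""
    return calibrator_copies * relative_quantity(
        ct_target, ct_reference, ct_target_cal, ct_reference_cal
    )


def mean_ct(replicate_cts):
    """Average replicate Ct values (for example quadruplicate wells)."""
    return sum(replicate_cts) / len(replicate_cts)
```

A relative quantity below 1 indicates lower signal in the test sample than in the calibrator, and an estimated copy number above the calibrator value suggests a gain at the assayed locus.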