Integration of retroviral vectors in the human genome follows non-random patterns that favor insertional deregulation of gene expression and may pose a risk of insertional mutagenesis when the vectors are used in clinical gene therapy. Understanding how viral vectors integrate into the human genome is key to predicting these risks. We provide a new statistical method to compare retroviral integration patterns. We identified the positions where vectors derived from the Human Immunodeficiency Virus (HIV) and the Moloney Murine Leukemia Virus (MLV) show different integration behaviors in human hematopoietic progenitor cells. Non-parametric density estimation was used to identify candidate comparative hotspots, which were then tested and ranked. We found 100 significant comparative hotspots, distributed throughout the chromosomes. HIV hotspots were wider and contained more genes than MLV ones. A Gene Ontology analysis of HIV targets showed enrichment of genes involved in antigen processing and presentation, reflecting the high HIV integration frequency observed at the MHC locus on chromosome 6. Four histone modifications/variants (H2AZ, H3K4me1, H3K4me3, H3K9me1) had a different mean density in comparative hotspots, while gene expression within the comparative hotspots did not differ from background. These findings suggest the existence of epigenetic or nuclear three-dimensional topology contexts guiding retroviral integration to specific chromosome areas.
Understanding how retroviral vectors integrate into the human genome is a major safety issue in gene therapy, since a concrete risk of developing tumors associated with the integration process has been observed in several clinical trials. Statistical analyses have confirmed that integration is non-random. Where and why do virus-specific integrations tend to accumulate in the genome? We compared the integration preferences of two retroviral vectors derived from HIV and MLV, which are used in most gene therapy trials for hematological disorders, in their actual clinical targets, i.e., human hematopoietic stem/progenitor cells. We developed a new statistical method to find areas of the genome, called comparative hotspots, where integration preferences differ significantly. We modeled integration as a stochastic process, so that integration sites are seen as samples from an unknown virus-specific probability density function. The problem thus became one of identifying areas where two empirical density functions differ significantly. Comparing non-parametric variability bands around the estimated integration densities allowed us to identify and rank candidate comparative hotspots. The results indicated clear differential patterns of integration between HIV and MLV, leading to new hypotheses on the mechanisms governing retroviral integration.
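The idea of comparing variability bands around two estimated densities can be illustrated with a minimal sketch. It is not the procedure used in this study: it assumes a Gaussian kernel with a fixed, hand-picked bandwidth, uses the common square-root-scale approximation for the standard error of a kernel density estimate, and runs on simulated positions along a hypothetical 100 Mb chromosome. All function names, parameter values, and the labels "A" and "B" are illustrative assumptions.

```python
import math
import random


def gaussian_kde(sample, grid, h):
    """Gaussian kernel density estimate of `sample`, evaluated on `grid`."""
    norm = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)
            for x in grid]


def variability_band(density, n, h, width=2.0):
    """Band of +/- `width` standard errors, built on the square-root scale.

    Assumption: var(sqrt(f_hat)) ~ R(K) / (4 n h), with R(K) = 1 / (2 sqrt(pi))
    for the Gaussian kernel, so the standard error does not depend on f.
    """
    roughness = 1.0 / (2.0 * math.sqrt(math.pi))
    se = math.sqrt(roughness / (4.0 * n * h))
    lower = [max(math.sqrt(f) - width * se, 0.0) ** 2 for f in density]
    upper = [(math.sqrt(f) + width * se) ** 2 for f in density]
    return lower, upper


def candidate_hotspots(sites_a, sites_b, grid, h):
    """Return (start, end, label) intervals where the two bands do not overlap."""
    f_a = gaussian_kde(sites_a, grid, h)
    f_b = gaussian_kde(sites_b, grid, h)
    lo_a, up_a = variability_band(f_a, len(sites_a), h)
    lo_b, up_b = variability_band(f_b, len(sites_b), h)

    # Label each grid point by the sample whose band lies entirely above the other.
    labels = []
    for i in range(len(grid)):
        if lo_a[i] > up_b[i]:
            labels.append("A")
        elif lo_b[i] > up_a[i]:
            labels.append("B")
        else:
            labels.append(None)

    # Merge runs of consecutive grid points that carry the same label.
    regions, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] is not None:
                regions.append((grid[start], grid[i - 1], labels[start]))
            start = i
    return regions


# Simulated data (positions in Mb): uniform background plus one cluster each.
rng = random.Random(0)
sites_a = ([rng.uniform(0, 100) for _ in range(300)]
           + [rng.gauss(30, 1) for _ in range(100)])
sites_b = ([rng.uniform(0, 100) for _ in range(300)]
           + [rng.gauss(70, 1) for _ in range(100)])
grid = [0.5 * i for i in range(201)]  # 0, 0.5, ..., 100 Mb

regions = candidate_hotspots(sites_a, sites_b, grid, h=1.5)
for start, end, label in regions:
    print(f"{start:6.1f} - {end:6.1f} Mb  higher density in sample {label}")
```

In this sketch a candidate region is simply a run of grid points where one band lies entirely above the other. A complete analysis would still need to test and rank the candidates, and would need a principled bandwidth choice, neither of which is attempted here.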