Hi-C and capture Hi-C (CHi-C) are used to map physical contacts between chromatin regions in cell nuclei using high-throughput sequencing. Analysis typically proceeds considering the evidence for contacts between each possible pair of fragments independent from other pairs. This can produce long runs of fragments which appear to all make contact with the same baited fragment of interest.
We hypothesised that these long runs could result from a smaller subset of direct contacts and propose a new method, based on a Bayesian sparse variable selection approach, which attempts to fine map these direct contacts. Our model is conceptually novel, exploiting the spatial pattern of counts in CHi-C data. Although we use only the CHi-C count data in fitting the model, we show that the fragments prioritised display biological properties that would be expected of true contacts: for bait fragments corresponding to gene promoters, we identify contact fragments with active chromatin and contacts that correspond to edges found in previously defined enhancer-target networks; conversely, for intergenic bait fragments, we identify contact fragments corresponding to promoters for genes expressed in that cell type. We show that long runs of apparently co-contacting fragments can typically be explained using a subset of direct contacts consisting of <10 % of the number in the full run, suggesting that greater resolution can be extracted from existing datasets.
Our results appear largely complementary to those from a per-fragment analytical approach, suggesting that they provide an additional level of interpretation that may be used to increase resolution for mapping direct contacts in CHi-C experiments.