Yu Zhang a,b,c, C. Jake Harris d, Qikun Liu e,f,d, Wanlu Liu d, Israel Ausin e, Yanping Long a,b,c, Lidan Xiao a,b,c, Li Feng a, Xu Chen a, Yubin Xie d, Xinyuan Chen d, Lingyu Zhan d, Suhua Feng d, Jingyi Jessica Li g, Haifeng Wang e,h,2, Jixian Zhai a,2, Steven E. Jacobsen d,i,2
16 January 2018
In plants, DNA cytosine methylation plays a central role in diverse cellular functions, from transcriptional regulation to maintenance of genome integrity. Vast numbers of whole-genome bisulfite sequencing (WGBS) datasets have been generated to profile DNA methylation at single-nucleotide resolution, yet computational analyses vary widely among research groups, making it difficult to cross-compare findings. Here we reprocessed hundreds of publicly available Arabidopsis WGBS libraries using a uniform pipeline. We identified high-confidence differentially methylated regions and compared libraries using a hierarchical framework, allowing us to identify relationships between methylation pathways. Furthermore, by using a large number of independent wild-type controls, we effectively separated spontaneous methylation changes from those that are biologically meaningful.
Genome-wide characterization by next-generation sequencing has greatly improved our understanding of the landscape of epigenetic modifications. Since 2008, whole-genome bisulfite sequencing (WGBS) has become the gold standard for DNA methylation analysis, and a tremendous amount of WGBS data has been generated by the research community. However, the systematic comparison of DNA methylation profiles to identify regulatory mechanisms has yet to be fully explored. Here we reprocessed the raw data of over 500 publicly available Arabidopsis WGBS libraries from various mutant backgrounds, tissue types, and stress treatments, and filtered them based on sequencing depth and bisulfite conversion efficiency. This enabled us to identify high-confidence differentially methylated regions (hcDMRs) by comparing each test library to over 50 high-quality wild-type controls. We developed statistical and quantitative measurements to analyze the overlap of DMRs and to cluster libraries based on their effects on DNA methylation. In addition to confirming existing relationships, we revealed unanticipated connections between well-known genes. For instance, MET1 and CMT3 were found to be required for the maintenance of asymmetric CHH methylation at nonoverlapping regions of CMT2-targeted heterochromatin. Our comparative methylome approach has established a framework for extracting biological insights via large-scale comparison of methylomes and can also be adopted for other genomics datasets.
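The idea of calling a region against many wild-type controls, and of quantifying how two libraries' DMR sets overlap, can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the per-bin methylation dictionaries, the function names `call_hc_bins` and `jaccard`, the difference cutoff `min_diff`, the support fraction `min_support`, and the use of a Jaccard index are all assumptions chosen for illustration.

```python
from typing import Dict, List, Set


def call_hc_bins(test: Dict[str, float],
                 controls: List[Dict[str, float]],
                 min_diff: float = 0.1,       # assumed cutoff, not from the paper
                 min_support: float = 0.6     # assumed fraction of controls
                 ) -> Dict[str, str]:
    """Return {bin_id: 'hypo' | 'hyper'} for bins where the test library
    differs from at least `min_support` of the controls covering that bin."""
    calls: Dict[str, str] = {}
    for bin_id, level in test.items():
        # Only use controls that actually cover this bin.
        ctrl_levels = [c[bin_id] for c in controls if bin_id in c]
        if not ctrl_levels:
            continue
        hypo = sum(1 for c in ctrl_levels if c - level >= min_diff)
        hyper = sum(1 for c in ctrl_levels if level - c >= min_diff)
        needed = min_support * len(ctrl_levels)
        if hypo >= needed:
            calls[bin_id] = "hypo"
        elif hyper >= needed:
            calls[bin_id] = "hyper"
    return calls


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two DMR bin sets as intersection size over union size."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy data: three wild-type controls, one of which has spontaneously
# lost methylation at the first bin.
controls = [
    {"chr1:0": 0.80, "chr1:100": 0.10},
    {"chr1:0": 0.75, "chr1:100": 0.12},
    {"chr1:0": 0.10, "chr1:100": 0.10},
]
mutant = {"chr1:0": 0.05, "chr1:100": 0.11}

calls = call_hc_bins(mutant, controls)
strict_calls = call_hc_bins(mutant, controls, min_support=1.0)
```

In this toy example the first bin is still called hypomethylated because two of the three controls support the call, whereas requiring agreement with every control (`min_support=1.0`) would discard it. Using many controls with a support threshold is one simple way to keep a single atypical wild-type library from either creating or masking a call.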