13
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Identification of Differentially Methylated Sites with Weak Methylation Effects

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Deoxyribonucleic acid (DNA) methylation is an epigenetic alteration crucial for regulating stress responses. Identifying large-scale DNA methylation at single nucleotide resolution is made possible by whole genome bisulfite sequencing. An essential task following the generation of bisulfite sequencing data is to detect differentially methylated cytosines (DMCs) among treatments. Most statistical methods for DMC detection do not consider the dependency of methylation patterns across the genome, thus possibly inflating type I error. Furthermore, small sample sizes and weak methylation effects among different phenotype categories make it difficult for these statistical methods to accurately detect DMCs. To address these issues, the wavelet-based functional mixed model (WFMM) was introduced to detect DMCs. To further examine the performance of WFMM in detecting weak differential methylation events, we used both simulated and empirical data and compare WFMM performance to a popular DMC detection tool methylKit. Analyses of simulated data that replicated the effects of the herbicide glyphosate on DNA methylation in Arabidopsis thaliana show that WFMM results in higher sensitivity and specificity in detecting DMCs compared to methylKit, especially when the methylation differences among phenotype groups are small. Moreover, the performance of WFMM is robust with respect to small sample sizes, making it particularly attractive considering the current high costs of bisulfite sequencing. Analysis of empirical Arabidopsis thaliana data under varying glyphosate dosages, and the analysis of monozygotic (MZ) twins who have different pain sensitivities—both datasets have weak methylation effects of <1%—show that WFMM can identify more relevant DMCs related to the phenotype of interest than methylKit. Differentially methylated regions (DMRs) are genomic regions with different DNA methylation status across biological samples. DMRs and DMCs are essentially the same concepts, with the only difference being how methylation information across the genome is summarized. If methylation levels are determined by grouping neighboring cytosine sites, then they are DMRs; if methylation levels are calculated based on single cytosines, they are DMCs.

          Related collections

          Most cited references6

          • Record: found
          • Abstract: found
          • Article: not found

          DNA methylation and epigenetic inheritance in plants and filamentous fungi.

          Plants and filamentous fungi share with mammals enzymes responsible for DNA methylation. In these organisms, DNA methylation is associated with gene silencing and transposon control. However, plants and fungi differ from mammals in the genomic distribution, sequence specificity, and heritability of methylation. We consider the role that transposons play in establishing methylation patterns and the epigenetic consequences of their perturbation.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Statistical methods for detecting differentially methylated loci and regions

            DNA methylation, the reversible addition of methyl groups at CpG dinucleotides, represents an important regulatory layer associated with gene expression. Changed methylation status has been noted across diverse pathological states, including cancer. The rapid development and uptake of microarrays and large scale DNA sequencing has prompted an explosion of data analytic methods for processing and discovering changes in DNA methylation across varied data types. In this mini-review, we present a compact and accessible discussion of many of the salient challenges, such as experimental design, statistical methods for differential methylation detection, critical considerations such as cell type composition and the potential confounding that can arise from batch effects. From a statistical perspective, our main interests include the use of empirical Bayes or hierarchical models, which have proved immensely powerful in genomics, and the procedures by which false discovery control is achieved.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              Differential methylation of the TRPA1 promoter in pain sensitivity

              Acute and chronic pain have an impact on 20% of the population worldwide and represent a significant therapeutic and economic challenge1 2. Sensitivity to pain is a complex phenotype, which reflects the contribution of multiple biological, psychological and environmental risk factors. There is substantial variation in individual pain sensitivity3, only about half of which is explained by genetic effects, based on evidence from animal studies4 5, twin-based heritability estimates6 7 and genetic association analyses8. Multiple experimental approaches have been used to study pain mechanisms9 10. Chronic pain is known to be underpinned by several biological processes that together serve to amplify pain-related signals. The most well characterized of these are sensitization of the nociceptive nerve fibres (that innervate peripheral tissues), sensitization of the spinal circuits relaying signals associated with tissue damage and strong cortical modulation reflecting expectation, mood and past history. Recent evidence points to the involvement of epigenetic mechanisms across these molecular systems, both in the development and maintenance of pain states11 12. DNA methylation is an essential epigenetic mechanism, which in mammals occurs at cytosine residues, predominantly in the context of CpG dinucleotides. DNA methylation is relevant for gene expression regulation, genomic imprinting, development and genomic stability. Emerging epigenome-wide association studies (EWAS) in humans suggest that epigenetic modifications may play a pivotal role in complex traits13. Recent EWAS of DNA methylation differences in disease-discordant monozygotic (MZ) twins have identified potentially differentially methylated regions (DMRs) in several complex diseases, ranging from diabetes to schizophrenia14 15 16. Epigenetic analyses of discordant twins can provide a powerful study design, but the power of this approach has not yet been tested in large samples at high resolution. Here we performed detailed genome-wide analyses of DNA methylation and identified modest but consistent and significant DNA methylation changes associated with pain sensitivity, with implications for epigenetic studies of other complex traits. Results Epigenome-wide association study design We obtained DNA methylation sequencing profiles in a discovery sample of 25 MZ twin pairs (50 MZ twins) who were discordant for heat pain sensitivity (heat pain tolerance >2 °C between twins) determined experimentally using quantitative sensory testing (QST) (Supplementary Fig. S1). QST is commonly used for the assessment of pain sensitivity and the heat pain tolerance response (measured using the heat pain suprathreshold (HPST)) has been shown to be a clinically relevant QST measure17. We then obtained DNA methylation levels in a second independent sample of 50 unrelated individuals who did not overlap with the individuals in the discovery sample of MZ twins (Supplementary Fig. S1). Whole-blood DNA methylation patterns were assayed using methylated DNA immunoprecipitation followed by deep sequencing (MeDIP-seq), performed separately in the MZ twin sample and in the sample of unrelated individuals, resulting in an average of 50 million paired-end reads per individual in the MZ twin sample and an average of 25 million single-end reads per individual in the unrelated sample. We quantified MeDIP-seq reads to obtain a score of DNA methylation in overlapping 1 kb bins across the genome (Supplementary Figs S2 and S3). To identify DMRs associated with heat pain sensitivity (pain DMRs) we performed two separate pain-sensitivity EWAS on the two cohorts. To combine the results from the MZ twin and unrelated individual cohorts, we pursued a meta-analysis (MA) approach. MA is a powerful tool for aggregating information from multiple independent studies18 and is routinely used for pooling results from genome-wide association studies19, where it has successfully identified many disease-associated genetic variants with small effect sizes, not revealed in the individual studies20 21 22. We thereby obtained MA pain DMRs (MAP-DMRs) that were strongly supported by both cohorts and further investigated. Genome-wide patterns of DNA methylation in MZ twins The DNA methylation sequencing data allowed us to assess at high resolution the distribution of DNA methylation patterns across the genome. We observed greater correlation in DNA methylation profiles within MZ pairs compared with pairs of unrelated individuals, as expected and consistent with previous studies23 (Fig. 1a–c). At a few genomic regions, the correlation pattern was reversed (Fig. 1c), potentially highlighting regions of high DNA methylation turnover. However, at the majority of loci across the genome DNA methylation patterns were positively correlated within MZ twins. The intraclass correlation coefficient within MZ twins across the genome was estimated at 0.064 per 1 kb bin (Fig. 1b), but the estimates showed variability across different regions and at different resolutions (Fig. 1a). We assessed evidence for allele-specific methylation (ASM) in the discovery set of 50 MZ twins and observed ASM (Methods) at 37.8% of tested autosomal heterozygous variants within an individual (considering only variants that were heterozygous in at least 50% of the sample). Overall, 106 heterozygous variants had evidence for ASM in at least 50% of the sample and >99% of these variants showed concordance for ASM of the same allele (or ASM skew) within MZ twin pairs (Fig. 1a). As expected, DNA methylation levels were also strongly correlated with genomic location and with previously annotated epigenetic variants and regulatory regions (Fig. 1d and Supplementary Fig. S4), where CpG islands and promoters had significantly reduced levels of DNA methylation. Pain-sensitivity epigenome-wide analyses We tested for association between DNA methylation variation and pain sensitivity to identify DMRs related to pain (pain DMRs). We first performed pain-sensitivity EWAS in the discovery (MZ twin) and second (unrelated individuals) cohorts separately. In the discovery sample of 50 twins, we compared DNA methylation scores in 1 kb bins across the genome to pain scores for each individual, using a linear mixed-effects regression (LMER) model controlling for family structure (Supplementary Fig. S5 and Supplementary Table S1). At a permutation-based false discovery rate (FDR) of 5% (nominal LMER P=8.4 × 10−9), there was one significant pain DMR in an intergenic region on chromosome 3 (LMER P=8.4 × 10−9). Among the remaining top-ranked pain DMRs was MVP (ranked fourth; LMER P=7.9 × 10−8), which is involved in the interferon-γ signalling pathway24, previously implicated in neuropathic pain25. We also estimated pain DMRs within specific genomic annotation categories (CpG islands, CpG-island shores, and promoters) and found that the top-ranked regions included the pain gene TRPA1 (ranked fourth, CpG-island shores, LMER P=3.9 × 10−5) and other potential pain candidates (BAIAP2 (ranked first, CpG islands, LMER P=4.9 × 10−5) and GRIN1 (ranked ninth, CpG islands, LMER P=2.9 × 10−4); Supplementary Methods). Among the 100 top-ranked regions in the CpG-island shore analysis were two additional known pain genes: the heat pain sensitivity gene TRPV1 (ranked 31st, CpG-island shores, LMER P=2.3 × 10−4) and TRPV3 (ranked 87th, CpG-island shores, LMER P=6.6 × 10−4), both within the same family of transient receptor potential channels as TRPA1. We then performed a pain-sensitivity EWAS in the second sample of 50 unrelated individuals and identified 22 pain DMRs at a permutation-based FDR of 5% (LMER P=2.5 × 10−7; Supplementary Fig. S6 and Supplementary Table S2). The most associated pain DMR in the unrelated samples was in ANK3 (LMER P=1.9 × 10−8), which is required for normal clustering of voltage-gated sodium channels and normal action potential firing in neurons26, and is a susceptibility locus for bipolar disorder27. Pain DMRs in the second cohort also included DCLK1 (ranked ninth, LMER P=1.2 × 10−7), which is a candidate gene for inflammatory nociception in mice28. To explore potential structure and confounders in the second cohort of unrelated individuals, we tested whether including multiple principal components altered the EWAS findings. We found that including the first up to third principal components as covariates did not change the top 22 association signals (Supplementary Fig. S6 and Supplementary Table S3). We explored the sensitivity of EWAS results to potential confounders. We tested the influence of methylation normalization and the inclusion of covariates such as age, but did not find significant alterations in the top-ranked results (Supplementary Methods). To assess the sensitivity of pain DMRs to methylation quantification of MeDIP-seq data, we also performed analyses controlling for CpG density29 and MeDIP-seq fragment size in the discovery sample (Supplementary Figs S7 and S8), and found that the majority of top-ranked pain DMRs remained nominally significant, although their relative rank could change. For example, the TRPA1 pain DMR (originally ranked 50th, LMER P=2.6 × 10−6) was nominally significant when controlling for MeDIP-seq fragment size (ranked 49th, LMER P=2.6 × 10−6) and CpG density (ranked 36,921st (in top 0.07%), LMER P=7.8 × 10−3). To validate DNA methylation signals based on high-resolution MeDIP-seq, we obtained DNA methylation profiles measured on the Illumina Infinium 450k BeadChip assay30 in 44 individuals from the discovery sample. We compared genome-wide DNA methylation obtained from MeDIP-seq and 450k platforms in 1-kb regions genome wide and observed positive correlations (mean linear correlation=0.62) within individuals at 422,158 one-kilobase regions genome wide (Fig. 1e). The comparisons were performed using mean level of DNA methylation in 1-kb regions where data were available on both platforms (which included only 10% of regions assayed by MeDIP-seq and over 99.9% of CpG sites on the Illumina 450k array). We also performed EWAS for pain sensitivity using the Illumina 450k DNA methylation data, but the results did not surpass permutation-based FDR 5% (Supplementary Fig. S9). However, most MeDIP-seq pain DMRs were not represented on the 450k array and only two of the top ten discovery pain DMRs overlapped a 450k CpG site. Pain sensitivity MA We combined the EWAS results from the two cohorts using a fixed-effect epigenome-wide association MA in the combined sample of 100 individuals (Fig. 2a). There were nine MAP-DMRs with compelling evidence for association at a permutation-based MA threshold of FDR=5% (MA P=2.0 × 10−11), excluding results with evidence for heterogeneity. The nine MAP-DMRs mapped to eight unique regions in the TRPA1 promoter (MA P=1.2 × 10−13), in an intergenic region on chromosome 18 (MA P=4.1 × 10−13), 17 kb from the OR8B8 transcription start site (TSS) (MA P=3.4 × 10−12), 47–48 kb from the ST6GALNAC3 TSS (MA P=4.4 × 10−12), in an intron of MTMR12 (MA P=5.1 × 10−12), in an intergenic region on chromosome 4 (MA P=5.6 × 10−12), 33 kb from the MICAL2 TSS (MA P=7.2 × 10−12) and 18 kb from the NFU1 TSS (MA P=1.2 × 10−11; Table 1). In the majority of regions, DNA methylation levels were greater in individuals with low pain scores (hypermethylated MAP-DMRs). We assessed repeat content at MAP-DMRs (Table 1) and identified three MAP-DMRs (in MTMR12, NFU1 and on chromosome 4) in high repeat content regions (>50% repeats). In addition, the second and third top-ranked MAP-DMRs (on chromosome 18 and in OR8B8) had low repeat content ( base addition and adaptor ligation steps were performed using Illumina’s Paired-End DNA Sample Prep kit following the manufacturer’s instructions. Next, the methylated DNA fractions were immunoprecipitated using the Magnetic Methylated DNA Immunoprecipitation Kit (Diagenode, catalogue number: mc-magme-048) with small adjustments. Briefly, 1.5 μl of each two control templates (methylated and unmethylated DNA controls) with different sequences were mixed with adapter-ligated DNA and heat denatured (95 °C, 3 min) in a final reaction volume of 90 μl within the provided reagents of 24 μl MagBuffer A and 6 μl MagBuffer B. Aliquot (7.5 μl) of the denatured genomic DNA was saved as input and another 75 μl was immunoprecipitated with 5 μl of prepared anti-5mC antibody and 20 μl of prepared Magbeads. The DNA–antibody magbeads mixture was incubated overnight at 4 °C on a rotating wheel. The DNA–antibody magbeads mixture was then washed twice using each of 150 μl ice-cold MagWash Buffer-1 and washed once with 150 μl of ice-cold MagWash Buffer-2. For each buffer, the washing reaction was incubated for 4 min at 4 °C on a rotating wheel. The input DNA and immunoprecipitated products were then in parallel treated with protease K and purified with ZYMO DNA Clean & Concentrator-5 (ZYMO). The efficiency and sensitivity of immunoprecipitation reaction were detected by qPCR using the enriched DNA and the unbound input DNA. MeDIP-seq library was constructed by PCR amplification with the enriched DNA as template. PCR reaction was performed in 50 μl of reaction volume consisting of 20 μl MeDIP-enriched DNA, 5 μl of 2.5 mM dNTP, 2 μl primers, 5 μl of 10 × pfx amplification buffer, 2 μl of 50 mM MgSO4, 15.2 μl sterilized water and 0.8 μl of Platinum pfx DNA polymerase (Invitrogen). The programme of amplification was 94 °C 2 min, 10 cycles of 94 °C 15 s, 62 °C 30 s and 72 °C 30 s, then prolong with 10 min at 72 °C. The products could be kept at 12 °C. The PCR products were purified using Agencourt Ampure Beads (Beckman Coulter). After analysing by the Bioanalyzer analysis system (Agilent) and quantified by the real time PCR, the prepared library was sequenced using Illumina HiSeq2000 with SE50 read length. MeDIP-seq DNA methylation quantification In the discovery MZ twin sample, we obtained on average 50 million paired-end reads of length 50 bp per individual. Alignment to the human genome (hg18) was performed using Maq54 (v0.7.1). QC of the aligned data took into account base quality scores (no exclusions), read mapping scores (threshold maq q>10), removal of duplicate paired-end (PE) reads, proper PE pairing criteria and insert-size distribution checks (see Supplementary Fig. S2 and Supplementary Methods). On average, 60% of reads passed alignment QC per individual. We used the post-QC PE reads to reconstruct MeDIP fragments per individual. We quantified coverage of MeDIP fragments per base pair across the genome and binned coverage into overlapping 1-kb bins (overlap of 500 bp). For each bin, we calculated a normalized methylation score, the relative methylation quantification (RMQ) score, which was the sum of the MeDIP-fragment coverage of each base pair in that bin, divided by the overall number of autosomal post-QC reads per individual. For pain-DMR analysis, we only considered bins with RMQ variance >0, where at least 10% of the sample had RMQ scores >0. There were 5,260,672 autosomal 1-kb bins used in the discovery pain-DMR analyses. In the discovery sample, we also performed additional quantifications of MeDIP-seq data, taking into account variation in the size distribution of MeDIP fragments across individuals and variation in CpG density across genomic regions (Supplementary Methods and Supplementary Figs S6 and S7). For fragment analyses, we matched the distribution of MeDIP fragments across the 50 individuals to match a standardized distribution (Supplementary Methods), and in the analyses controlling for CpG density, we used MEDIPS29 to calculate absolute methylation scores (AMS). We used AMS to examine DNA methylation patterns across the genome (Fig. 1), and we used AMS and fragment-corrected scores to assess the reproducibility of pain DMRs to different MeDIP-seq quantifications (Supplementary Figs S7 and S8). In the unrelated samples, we obtained on average 25 million single-end reads of length 50 bp per individual. Sequence alignment to the human genome (hg18) was performed in Bwa55 (v0.5.9; Supplementary Fig. S3). Data were subject to QC checks for base composition (no exclusions), read mapping quality (threshold bwa q>10) and removal of duplicate reads. On average, 53% of reads passed alignment QC per individual. Post-QC single-end reads were then extended by 175 bp equilaterally to produce MeDIP fragments of size 225 bp. We quantified coverage of MeDIP fragments per base pair across the genome, and binned coverage per base pair into overlapping 1-kb bins (overlap of 500 bp). For each bin, we calculated the normalized methylation score (RMQ), and in the pain-DMR analyses we only considered bins with RMQ variance >0, where at least 10% of the sample had RMQ scores >0. There were 5,312,714 autosomal 1-kb bins used in the follow-up pain-DMR analyses. DMR analyses Pain-DMR analyses in the discovery MZ twin set were performed using linear mixed-effects models in R (lme4 package). We regressed the RMQ scores in each 1-kb bin on HPST as a fixed effect, family as a random effect, and with and without age as fixed-effect covariate. Before performing the pain-DMR regression, we normalized the RMQ scores to N(0,1), but results were also calculated using the raw RMQ scores (Supplementary Methods). In the discovery set, additional genome-wide pain-DMR analyses also took into account MeDIP-seq fragment variability and CpG density, and only considered specific genomic categories such as CpG-islands, CpG-island shores and promoters (Supplementary Methods). In the second sample of 50 unrelated individuals, we fit linear models regressing RMQ on HPST across the genome, with or without covariates such as age and the first three autosomal methylation principal components, as well as including RMQ normalization (to N(0,1)). The final pain-DMR results (Table 1) include normalized (to N(0,1)) RMQ scores regressed on HPST. MA of the MZ twin and unrelated samples was performed in GWAMA56, using a fixed-effect MA on the HPST pain-DMR regression betas. We report MAP-DMRs at permutation-based MA significance threshold of FDR=5%. We investigated evidence for heterogeneity in the MA using Cochran’s Q-statistic and the I 2-statistic57, and only considered results with no strong evidence for heterogeneity (Cochran’s Q, P>0.01 and I 2 0.75. Altogether, potential ASM calls were obtained at 17,261 variants. We excluded all ASM variants that fell in genomic regions reported to be problematic in alignment (Supplementary Methods). Blood–brain DNA methylation comparison MeDIP-seq patterns from blood and multiple brain tissues were obtained from two female donors37. The brain regions included the inferior frontal gyrus, middle frontal gyrus, left frontal gyrus, entorhinal cortex, superior temporal gyrus of the temporal cortex, visual cortex and the cerebellum. To assess tissue specificity of methylation effects at the 100 top-ranked MAP-DMRs, we first estimated the correlation between brain and blood methylation levels, by using the mean methylation levels across both individuals in all brain regions and in blood. We then considered regions to have tissue-shared methylation if the methylation difference between blood and one brain region was <10% of the overall blood methylation level, and under more stringent criteria, if the mean difference between blood and all brain region was <10% of the overall blood methylation level (see Supplementary Methods). Gene expression Gene expression data were obtained from skin samples in 341 twins from the MuTHER study35, which included MZ and dizygotic twin pairs and unrelated individuals. Gene expression levels were measured using the Illumina expression array HumanHT-12 version 3 as previously described35. Each sample had three technical replicates and log2-transformed expression signals were quantile normalized first across three replicates of each individual, and second by quantile normalization across all individuals. We used the transformed normalized residuals of the log-transformed gene expression array signal in downstream analysis. Author contributions T.D.S., S.J., J.T.B. and W.J. designed the study. L.M.B., G.F., J.S., H.W. and N.L. performed the experiments. K.W., J.H., S.S., M.D., L.C.S., J.M., F.M.K.W., P.D., S.B. and S.M. contributed reagents, materials and analysis tools. J.T.B. analysed the data with contributions from B.Z., A.K.L. and C.L.H. J.T.B. and T.D.S. wrote the paper. All authors read and approved the manuscript before submission. T.D.S., S.J. and W.J. contributed equally to the study. Additional information Accession codes: The sequence and microarray data have been deposited in the NCBI Gene Expression Omnibus under GEO accession number GSE53130. All data sets are also available at http://www.twinsuk.ac.uk/data-access/medipseq-data/. How to cite this article: Bell, J. T. et al. Differential methylation of the TRPA1 promoter in pain sensitivity. Nat. Commun. 5:2978 doi: 10.1038/ncomms3978 (2014). Supplementary Material Supplementary Information Supplementary Figures S1-S13, Supplementary Tables S1-S8, Supplementary Methods and Supplementary References
                Bookmark

                Author and article information

                Journal
                Genes (Basel)
                Genes (Basel)
                genes
                Genes
                MDPI
                2073-4425
                08 February 2018
                February 2018
                : 9
                : 2
                : 75
                Affiliations
                [1 ]Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA; hongt1@ 123456vt.edu
                [2 ]Department of Statistics, Virginia Tech, Blacksburg, VA 24061, USA; hongxiao@ 123456vt.edu (H.Z.); xwwu@ 123456vt.edu (X.W.)
                [3 ]Department of Plant Pathology, Physiology and Weed Science, Virginia Tech, Blacksburg, VA 24061, USA; gunjunekim@ 123456gmail.com (G.K.); hlarose@ 123456vt.edu (H.L.); dhaak@ 123456vt.edu (D.C.H.); saskew@ 123456vt.edu (S.D.A.); jnbarney@ 123456vt.edu (J.N.B.); westwood@ 123456vt.edu (J.H.W.)
                [4 ]Genetic Improvement of Fruits and Vegetables Laboratory, United States Department of Agriculture, Agricultural Research Service, Beltsville, MD 20705, USA; thechrisclarke@ 123456gmail.com
                Author notes
                [* ]Correspondence: lqzhang@ 123456cs.vt.edu
                Author information
                https://orcid.org/0000-0002-9515-9502
                https://orcid.org/0000-0003-2949-5003
                Article
                genes-09-00075
                10.3390/genes9020075
                5852571
                29419727
                5fe47b21-9f50-44a6-b072-aa551aca421d
                © 2018 by the authors.

                Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( http://creativecommons.org/licenses/by/4.0/).

                History
                : 02 December 2017
                : 25 January 2018
                Categories
                Article

                differentially methylated regions,wavelet-based functional mixed model,weak methylation effect

                Comments

                Comment on this article