Genome-wide association studies (GWAS) have identified a large number of loci associated with neuropsychiatric traits; however, understanding the molecular mechanisms underlying these loci remains difficult. To help prioritize causal variants and interpret their functions, computational methods have been developed to predict the regulatory effects of non-coding variants. An emerging approach to variant annotation uses deep learning models that predict regulatory functions from DNA sequence alone. While such models have been trained on large publicly available datasets such as ENCODE, cell types relevant to neuropsychiatric traits are under-represented in these datasets, so there is an urgent need for better tools and resources to annotate variant functions in such cellular contexts. To fill this gap, we assembled data from a large collection of neurodevelopment-related cell and tissue types and trained deep convolutional neural networks (ResNet) on these data. Furthermore, our model, called MetaChrom, borrows information from public epigenomic consortia to improve accuracy via transfer learning. We show that MetaChrom is substantially better at predicting experimentally determined chromatin accessibility variants than popular variant annotation tools such as CADD and delta-SVM. By combining GWAS data with MetaChrom predictions, we prioritized 31 SNPs for schizophrenia, suggesting potential risk genes and the biological contexts in which they act. In summary, MetaChrom provides functional annotations of any DNA variant in the neurodevelopmental context, and the general method can be extended to other disease-related cell or tissue types.
A large number of genetic variants have been statistically associated with the risk of common diseases. However, whether such variants are actual risk variants, and when and where they function, is often unknown. To address this challenge, machine learning methods have been developed to predict functional variants in specific cellular contexts. These methods learn the relationship between DNA sequences and their biological functions, e.g. enhancer activities, and can predict the effects of single-base mutations. Nevertheless, the training data used by existing methods often lack neurodevelopment-related cell types, so annotating variant effects in neuropsychiatric genetics remains difficult. In this work, we fill this gap by collecting a large set of regulatory genomic datasets from fetal and adult brain, iPSC-based cellular models, and brain organoids. We trained deep learning models on these data and further improved their performance by borrowing information from large external datasets, a strategy known as transfer learning. Our tool, MetaChrom, is substantially better at predicting experimentally determined regulatory variants than current methods, and helps us identify candidate risk variants for schizophrenia. We believe MetaChrom provides a valuable tool for the neuropsychiatric genetics community, and the software may be of interest to researchers in other fields as well.
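To illustrate how a sequence-based model can predict the effect of a single-base mutation, the following minimal sketch scores a variant as the difference between the model output for the alternate-allele sequence and the output for the reference sequence. It is not MetaChrom's implementation: `toy_model` is a hypothetical stand-in that simply returns GC fraction, and `one_hot` and `variant_effect` are illustrative names. In practice the stand-in would be replaced by a trained network that predicts chromatin accessibility for a given cell type.

```python
from typing import Callable, List

BASES = "ACGT"

def one_hot(seq: str) -> List[List[float]]:
    """Encode a DNA string as one-hot rows ordered (A, C, G, T).

    Any other character (e.g. N) becomes an all-zero row.
    """
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def toy_model(x: List[List[float]]) -> float:
    """Hypothetical stand-in for a trained model: returns the GC fraction."""
    gc = sum(row[1] + row[2] for row in x)
    return gc / len(x)

def variant_effect(
    ref_seq: str,
    pos: int,
    alt: str,
    model: Callable[[List[List[float]]], float],
) -> float:
    """Score a single-base substitution as model(alt) - model(ref).

    `pos` is the 0-based index of the substituted base within `ref_seq`.
    """
    if len(alt) != 1 or alt.upper() not in BASES:
        raise ValueError("alt must be a single base in ACGT")
    if not 0 <= pos < len(ref_seq):
        raise IndexError("pos is outside the sequence")
    alt_seq = ref_seq[:pos] + alt.upper() + ref_seq[pos + 1:]
    return model(one_hot(alt_seq)) - model(one_hot(ref_seq))

if __name__ == "__main__":
    # Substituting A with G at index 1 raises the toy GC-fraction score.
    print(variant_effect("AAAA", 1, "G", toy_model))
```

A positive score means the model predicts higher activity for the alternate allele than for the reference allele, and a negative score means lower activity. Ranking candidate variants by the magnitude of this difference is one simple way to prioritize them.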