      Is Open Access

      Annotating functional effects of non-coding variants in neuropsychiatric cell types by deep transfer learning

      research-article


          Abstract

          Genome-wide association studies (GWAS) have identified a large number of loci associated with neuropsychiatric traits; however, understanding the molecular mechanisms underlying these loci remains difficult. To help prioritize causal variants and interpret their functions, computational methods have been developed to predict the regulatory effects of non-coding variants. An emerging approach to variant annotation uses deep learning models that predict regulatory functions from DNA sequences alone. While such models have been trained on large publicly available datasets such as ENCODE, cell types related to neuropsychiatric traits are under-represented in these datasets, so there is an urgent need for better tools and resources to annotate variant functions in such cellular contexts. To fill this gap, we assembled a large collection of data from neurodevelopment-related cell and tissue types and trained deep convolutional neural networks (ResNet) on these data. Furthermore, our model, called MetaChrom, borrows information from public epigenomic consortia to improve accuracy via transfer learning. We show that MetaChrom is substantially better at predicting experimentally determined chromatin accessibility variants than popular variant annotation tools such as CADD and delta-SVM. By combining GWAS data with MetaChrom predictions, we prioritized 31 SNPs for schizophrenia, suggesting potential risk genes and the biological contexts in which they act. In summary, MetaChrom provides functional annotations of any DNA variant in the neurodevelopmental context, and the general method of MetaChrom can also be extended to other disease-related cell or tissue types.
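The transfer-learning idea in the abstract can be illustrated with a deliberately small sketch. It is not MetaChrom's architecture: the "pre-trained" extractor below is a hypothetical stand-in (GC fraction and CpG density) for a network trained on a large reference compendium, and all function names are assumptions made for illustration. The point is only the pattern: keep the borrowed feature extractor frozen and fit a new prediction head on a small target dataset.

```python
import math

def pretrained_features(seq):
    """Hypothetical frozen extractor (an assumption, not MetaChrom's):
    returns GC fraction and CpG dinucleotide density of the sequence."""
    seq = seq.upper()
    gc = sum(base in "GC" for base in seq) / len(seq)
    cpg = sum(seq[i:i + 2] == "CG" for i in range(len(seq) - 1)) / (len(seq) - 1)
    return [gc, cpg]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(examples, labels, epochs=500, lr=1.0):
    """Fit only a new logistic-regression head on top of the frozen features,
    using plain batch gradient descent on the cross-entropy loss."""
    feats = [pretrained_features(s) for s in examples]
    w = [0.0] * len(feats[0])
    b = 0.0
    n = len(feats)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(feats, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(seq, w, b):
    """Predicted probability that the sequence is accessible in the target cell type."""
    x = pretrained_features(seq)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy target dataset (made up): 1 = accessible, 0 = not accessible.
train_seqs = ["GCGCGCGC", "CCGGCGCG", "GCGCCGGC", "ATATATAT", "TTAATTAA", "AATTATAT"]
train_labels = [1, 1, 1, 0, 0, 0]
w, b = train_head(train_seqs, train_labels)
```

In a realistic setting the frozen extractor would be a deep network trained on many reference cell types, and the head (or the last few layers) would be trained on the under-represented cell types of interest.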

          Author summary

          A large number of genetic variants have been statistically associated with the risk of common diseases. However, whether such variants are actual risk variants, and when and where they function, is often unknown. To address this challenge, machine learning methods have been developed to predict functional variants in specific cellular contexts. These methods correlate DNA sequences with their biological functions, e.g. enhancer activities, and can predict the effects of single-base mutations. Nevertheless, the training data used by existing methods often lack neurodevelopment-related cell types, so annotating variant effects in neuropsychiatric genetics remains difficult. In this work, we fill this gap by collecting a large set of regulatory genomic datasets from fetal and adult brain, iPSC-based cellular models, and brain organoids. We trained deep learning models on these data and further improved their performance by borrowing information from large external datasets, a strategy known as transfer learning. Our tool, MetaChrom, is substantially better at predicting experimentally determined regulatory variants than current methods, and helps us identify candidate risk variants for schizophrenia. We believe MetaChrom provides a valuable tool for the neuropsychiatric genetics community, and the software may be of interest to researchers in other fields as well.
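As a rough illustration of how a sequence-based model can score a single-base mutation, the sketch below compares a model's output for the reference and the alternate allele. The scoring function is a toy stand-in, not MetaChrom: `TOY_FILTER`, `toy_accessibility`, and `variant_effect` are hypothetical names, and the 4-bp motif weights are invented for the example.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T); other symbols become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

# Hypothetical 4-bp filter (rewards the motif ACGT), standing in for learned weights.
TOY_FILTER = [
    [1.0, -1.0, -1.0, -1.0],   # position 1 prefers A
    [-1.0, 1.0, -1.0, -1.0],   # position 2 prefers C
    [-1.0, -1.0, 1.0, -1.0],   # position 3 prefers G
    [-1.0, -1.0, -1.0, 1.0],   # position 4 prefers T
]

def toy_accessibility(seq):
    """Toy sequence model: best filter match along the sequence, squashed to (0, 1)."""
    x = one_hot(seq)
    width = len(TOY_FILTER)
    best = -math.inf
    for start in range(len(x) - width + 1):
        score = sum(
            x[start + i][j] * TOY_FILTER[i][j]
            for i in range(width)
            for j in range(len(BASES))
        )
        best = max(best, score)
    return 1.0 / (1.0 + math.exp(-best))

def variant_effect(ref_seq, pos, alt_base, model=toy_accessibility):
    """Score a single-base substitution as model(alternate) - model(reference)."""
    if alt_base not in BASES:
        raise ValueError("alt_base must be one of A, C, G, T")
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return model(alt_seq) - model(ref_seq)

# Replacing the C of the ACGT motif (0-based position 3) with T.
effect = variant_effect("TTACGTTT", 3, "T")  # negative: the substitution disrupts the toy motif
```

With a trained model in place of `toy_accessibility`, the same difference can be computed per cell type, which is what makes context-specific variant annotation possible.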

                Author and article information

                Contributors
                Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing
                Roles: Data curation, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing
                Role: Data curation
                Role: Data curation
                Role: Data curation
                Role: Data curation
                Roles: Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Writing – original draft
                Roles: Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing
                Role: Editor
                Journal
                PLoS Computational Biology (PLoS Comput Biol)
                Publisher: Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-734X; eISSN: 1553-7358
                Published: 16 May 2022
                Volume: 18
                Issue: 5 (May 2022)
                Article number: e1010011
                Affiliations
                [1] Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
                [2] Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
                [3] Center for Psychiatric Genetics, NorthShore University HealthSystem, Evanston, Illinois, United States of America
                [4] Department of Psychiatry and Behavioral Neuroscience, University of Chicago, Chicago, Illinois, United States of America
                University of California San Francisco, United States
                Author notes

                The authors have declared that no competing interests exist.

                Author information
                https://orcid.org/0000-0002-4201-6786
                https://orcid.org/0000-0003-4337-8532
                https://orcid.org/0000-0001-8490-918X
                https://orcid.org/0000-0002-1646-063X
                https://orcid.org/0000-0002-7298-7460
                https://orcid.org/0000-0002-7215-3220
                https://orcid.org/0000-0001-7111-4839
                Article
                Manuscript ID: PCOMPBIOL-D-21-01311
                DOI: 10.1371/journal.pcbi.1010011
                PMCID: PMC9135341
                PMID: 35576194
                3bdc9b2c-e169-4914-a0ee-b2b35bce438c
                © 2022 Lai et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 16 July 2021
                Accepted: 11 March 2022
                Page count
                Figures: 7, Tables: 0, Pages: 22
                Funding
                Funded by: National Institute of Mental Health (funder ID: http://dx.doi.org/10.13039/100000025); Award ID: R01MH116281
                Funded by: National Institute of Mental Health (funder ID: http://dx.doi.org/10.13039/100000025); Award ID: R01MH110531
                Funded by: National Institute of General Medical Sciences (funder ID: http://dx.doi.org/10.13039/100000057); Award ID: R01GM089753
                Funded by: University of Chicago Biological Sciences Division; Award ID: BSD 2021-22
                This research was supported by the National Institutes of Health (https://www.nih.gov/) (R01MH116281 and R01MH110531 to X.H., and R01GM089753 to J.X.) and the University of Chicago Biological Sciences Division (https://biologicalsciences.uchicago.edu/) (BSD 2021-22 to S.Q.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences > Computational Biology > Epigenomics
                Biology and Life Sciences > Genetics > Genomics > Epigenomics
                Biology and Life Sciences > Cell Biology > Chromosome Biology > Chromatin
                Biology and Life Sciences > Genetics > Epigenetics > Chromatin
                Biology and Life Sciences > Genetics > Gene Expression > Chromatin
                Biology and Life Sciences > Cell Biology > Cellular Types > Animal Cells > Neurons
                Biology and Life Sciences > Neuroscience > Cellular Neuroscience > Neurons
                Biology and Life Sciences > Computational Biology > Genome Analysis > Genome-Wide Association Studies
                Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Genome-Wide Association Studies
                Biology and Life Sciences > Genetics > Human Genetics > Genome-Wide Association Studies
                Research and Analysis Methods > Database and Informatics Methods > Bioinformatics > Sequence Analysis > Sequence Motif Analysis
                Computer and Information Sciences > Artificial Intelligence > Machine Learning
                Medicine and Health Sciences > Epidemiology > Medical Risk Factors
                Biology and Life Sciences > Genetics > Genetic Loci > Alleles
                Custom metadata
                vor-update-to-uncorrected-proof
                2022-05-26
                The ATAC-seq data for the iPSC-derived neurons are publicly available in the Gene Expression Omnibus (GSE129017). The reference epigenomic dataset is available at http://deepsea.princeton.edu. Detailed data sources for the neurodevelopmental model used in our experiments can be found in the supporting files (Table A in S1 Table).

                Quantitative & Systems biology