      An equivariant Bayesian convolutional network predicts recombination hotspots and accurately resolves binding motifs

      Research article, Bioinformatics, Oxford University Press (Open Access)

          Abstract

          Motivation

          Convolutional neural networks (CNNs) have been tremendously successful in many contexts, particularly where training data are abundant and signal-to-noise ratios are large. However, when predicting noisily observed phenotypes from DNA sequence, each training instance is only weakly informative, and the amount of training data is often fundamentally limited, emphasizing the need for methods that make optimal use of training data and any structure inherent in the process.

          Results

          Here we show how to combine equivariant networks, a general mathematical framework for handling exact symmetries in CNNs, with Bayesian dropout, a version of Monte Carlo dropout suggested by a reinterpretation of dropout as a variational Bayesian approximation, to develop a model that exhibits exact reverse-complement symmetry and is more resistant to overtraining. We find that this model combines improved prediction consistency with better predictive accuracy compared to standard CNN implementations and state-of-the-art motif finders. We use our network to predict recombination hotspots from sequence, and identify binding motifs for the recombination–initiation protein PRDM9 previously unobserved in this data, which were recently validated by high-resolution assays. The network achieves a predictive accuracy comparable to that attainable by a direct assay of the H3K4me3 histone mark, a proxy for PRDM9 binding.
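          The two ingredients named above, reverse-complement equivariance and dropout kept active at prediction time, can be illustrated with a small sketch. It is not the authors' architecture. It assumes one-hot DNA input in A, C, G, T channel order (so complementing a base reverses the channel index), a single convolutional layer in which each filter is tied to its reverse-complement partner, a global max over positions and both strands, and a logistic output. Plain Monte Carlo dropout (random masks at prediction time, outputs averaged) stands in for the paper's Bayesian dropout. All names and parameters (`one_hot`, `rc_filter`, `strand_scores`, `mc_dropout_predict`, `p`, `n_samples`) are hypothetical, and the code uses only the Python standard library.

```python
import math
import random

# Assumed one-hot channel order A, C, G, T: complementing a base maps channel c to 3 - c.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of length-4 one-hot rows."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]


def reverse_complement(x):
    """RC of a one-hot sequence: reverse the positions and the channel order."""
    return [row[::-1] for row in x[::-1]]


def conv1d(x, w):
    """Valid 1-D cross-correlation of one-hot x (L x 4) with a k x 4 filter w."""
    k = len(w)
    return [
        sum(w[j][c] * x[i + j][c] for j in range(k) for c in range(4))
        for i in range(len(x) - k + 1)
    ]


def rc_filter(w):
    """Reverse-complement partner of w; tied to w, so it adds no parameters."""
    return [row[::-1] for row in w[::-1]]


def strand_scores(x, filters):
    """Scan with each filter and its RC partner; one (fwd, rev) score pair per filter.

    Equivariance: scanning rc(x) gives (reversed rev, reversed fwd) of scanning x.
    """
    return [(conv1d(x, w), conv1d(x, rc_filter(w))) for w in filters]


def pooled_features(x, filters):
    """Global max over positions and both strands, which is invariant to RC."""
    return [max(max(f), max(r)) for f, r in strand_scores(x, filters)]


def mc_dropout_predict(x, filters, weights, bias, p=0.5, n_samples=200, seed=0):
    """Mean and variance of sigmoid outputs over dropout masks sampled at test time."""
    rng = random.Random(seed)
    feats = pooled_features(x, filters)
    preds = []
    for _ in range(n_samples):
        # Inverted dropout: keep each feature with probability 1 - p, rescale by 1 / (1 - p).
        kept = [f / (1 - p) if rng.random() >= p else 0.0 for f in feats]
        z = bias + sum(wt * f for wt, f in zip(weights, kept))
        preds.append(1.0 / (1.0 + math.exp(-z)))
    mean = sum(preds) / n_samples
    var = sum((q - mean) ** 2 for q in preds) / n_samples
    return mean, var


if __name__ == "__main__":
    rng = random.Random(1)
    # Three random 5 x 4 filters and output weights, purely for illustration.
    filters = [[[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)] for _ in range(3)]
    weights = [rng.gauss(0, 1) for _ in range(3)]
    x = one_hot("CCTCCCTAGGATTACAGGCA")
    # A sequence and its reverse complement should receive the same prediction.
    print(mc_dropout_predict(x, filters, weights, bias=0.1))
    print(mc_dropout_predict(reverse_complement(x), filters, weights, bias=0.1))
```

          In this sketch dropout is applied after the strand-wise pooling, so for any fixed mask a sequence and its reverse complement see identical features and the symmetry holds sample by sample (up to floating-point rounding from the summation order), not only on average. The returned variance is a rough per-sequence uncertainty estimate derived from the dropout samples.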

          Supplementary information

          Supplementary data are available at Bioinformatics online.

                Author and article information

                Contributors
                Role: Associate Editor
                Journal: Bioinformatics (Oxford University Press)
                ISSN: 1367-4803 (print); 1367-4811 (online)
                Dates: 01 July 2019; 27 November 2018
                Volume 35, Issue 13, Pages 2177-2184
                Affiliations: Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Dr, Oxford, UK
                Author notes: To whom correspondence should be addressed. E-mail: richard.brown@well.ox.ac.uk
                Author information: http://orcid.org/0000-0002-3798-2058
                Article: bty964
                DOI: 10.1093/bioinformatics/bty964
                PMCID: PMC6596897
                PMID: 30481258
                © The Author(s) 2018. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History: 25 June 2018; 13 November 2018; 26 November 2018
                Page count: 8 pages
                Funding: Wellcome Trust (Funder ID 10.13039/100004440); Award IDs: 090532/Z/09/Z, 203141/Z/16/Z
                Categories: Original Papers; Genome Analysis; Bioinformatics & Computational biology
