
      Predicting enhancers with deep convolutional neural networks

      research-article
      BMC Bioinformatics
      BioMed Central
      IEEE BIBM International Conference on Bioinformatics & Biomedicine (BIBM) 2016
      15-18 December 2016


          Abstract

          Background

          With the rapid development of deep sequencing techniques in recent years, enhancers have been systematically identified in projects such as FANTOM and ENCODE, forming genome-wide landscapes in a series of human cell lines. Nevertheless, experimental approaches are still costly and time-consuming for the large-scale identification of enhancers across a variety of tissues under different disease states, making computational identification of enhancers indispensable.

          Results

          To facilitate the identification of enhancers, we propose a computational framework, named DeepEnhancer, to distinguish enhancers from background genomic sequences. Our method relies purely on DNA sequences to predict enhancers in an end-to-end manner using a deep convolutional neural network (CNN). We train our deep learning model on permissive enhancers and then adopt a transfer learning strategy to fine-tune the model on enhancers specific to a cell line. Results demonstrate the effectiveness and efficiency of our method in classifying enhancers against random sequences, showing the advantages of deep learning over traditional sequence-based classifiers. We then construct a variety of neural networks with different architectures and show the usefulness of techniques such as max-pooling and batch normalization in our method. To improve the interpretability of our approach, we further visualize convolutional kernels as sequence logos and successfully identify similar motifs in the JASPAR database.
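The sequence-only pipeline described above can be illustrated with a minimal NumPy sketch: DNA is one-hot encoded, scanned by a convolutional kernel acting as a motif detector, max-pooled, and batch-normalized across sequences, and a kernel can be read back as a position weight matrix for logo-style comparison with JASPAR motifs. This is not DeepEnhancer's architecture or training code. The function names, the hand-set GATA kernel, the softmax conversion to a PWM, and the toy sequence are all hypothetical choices made for illustration; a real model learns many kernels by gradient descent.

```python
import numpy as np

# Hypothetical sketch, NOT DeepEnhancer's implementation: a single hand-set
# kernel stands in for the many kernels a trained CNN would learn.

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode DNA as a (len(seq), 4) matrix in A, C, G, T column order.
    Ambiguous bases such as N are left as all-zero rows."""
    index = {b: i for i, b in enumerate(BASES)}
    x = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            x[pos, index[base]] = 1.0
    return x


def conv1d_relu(x: np.ndarray, kernel: np.ndarray, bias: float = 0.0) -> np.ndarray:
    """Slide a (k, 4) kernel along a (L, 4) sequence without padding
    (cross-correlation, as CNN libraries compute it) and apply ReLU."""
    k = kernel.shape[0]
    scores = np.array([np.sum(x[i:i + k] * kernel) + bias
                       for i in range(x.shape[0] - k + 1)])
    return np.maximum(scores, 0.0)


def max_pool(h: np.ndarray, size: int) -> np.ndarray:
    """Non-overlapping max-pooling; a trailing partial window is dropped."""
    n = len(h) // size
    return h[: n * size].reshape(n, size).max(axis=1)


def batch_norm(h: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each feature column over the batch axis (axis 0) to zero
    mean and unit variance; the learned scale and shift are omitted."""
    return (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)


def kernel_to_pwm(kernel: np.ndarray) -> np.ndarray:
    """Convert kernel weights to per-position base probabilities with a
    row-wise softmax, one simple way to draw a kernel as a sequence logo."""
    e = np.exp(kernel - kernel.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    kernel = one_hot("GATA")            # hypothetical motif detector
    x = one_hot("TTGATACCGATAGG")       # toy input sequence
    h = conv1d_relu(x, kernel)
    print(h)                            # peaks of 4.0 where GATA occurs
    print(max_pool(h, 4))               # [4. 1.]
    print(kernel_to_pwm(kernel).argmax(axis=1))  # [2 0 3 0] -> G, A, T, A
```

Max-pooling keeps the strongest motif hit in each window, which makes the response less sensitive to the motif's exact position. The transfer learning step mentioned above (pre-training on permissive enhancers, then fine-tuning on cell-line-specific enhancers) would start from learned kernels like this one; it requires a training loop and is not sketched here.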

          Conclusions

          DeepEnhancer enables the identification of novel enhancers using only DNA sequences via a highly accurate deep learning model. The proposed computational framework can also be applied to similar problems, thereby promoting the use of machine learning methods in the life sciences.


                Author and article information

                Contributors
                minx14@mails.tsinghua.edu.cn
                zengww14@mails.tsinghua.edu.cn
                ccq17@mails.tsinghua.edu.cn
                ningchen@tsinghua.edu.cn
                tingchen@tsinghua.edu.cn
                ruijiang@tsinghua.edu.cn
                Journal
                BMC Bioinformatics
                BioMed Central (London)
                ISSN 1471-2105
                Published: 1 December 2017
                Volume: 18
                Issue: Suppl 13
                Issue sponsor: Publication of this supplement has not been supported by sponsorship. Information about the source of funding for publication charges can be found in the individual articles. The articles have undergone the journal's standard peer review process for supplements. The Supplement Editors declare no competing interests.
                Article number: 478
                Affiliations
                [1] MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Beijing 100084, China (ISNI 0000 0004 0369 313X; GRID grid.419897.a)
                [2] Department of Computer Science and Technology, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, China (ISNI 0000 0001 0662 3178; GRID grid.12527.33)
                [3] Department of Automation, Tsinghua University, Beijing 100084, China (ISNI 0000 0001 0662 3178; GRID grid.12527.33)
                [4] Program in Computational Biology and Bioinformatics, University of Southern California, Los Angeles, CA 90089, USA (ISNI 0000 0001 2156 6853; GRID grid.42505.36)
                Article ID: 1878
                DOI: 10.1186/s12859-017-1878-3
                PMCID: PMC5773911
                PMID: 29219068
                © The Author(s). 2017

                Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                IEEE BIBM International Conference on Bioinformatics & Biomedicine (BIBM) 2016
                Shenzhen, China
                15-18 December 2016
                Categories
                Research

                Bioinformatics & Computational biology