0
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      iEnhancer-GAN: A Deep Learning Framework in Combination with Word Embedding and Sequence Generative Adversarial Net to Identify Enhancers and Their Strength

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          As critical components of DNA, enhancers can efficiently and specifically manipulate the spatial and temporal regulation of gene transcription. Malfunction or dysregulation of enhancers is implicated in a slew of human pathology. Therefore, identifying enhancers and their strength may provide insights into the molecular mechanisms of gene transcription and facilitate the discovery of candidate drug targets. In this paper, a new enhancer and its strength predictor, iEnhancer-GAN, is proposed based on a deep learning framework in combination with the word embedding and sequence generative adversarial net (Seq-GAN). Considering the relatively small training dataset, the Seq-GAN is designed to generate artificial sequences. Given that each functional element in DNA sequences is analogous to a “word” in linguistics, the word segmentation methods are proposed to divide DNA sequences into “words”, and the skip-gram model is employed to transform the “words” into digital vectors. In view of the powerful ability to extract high-level abstraction features, a convolutional neural network (CNN) architecture is constructed to perform the identification tasks, and the word vectors of DNA sequences are vertically concatenated to form the embedding matrices as the input of the CNN. Experimental results demonstrate the effectiveness of the Seq-GAN to expand the training dataset, the possibility of applying word segmentation methods to extract “words” from DNA sequences, the feasibility of implementing the skip-gram model to encode DNA sequences, and the powerful prediction ability of the CNN. Compared with other state-of-the-art methods on the training dataset and independent test dataset, the proposed method achieves a significantly improved overall performance. It is anticipated that the proposed method has a certain promotion effect on enhancer related fields.

          Related collections

          Most cited references42

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          CD-HIT Suite: a web server for clustering and comparing biological sequences

          Summary: CD-HIT is a widely used program for clustering and comparing large biological sequence datasets. In order to further assist the CD-HIT users, we significantly improved this program with more functions and better accuracy, scalability and flexibility. Most importantly, we developed a new web server, CD-HIT Suite, for clustering a user-uploaded sequence dataset or comparing it to another dataset at different identity levels. Users can now interactively explore the clusters within web browsers. We also provide downloadable clusters for several public databases (NCBI NR, Swissprot and PDB) at different identity levels. Availability: Free access at http://cd-hit.org Contact: liwz@sdsc.edu Supplementary information: Supplementary data are available at Bioinformatics online.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            iEnhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition.

            Enhancers are of short regulatory DNA elements. They can be bound with proteins (activators) to activate transcription of a gene, and hence play a critical role in promoting gene transcription in eukaryotes. With the avalanche of DNA sequences generated in the post-genomic age, it is a challenging task to develop computational methods for timely identifying enhancers from extremely complicated DNA sequences. Although some efforts have been made in this regard, they were limited at only identifying whether a query DNA element being of an enhancer or not. According to the distinct levels of biological activities and regulatory effects on target genes, however, enhancers should be further classified into strong and weak ones in strength.
              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              What is the expectation maximization algorithm?

                Bookmark

                Author and article information

                Contributors
                Role: Academic Editor
                Journal
                Int J Mol Sci
                Int J Mol Sci
                ijms
                International Journal of Molecular Sciences
                MDPI
                1422-0067
                30 March 2021
                April 2021
                : 22
                : 7
                : 3589
                Affiliations
                School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai 264209, China; yrt@ 123456sdu.edu.cn (R.Y.); 201816507@ 123456mail.sdu.edu.cn (F.W.); cjzhang@ 123456sdu.edu.cn (C.Z.)
                Author notes
                [* ]Correspondence: zln@ 123456sdu.edu.cn
                Author information
                https://orcid.org/0000-0002-0066-2114
                Article
                ijms-22-03589
                10.3390/ijms22073589
                8036415
                33808317
                b1470ade-1b3f-4743-afc1-135da03de183
                © 2021 by the authors.

                Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( https://creativecommons.org/licenses/by/4.0/).

                History
                : 03 February 2021
                : 24 March 2021
                Categories
                Article

                Molecular biology
                enhancer,word embedding,sequence generative adversarial net,convolutional neural network

                Comments

                Comment on this article