
      Using Deep Learning with Different Architectures to Recognize Triplex DNA Structures

      Preprint
      In review
      research-article

            Abstract

            Long non-coding RNAs (lncRNAs) can perform their regulatory roles by forming triple helices through RNA-DNA interactions. Although this has been verified by a few in vivo and in vitro methods, in silico approaches that predict the potential of lncRNAs and DNA sites to form triplex structures are still required. Triplexator has also predicted a vast number of lncRNAs and DNA sites that have the potential to form triplex structures. There is also emerging experimental evidence that the presence of epigenetic marks at DNA sites and lncRNAs can facilitate the formation of RNA:DNA triplex structures. There is therefore a strong demand for computational approaches, such as deep learning, that can make novel predictions about RNA:DNA triplex structure formation. In this study, we developed fourteen (14) deep neural network models that predict the potential of lncRNAs and DNA sites to form triple helices genome-wide, taking lncRNAs, DNA sites, and histone modification marks as features. For the lncRNA and DNA site features, our data were first passed through Triplexator to screen out lncRNAs and DNA sites with low potential to form triple helices. We used different deep learning architectures to build our models, including two-layer convolutional neural networks (CNN), residual neural networks (ResNN), long short-term memory recurrent neural networks (LSTM-RNN), and multilayer perceptrons (MLP). Among these architectures, our lncRNA_CNN and LSTM3-RNN both performed best for the lncRNA features, with a mean AUC of 0.99 at a kernel size of 32 and a learning rate of 1e-3. For the DNA site-based features, our DNA_CNN performed best, with a mean AUC of 0.98 at a kernel size of 32 and a learning rate of 1e-3. Lastly, for the histone modification mark-based features, our DNA2_CNN performed best, with a mean AUC of 0.78 at a kernel size of 32 and a learning rate of 1e-3.
Our deep neural network models revealed several novel lncRNAs and DNA sites, including HOTAIR, MEG3, PARTICLE, DACOR1, MIR100HG, FENDRR, ANRIL, TUG1, MALAT1, LINC00599, TINCR, NEAT1, roX2, DHFR, OTX2-AS1, Xist, SNHG16, ATXN8OS, BCYRN1, TERC, and Khps1, that have the potential to form triplex structures, thereby confirming previous experimental results and those of Triplexator. The performance of our models also supports previous findings that histone modification marks can help identify lncRNAs and DNA regions that have the potential to form RNA:DNA triplex structures. In conclusion, we showed that different deep learning architectures can recognize lncRNAs and DNA sites that have the potential to form RNA:DNA triplex structures.
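
The abstract compares models by mean AUC. As a minimal sketch of what that metric measures, the Python below computes the area under the ROC curve as the fraction of (positive, negative) pairs that a classifier ranks correctly, then averages it over several evaluation runs. The function names `roc_auc` and `mean_auc` are hypothetical, the labels and scores are made-up toy values rather than results from the study, and the abstract does not state whether the mean is taken over cross-validation folds or repeated runs, so averaging over "runs" here is an assumption.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    This pairwise form is O(P * N) and is meant for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(runs):
    """Average the AUC over several (labels, scores) evaluation runs."""
    return mean(roc_auc(labels, scores) for labels, scores in runs)


# Toy data (assumption, not from the study): 1 = triplex-forming, 0 = not.
runs = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),  # every positive outranks every negative
    ([1, 0, 1, 0], [0.7, 0.6, 0.4, 0.2]),  # one of four pairs is ranked wrongly
]

if __name__ == "__main__":
    print(mean_auc(runs))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported gap between 0.99 (lncRNA features) and 0.78 (histone modification marks).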

            Author and article information

            Journal
            ScienceOpen Preprints
            ScienceOpen
            27 August 2022
            Affiliations
            [1] Phystech School of Biological and Medical Physics, Moscow Institute of Physics and Technology (National Research University), Dolgoprudny, Moscow Region, Russian Federation
            [2] Faculty of Computer Science, Big Data and Information Retrieval School, Higher School of Economics, Moscow
            Author information
            https://orcid.org/0000-0002-1526-1305
            https://orcid.org/0000-0002-3058-0005
            Article
            10.14293/S2199-1006.1.SOR-.PPO9GVR.v1
            3a0afa44-364e-4130-8e5e-22140ae2e52f

            This work has been published open access under Creative Commons Attribution License CC BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Conditions, terms of use and publishing policy can be found at www.scienceopen.com.

            History
            : 27 August 2022

            The datasets generated during and/or analysed during the current study are available in the repository: https://github.com/Joseph-Luper-Tsenum
            Medicine,Computer science,Statistics,Mathematics,Life sciences
            Long non-coding RNAs, DNA sites, histone modification marks, triplex structures, deep learning architectures
