      Is Open Access

      Machine learning methods for predicting DNA- and RNA-binding proteins reveal that protein domains are not the main contributors to identify nucleic acid-binding proteins

      Published
      conference-abstract
      1 , 2 , 1
      ScienceOpen
      Genetoberfest 2023
      16-18 October 2023

            Abstract

            Nucleic acid-binding proteins play roles in many biological processes and cellular functions, such as replication, transcription, translation, RNA splicing, and methylation. Recently, many machine-learning models have been proposed to identify potential DNA-binding or RNA-binding residues from primary amino acid sequences. However, many nucleic acid-binding proteins are likely to be characterized by non-canonical binding elements. In this study, we asked whether the affinity profiles of DNA-binding proteins (DBPs) or RNA-binding proteins (RBPs) can be predicted from multi-omics data that also include primary protein sequences. To this end, we collected 1,447 experimentally annotated DBPs and 351 experimentally annotated RBPs. Using 555 and 374 selected multi-omics molecular features, including primary amino acid profiles, protein domains, post-translational modifications, solvent accessibility, secondary structures, tissue specificity index, and protein abundance level, we built two random forest classifiers and two SVM classifiers to predict DBPs and RBPs. With 10-fold cross-validation, the classifiers achieved AUC scores of 0.834 and 0.804 for the DBP model and 0.911 and 0.91 for the RBP model, respectively. Intriguingly, the top-ranked features underlying the best prediction performance are mostly non-domain-specific. This result suggests that protein domains are not the main contributors to identifying DBPs or RBPs, and we propose that multi-omics molecular features, combined with our machine-learning models, can serve as useful identifiers for detecting novel DBPs and RBPs.
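The evaluation described in the abstract (random forest and SVM classifiers scored by AUC under 10-fold cross-validation, followed by a ranking of feature importance) can be illustrated with a minimal sketch. The abstract does not name the software, hyperparameters, or kernel used, so the use of scikit-learn, the settings, and all variable names below are assumptions. The data are synthetic stand-ins generated by `make_classification`, not the study's proteins or features, so the printed scores bear no relation to the reported AUC values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a protein-by-feature matrix (ASSUMPTION: not the study's data).
# Rows play the role of proteins, columns the role of multi-omics features,
# and y marks binder (1) versus non-binder (0).
X, y = make_classification(
    n_samples=400, n_features=50, n_informative=10, random_state=0
)

# 10-fold cross-validation, stratified so each fold keeps the class ratio.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Hyperparameters and the RBF kernel are illustrative choices, not the authors'.
models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

mean_auc = {}
for name, model in models.items():
    # scoring="roc_auc" computes the area under the ROC curve on each held-out fold.
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    mean_auc[name] = scores.mean()
    print(f"{name}: mean AUC = {scores.mean():.3f} over {len(scores)} folds")

# Rank features by the forest's impurity-based importance (one common ranking;
# the abstract does not say which importance measure was used).
forest = models["random_forest"].fit(X, y)
top_features = np.argsort(forest.feature_importances_)[::-1][:10]
print("top-ranked feature indices:", top_features)
```

With real data, the top-ranked column indices would be mapped back to feature names to check how many of them are domain-based, which is the comparison the abstract's conclusion rests on.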

            Author and article information

            Conference
            ScienceOpen
            9 October 2023
            Affiliations
            [1] School of Biological Sciences and Technology, Chonnam National University, Gwangju 61186, Republic of Korea
            [2] National Institutes of Health, Bethesda, Maryland, US
            Author information
            https://orcid.org/0000-0002-0928-9950
            https://orcid.org/0000-0002-9545-6654
            Article
            10.14293/GOF.23.27
            7104b16c-7e70-4f26-b4a8-003c39a122d9

            Published under Creative Commons Attribution 4.0 International ( CC BY 4.0). Users are allowed to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material for any purpose, even commercially), as long as the authors and the publisher are explicitly identified and properly acknowledged as the original source.
