      SFPEL-LPI: Sequence-based feature projection ensemble learning for predicting LncRNA-protein interactions

      research-article


          Abstract

          LncRNA-protein interactions play important roles in post-transcriptional gene regulation, poly-adenylation, splicing and translation. Identifying lncRNA-protein interactions helps to elucidate lncRNA-related activities. Existing computational methods use multiple lncRNA features or multiple protein features to predict lncRNA-protein interactions, but these features are not available for all lncRNAs or proteins, and most existing methods cannot predict interacting proteins (or lncRNAs) for new lncRNAs (or proteins) that have no known interactions. In this paper, we propose a sequence-based feature projection ensemble learning method, “SFPEL-LPI”, to predict lncRNA-protein interactions. First, SFPEL-LPI extracts sequence-based features for lncRNAs and for proteins. Second, it calculates multiple lncRNA-lncRNA similarities and protein-protein similarities from lncRNA sequences, protein sequences and known lncRNA-protein interactions. Third, it combines the multiple similarities and multiple features within a feature projection ensemble learning framework. In computational experiments, SFPEL-LPI accurately predicts lncRNA-protein associations and outperforms other state-of-the-art methods. More importantly, SFPEL-LPI can be applied to new lncRNAs (or proteins). Case studies demonstrate that our method can identify novel lncRNA-protein interactions, which are confirmed by the literature. Finally, we provide a user-friendly web server, available at http://www.bioinfotech.cn/SFPEL-LPI/.

          Author summary

          LncRNA-protein interactions play important roles in post-transcriptional gene regulation, poly-adenylation, splicing and translation. Identifying lncRNA-protein interactions helps to elucidate lncRNA-related activities. In this paper, we propose a novel computational method, “SFPEL-LPI”, to predict lncRNA-protein interactions. SFPEL-LPI uses lncRNA sequences, protein sequences and known lncRNA-protein associations to extract features and calculate similarities for lncRNAs and proteins, and then combines them within a feature projection ensemble learning framework. SFPEL-LPI can predict unobserved interactions between lncRNAs and proteins, and can also make predictions for new lncRNAs (or proteins) that have no known interactions with any proteins (or lncRNAs). SFPEL-LPI achieves high accuracy on the benchmark dataset when evaluated by five-fold cross validation, and outperforms state-of-the-art methods. Case studies demonstrate that SFPEL-LPI can identify novel associations, which are confirmed by the literature. To facilitate lncRNA-protein interaction prediction, we provide a user-friendly web server, available at http://www.bioinfotech.cn/SFPEL-LPI/.

          Most cited references (58)


          Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of function.

          The mammalian transcriptome contains many non-protein-coding RNAs (ncRNAs), but most of these are of unclear significance and lack strong sequence conservation, prompting suggestions that they might be non-functional. However, certain long functional ncRNAs such as Air and Xist are also poorly conserved. In this article, we systematically analyzed the conservation of several groups of functional ncRNAs, including miRNAs, snoRNAs and longer ncRNAs whose function has been either documented or confidently predicted. As expected, miRNAs and snoRNAs were highly conserved. By contrast, the longer functional non-micro, non-sno ncRNAs were much less conserved with many displaying rapid sequence evolution. Our findings suggest that longer ncRNAs are under the influence of different evolutionary constraints and that the lack of conservation displayed by the thousands of candidate ncRNAs does not necessarily signify an absence of function.

            Graph Regularized Nonnegative Matrix Factorization for Data Representation.

            Matrix factorization techniques have been frequently applied in information retrieval, computer vision, and pattern recognition. Among them, Nonnegative Matrix Factorization (NMF) has received considerable attention due to its psychological and physiological interpretation of naturally occurring data whose representation may be parts-based in the human brain. On the other hand, from the geometric perspective, the data is usually sampled from a low-dimensional manifold embedded in a high-dimensional ambient space. One then hopes to find a compact representation that uncovers the hidden semantics and simultaneously respects the intrinsic geometric structure. In this paper, we propose a novel algorithm, called Graph Regularized Nonnegative Matrix Factorization (GNMF), for this purpose. In GNMF, an affinity graph is constructed to encode the geometrical information, and we seek a matrix factorization that respects the graph structure. Our empirical study shows encouraging results of the proposed algorithm in comparison to the state-of-the-art algorithms on real-world problems.
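
            For orientation, the graph-regularized factorization described in this abstract is commonly written as the objective below. The symbols are labels chosen here for illustration and do not appear in the abstract: X is the nonnegative data matrix with one column per sample, U and V are the nonnegative factors, W is the affinity-graph weight matrix, D is its diagonal degree matrix, and \lambda is an assumed regularization weight.

```latex
\min_{U \ge 0,\; V \ge 0} \;
  \lVert X - U V^{\top} \rVert_F^{2}
  \;+\; \lambda \, \operatorname{Tr}\!\left( V^{\top} L V \right),
\qquad L = D - W, \quad D_{ii} = \sum_{j} W_{ij}
```

            The first term is the usual NMF reconstruction error. The second term is small when samples that are connected in the affinity graph receive similar rows of V, which is one way to read "respects the graph structure".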

              Predicting RNA-Protein Interactions Using Only Sequence Information

              Background: RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to host defense against pathogens. High throughput experiments to identify RNA-protein interactions are beginning to provide valuable information about the complexity of RNA-protein interaction networks, but are expensive and time consuming. Hence, there is a need for reliable computational methods for predicting RNA-protein interactions.
              Results: We propose RPISeq, a family of classifiers for predicting RNA-protein interactions using only sequence information. Given the sequences of an RNA and a protein as input, RPISeq predicts whether or not the RNA-protein pair interact. The RNA sequence is encoded as a normalized vector of its ribonucleotide 4-mer composition, and the protein sequence is encoded as a normalized vector of its 3-mer composition, based on a 7-letter reduced alphabet representation. Two variants of RPISeq are presented: RPISeq-SVM, which uses a Support Vector Machine (SVM) classifier, and RPISeq-RF, which uses a Random Forest classifier. On two non-redundant benchmark datasets extracted from the Protein-RNA Interface Database (PRIDB), RPISeq achieved an AUC (Area Under the Receiver Operating Characteristic (ROC) curve) of 0.96 and 0.92. On a third dataset containing only mRNA-protein interactions, the performance of RPISeq was competitive with that of a published method that requires information regarding many different features (e.g., mRNA half-life, GO annotations) of the putative RNA and protein partners. In addition, RPISeq classifiers trained using the PRIDB data correctly predicted the majority (57-99%) of non-coding RNA-protein interactions in NPInter-derived networks from E. coli, S. cerevisiae, D. melanogaster, M. musculus, and H. sapiens.
              Conclusions: Our experiments with RPISeq demonstrate that RNA-protein interactions can be reliably predicted using only sequence-derived information. RPISeq offers an inexpensive method for computational construction of RNA-protein interaction networks, and should provide useful insights into the function of non-coding RNAs. RPISeq is freely available as a web-based server at http://pridb.gdcb.iastate.edu/RPISeq/.
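
              The sequence encoding described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the RPISeq implementation: the function names are hypothetical, "normalized" is assumed to mean dividing counts by the total number of k-mers, and the 7-group amino acid reduction (a conjoint-triad style grouping) is an assumption, since the abstract does not list the groups.

```python
from itertools import product


def kmer_composition(seq, alphabet, k):
    """Return k-mer frequencies over `alphabet`, in lexicographic k-mer order.

    Assumption: "normalized" means counts divided by the total count.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing symbols outside the alphabet
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


# Assumed 7-group reduction of the 20 amino acids (not given in the abstract).
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
REDUCE = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}


def reduce_protein(seq):
    """Map each amino acid to its group label, dropping unknown residues."""
    return "".join(REDUCE[aa] for aa in seq.upper() if aa in REDUCE)


# Toy sequences, for illustration only.
rna_vec = kmer_composition("AUGGCUAUGGC", "ACGU", 4)  # 4^4 = 256 features
prot_vec = kmer_composition(reduce_protein("MKVLAAGRDE"), "0123456", 3)  # 7^3 = 343 features
pair_vec = rna_vec + prot_vec  # one possible input vector for a classifier
```

              Concatenating the two vectors gives a fixed-length representation of an RNA-protein pair regardless of sequence length, which is what lets a standard classifier such as an SVM or Random Forest be trained on sequence information alone.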

                Author and article information

                Contributors
                Roles: Investigation, Methodology, Writing – original draft
                Roles: Investigation, Visualization, Writing – original draft
                Roles: Formal analysis, Resources, Validation
                Role: Software
                Roles: Formal analysis, Visualization
                Role: Conceptualization
                Role: Editor
                Journal
                PLoS Computational Biology (PLoS Comput Biol)
                Publisher: Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-734X, 1553-7358
                Published: 11 December 2018
                Volume: 14
                Issue: 12
                Article: e1006616
                Affiliations
                [1] College of Informatics, Huazhong Agricultural University, Wuhan, China
                [2] School of Computer Science, Wuhan University, Wuhan, China
                [3] Department of Computer Science and Engineering, The Ohio State University, Columbus, United States of America
                [4] Electronic Information School, Wuhan University, Wuhan, China
                Ottawa University, CANADA
                Author notes

                The authors have declared that no competing interests exist.

                Author information
                http://orcid.org/0000-0002-9426-6453
                Article
                Manuscript ID: PCOMPBIOL-D-18-01045
                DOI: 10.1371/journal.pcbi.1006616
                PMCID: 6331124
                PMID: 30533006
                10ad656e-c6fe-4061-9c44-9c9a4a7e9afd
                © 2018 Zhang et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 19 June 2018
                Accepted: 2 November 2018
                Page count
                Figures: 8, Tables: 3, Pages: 21
                Funding
                Funded by: National Natural Science Foundation of China (funder ID: http://dx.doi.org/10.13039/501100001809)
                Award IDs: 61772381, 61572368
                Funded by: Fundamental Research Funds for the Central Universities
                Award ID: 2042017kf0219
                WZ was supported by the National Natural Science Foundation of China (61772381, 61572368), the Fundamental Research Funds for the Central Universities (2042017kf0219, 2042018kf0249), and the Huazhong Agricultural University Scientific & Technological Self-innovation Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences > Biochemistry > Nucleic acids > RNA > Non-coding RNA > Long non-coding RNAs
                Biology and Life Sciences > Biochemistry > Proteins > Protein Interactions
                Research and Analysis Methods > Mathematical and Statistical Techniques > Statistical Methods > Forecasting
                Physical Sciences > Mathematics > Statistics > Statistical Methods > Forecasting
                Computer and Information Sciences > Network Analysis > Protein Interaction Networks
                Biology and Life Sciences > Biochemistry > Proteomics > Protein Interaction Networks
                Biology and Life Sciences > Computational Biology > Genome Analysis > Gene Ontologies
                Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Gene Ontologies
                Biology and Life Sciences > Biochemistry > Proteins > Protein Interactions > Protein-Protein Interactions
                Research and Analysis Methods > Database and Informatics Methods > Biological Databases > Genomic Databases
                Biology and Life Sciences > Computational Biology > Genome Analysis > Genomic Databases
                Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Genomic Databases
                Research and Analysis Methods > Extraction Techniques > Protein Extraction
                Custom metadata
                vor-update-to-uncorrected-proof
                2019-01-14
                All relevant data are within the paper and its Supporting Information files.

                Quantitative & Systems biology
