      A random forest based computational model for predicting novel lncRNA-disease associations

      research-article


          Abstract

          Background

          Accumulating evidence shows that abnormal regulation of long non-coding RNAs (lncRNAs) is associated with various human diseases. Accurately identifying disease-associated lncRNAs helps to elucidate the mechanisms of lncRNAs in disease and to explore new therapies. Many lncRNA-disease association (LDA) prediction models have been implemented by integrating multiple kinds of data resources. However, most existing models ignore the interference of noisy and redundant information among these data resources.

          Results

          To improve LDA prediction, we implemented a model based on random forest and feature selection (RFLDA for short). First, the RFLDA integrates experimentally supported miRNA-disease associations (MDAs) and LDAs, disease semantic similarity (DSS), lncRNA functional similarity (LFS) and lncRNA-miRNA interactions (LMI) as input features. Then, the RFLDA selects the most useful features for training by ranking them with the random forest variable importance score, which accounts for both the effect of each individual feature and the joint effects of multiple features on the prediction results. Finally, a random forest regression model is trained to score potential lncRNA-disease associations. Under 5-fold cross-validation, the RFLDA achieves an area under the receiver operating characteristic curve (AUC) of 0.976 and an area under the precision-recall curve (AUPR) of 0.779, outperforming several state-of-the-art LDA prediction models. Moreover, in case studies on three cancers, 43 of the 45 lncRNAs predicted by the RFLDA are validated by experimental data, and the other two are supported by other LDA prediction models.
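The feature integration and selection steps described above can be illustrated with a small standard-library sketch. It is not the authors' implementation: the exact feature layout, the function names `pair_features` and `select_top_k`, the toy matrices and the importance scores are all assumptions made for illustration. In the RFLDA the importance scores come from a random forest, which is not reproduced here.

```python
# Minimal sketch, assuming each lncRNA-disease pair is described by
# concatenating the lncRNA's profiles with the disease's profiles.
# All names and values below are hypothetical toy data.

def pair_features(l, d, lfs, lda, lmi, dss, mda):
    """Concatenate lncRNA-side and disease-side profiles for the pair (l, d)."""
    # lncRNA side: functional similarity row, known disease links, miRNA links.
    lnc_part = lfs[l] + lda[l] + lmi[l]
    # Disease side: semantic similarity row, known lncRNA links, miRNA links.
    dis_part = dss[d] + [row[d] for row in lda] + [row[d] for row in mda]
    return lnc_part + dis_part


def select_top_k(features, importance, k):
    """Keep the k columns with the highest importance, in original column order."""
    top = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)[:k]
    keep = sorted(top)
    return [[row[j] for j in keep] for row in features], keep


# Toy data: 2 lncRNAs, 2 diseases, 3 miRNAs.
lfs = [[1.0, 0.2], [0.2, 1.0]]      # lncRNA x lncRNA functional similarity
lda = [[1, 0], [0, 1]]              # lncRNA x disease known associations
lmi = [[1, 0, 1], [0, 1, 0]]        # lncRNA x miRNA interactions
dss = [[1.0, 0.5], [0.5, 1.0]]      # disease x disease semantic similarity
mda = [[1, 0], [0, 1], [1, 1]]      # miRNA x disease known associations

vector = pair_features(0, 1, lfs, lda, lmi, dss, mda)

# Hypothetical importance scores for a 3-column feature matrix.
reduced, kept_columns = select_top_k([[1, 2, 3], [4, 5, 6]], [0.1, 0.7, 0.2], k=2)
print(len(vector), reduced, kept_columns)
```

Note that in this layout the vector contains the pair's own entry `lda[l][d]`, so a real evaluation would need to hide that entry for held-out pairs.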

          Conclusions

          Cross-validation and case studies indicate that the RFLDA performs well at identifying potential disease-associated lncRNAs.
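For readers unfamiliar with the two metrics reported under 5-fold cross-validation, the following standard-library sketch shows one common way to compute them from predicted scores. It is an illustration under stated assumptions, not the evaluation code behind the reported numbers: the function names, the fold assignment scheme and the toy labels and scores are hypothetical, and the area under the precision-recall curve is approximated here by average precision.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    Tied scores are ranked in input order; this sketch does not treat them specially.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


# Toy example: 1 = known association, 0 = unlabeled pair.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
folds = k_fold_indices(10, k=5)
print(roc_auc(toy_labels, toy_scores), average_precision(toy_labels, toy_scores))
```

In a cross-validation run, each fold would in turn be held out, the model trained on the remaining folds, and the metrics computed on the held-out scores.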


                Author and article information

                Contributors
                ydkvictory@hrbust.edu.cn
                Journal
                BMC Bioinformatics
                BioMed Central (London)
                ISSN: 1471-2105
                Published: 27 March 2020
                Volume: 21
                Article number: 126
                Affiliations
                [1] School of Software and Microelectronics, Harbin University of Science and Technology, Harbin, 150080 China (ISNI 0000 0000 8621 1394; GRID grid.411994.0)
                [2] College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, 150050 China (ISNI 0000 0004 1763 3496; GRID grid.484612.d)
                [3] Department of Endocrinology and Metabolism, the First Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang China (ISNI 0000 0004 1797 9737; GRID grid.412596.d)
                [4] School of Computer Science and Engineering, Nanyang Technological University, Singapore, 639798 Singapore (ISNI 0000 0001 2224 0361; GRID grid.59025.3b)
                [5] Department of Software Engineering, Harbin University of Science and Technology, Rongcheng, 264300 China (ISNI 0000 0000 8621 1394; GRID grid.411994.0)
                Author information
                http://orcid.org/0000-0002-4974-3054
                Article
                3458
                DOI: 10.1186/s12859-020-3458-1
                PMCID: PMC7099795
                PMID: 32216744
                e1d1740c-709e-449a-aaef-084559aff76a
                © The Author(s). 2020

                Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                : 19 December 2019
                : 18 March 2020
                Funding
                Funded by: Innovation Talents Project of Harbin Science and Technology Bureau
                Award ID: 2017RAQXJ027
                Funded by: Fundamental Research Foundation for Universities of Heilongjiang Province
                Award ID: LGYC2018JQ003
                Funded by: Natural Science Foundation of Heilongjiang Province
                Award ID: LH2019F023
                Funded by: China Scholarship Council (FundRef http://dx.doi.org/10.13039/501100004543)
                Categories
                Methodology Article

                Bioinformatics & Computational biology
                random forest, variable importance, feature selection, lncRNA-disease association prediction, bioinformatics algorithm
