      DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest

      research-article


          Abstract

          DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html
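As a rough illustration of two ingredients named in the abstract, the sketch below computes a nucleotide (k-mer) composition vector for a DNA sequence and the two reported metrics, accuracy and the Matthews correlation coefficient (MCC), from binary labels. It is not the DHSpred implementation: the function names `kmer_composition` and `accuracy_and_mcc` are hypothetical, the physicochemical-property features, the random forest feature selection and the SVM are not reproduced, and the toy sequence and labels are made up for demonstration.

```python
from itertools import product
from math import sqrt


def kmer_composition(seq, k):
    """Return normalized frequencies of all 4**k k-mers over A, C, G, T.

    Hypothetical helper: one simple way to encode nucleotide composition,
    not necessarily the encoding used by DHSpred.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other symbols
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def accuracy_and_mcc(y_true, y_pred):
    """Compute accuracy and MCC for binary labels (1 = DHS, 0 = non-DHS)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn
    acc = (tp + tn) / n if n else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


if __name__ == "__main__":
    # Toy inputs, invented for illustration only.
    features = kmer_composition("ACGTACGTTTGA", 2)  # 16 dinucleotide frequencies
    acc, mcc = accuracy_and_mcc([1, 1, 0, 0, 1], [1, 0, 0, 0, 1])
    print(len(features), round(acc, 3), round(mcc, 3))
```

MCC uses all four cells of the confusion matrix, so it is less sensitive to class imbalance than accuracy alone, which is one reason the two metrics are often reported together.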


                Author and article information

                Journal
                Oncotarget
                Impact Journals LLC
                1949-2553
                5 January 2018
                8 December 2017
                Volume 9, Issue 2, Pages 1944-1956
                Affiliations
                1 Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
                2 Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea
                Author notes
                Correspondence to: Balachandran Manavalan, bala@ajou.ac.kr
                Article
                23099
                10.18632/oncotarget.23099
                5788611
                29416743
                e01e21f8-6f5c-44e2-b8df-c92a7ea5990e
                Copyright: © 2018 Manavalan et al.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                : 6 September 2017
                : 17 November 2017
                Categories
                Research Paper

                Oncology & Radiotherapy
                Keywords: DNase I hypersensitive site, feature selection, machine learning, random forest, support vector machine
