      Using Recursive Feature Selection with Random Forest to Improve Protein Structural Class Prediction for Low-Similarity Sequences


          Abstract

          Many combinations of protein features have been used to improve protein structural class prediction, but the information redundancy among them is often ignored. To select the important features with strong classification ability, we propose a recursive feature selection method based on random forest. We evaluated the proposed method in four experiments and compared it with the available competing prediction methods. The results indicate that the proposed feature selection method effectively improves the efficiency of protein structural class prediction: fewer than 5% of the features are used, yet the prediction accuracy improves by 4.6-13.3%. We further compared different protein features and found that the predicted secondary structural features achieve the best performance. This understanding can be used to design more powerful prediction methods for the protein structural class.
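
The recursive selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `drop_fraction` parameter, and the toy data are all hypothetical, and the importance scorer is a simple class-mean gap standing in for a random forest's feature importance so that the example runs with the Python standard library alone.

```python
import random
from statistics import mean


def class_mean_gap(X, y, features):
    """Stand-in importance score (an assumption, not the paper's scorer):
    absolute gap between the two class means of each feature.
    A random forest's feature importance would be plugged in here instead."""
    scores = {}
    for j in features:
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores[j] = abs(mean(pos) - mean(neg))
    return scores


def recursive_feature_selection(X, y, n_keep, drop_fraction=0.2, scorer=class_mean_gap):
    """Repeatedly re-score the surviving features and drop the weakest ones."""
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        scores = scorer(X, y, features)
        # Drop a fraction per round: at least one, never going below n_keep.
        n_drop = max(1, int(drop_fraction * len(features)))
        n_drop = min(n_drop, len(features) - n_keep)
        features.sort(key=lambda j: scores[j], reverse=True)
        features = features[: len(features) - n_drop]
    return sorted(features)


# Toy data: 200 samples, 40 features; only features 0 and 1 carry class signal.
rng = random.Random(0)
y = [i % 2 for i in range(200)]
X = [
    [3.0 * label + rng.gauss(0, 1) if j < 2 else rng.gauss(0, 1) for j in range(40)]
    for label in y
]

selected = recursive_feature_selection(X, y, n_keep=2)
print(selected)
```

With a model-based scorer such as random forest importance, the scores shift as redundant features are removed, which is why they are recomputed every round rather than ranked once. The univariate stand-in used here does not show that effect; it only demonstrates the elimination loop.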


                Author and article information

                Contributors
                Journal
                Comput Math Methods Med
                Computational and Mathematical Methods in Medicine
                Hindawi
                ISSN: 1748-670X, 1748-6718
                Published: 7 May 2021
                Volume: 2021
                Article ID: 5529389
                Affiliations
                1College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
                2Qixin School, Zhejiang Sci-Tech University, Hangzhou 310018, China
                3College of Sciences, Hangzhou Dianzi University, Hangzhou 310018, China
                Author notes

                Academic Editor: Lin Lu

                Author information
                https://orcid.org/0000-0003-2675-6511
                Article
                DOI: 10.1155/2021/5529389
                PMCID: PMC8123985
                PMID: 34055035
                Copyright © 2021 Yaoxin Wang et al.

                This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 7 February 2021
                Accepted: 28 April 2021
                Funding
                Funded by: Natural Science Foundation of Zhejiang Province
                Award ID: LY20F020016
                Funded by: National Natural Science Foundation of China
                Award ID: 61772028
                Categories
                Research Article

                Applied mathematics
