19
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: not found

      A pipeline for improved QSAR analysis of peptides: physiochemical property parameter selection via BMSF, near-neighbor sample selection via semivariogram, and weighted SVR regression and prediction.

      Read this article at

      ScienceOpenPublisherPubMed
      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          In this paper, we present a pipeline to perform improved QSAR analysis of peptides. The modeling involves a double selection procedure that first performs feature selection and then conducts sample selection before the final regression analysis. Five hundred and thirty-one physicochemical property parameters of amino acids were used as descriptors to characterize the structure of peptides. These high-dimensional descriptors then go through a feature selection process given by the binary matrix shuffling filter (BMSF) to obtain a set of important low-dimensional features. Each descriptor that passes the BMSF filtering also receives a weight defined through its contribution to reduce the estimation error. These selected features served as the predictors for subsequent sample selection and modeling. Based on the weighted Euclidean distances between samples, a common range was determined with high-dimensional semivariogram and then used as a threshold to select the near-neighbor samples from the training set. For each sample to be predicted, the QSAR model was established using SVR with the weighted, selected features based on the exclusive set of near-neighbor training samples. Prediction was conducted for each test sample accordingly. The performances of this pipeline are tested with the QSAR analysis of angiotensin-converting enzyme inhibitors and HLA-A*0201 data sets. Improved prediction accuracy was obtained in both applications. This pipeline can optimize the QSAR modeling from both the feature selection and sample selection perspectives. This leads to improved accuracy over single selection methods. We expect this pipeline to have extensive application prospect in the field of regression prediction.

          Related collections

          Author and article information

          Journal
          Amino Acids
          Amino acids
          Springer Science and Business Media LLC
          1438-2199
          0939-4451
          Apr 2014
          : 46
          : 4
          Affiliations
          [1 ] Hunan Provincial Key Laboratory for Germplasm Innovation and Utilization of Crop, Hunan Agricultural University, Changsha, China, daizhijun@foxmail.com.
          Article
          10.1007/s00726-014-1667-5
          24468929
          fad57352-6ba8-4489-b2db-cf7dd7b1af06
          History

          Comments

          Comment on this article