The development of efficient and effective bioinformatics tools and pipelines for
identifying peptides with dipeptidyl peptidase IV (DPP-IV) inhibitory activities from
large-scale protein datasets is of great importance for the discovery and development
of promising antidiabetic drugs. In this study, we present a novel stacking-based
ensemble learning predictor (termed StackDPPIV) designed for the identification of DPP-IV
inhibitory peptides. Unlike the existing method, which relies on a single type of feature,
we combined five popular machine learning algorithms with ten different feature encodings
from multiple perspectives to generate a pool of diverse
baseline models. Subsequently, the probabilistic features derived from these baseline
models were systematically integrated and treated as new feature representations. Finally,
to improve the predictive performance, the genetic algorithm based on the
self-assessment-report was used to identify a set of informative probabilistic
features, and the optimal feature set was then used to develop the final meta-predictor (StackDPPIV).
Experimental results demonstrated that StackDPPIV outperformed its constituent baseline
models on both the training and independent datasets. Furthermore, StackDPPIV achieved
an accuracy of 0.891, an MCC of 0.784 and an AUC of 0.961, which were 9.4%, 19.0% and 11.4%
higher, respectively, than those of the existing method on the independent test dataset. Feature
analysis demonstrated that our feature representations had greater discriminative ability
than conventional feature descriptors, highlighting that the combination
of different features was essential for the performance improvement. To make
the proposed predictor readily available, we have built a user-friendly online web server at http://pmlabstack.pythonanywhere.com/StackDPPIV.
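
For readers unfamiliar with the three evaluation metrics reported above (accuracy, MCC and AUC), the following minimal Python sketch shows how they can be computed from their standard definitions. It is not the StackDPPIV implementation: the function names, the 0.5 decision threshold, and the toy labels and scores are all illustrative assumptions and do not correspond to the data or results of this study.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 1 (positive) and 0 (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined here as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly chosen
    positive receives a higher score than a randomly chosen negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical labels and predicted probabilities, not study data).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed threshold of 0.5

toy_counts = confusion_counts(toy_labels, toy_preds)
toy_acc = accuracy(*toy_counts)
toy_mcc = mcc(*toy_counts)
toy_auc = auc(toy_labels, toy_scores)

if __name__ == "__main__":
    print(f"ACC={toy_acc:.3f} MCC={toy_mcc:.3f} AUC={toy_auc:.3f}")
```

Accuracy and MCC depend on the chosen decision threshold, whereas AUC is computed from the ranking of the predicted scores alone, which is why the three metrics are usually reported together.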