      Is Open Access

      Machine learning in prediction of intrinsic aqueous solubility of drug‐like compounds: Generalization, complexity, or predictive ability?


          Most cited references (75)


          Regression Shrinkage and Selection Via the Lasso


            PLS-regression: a basic tool of chemometrics


              Permutation importance: a corrected feature importance measure.

              In life sciences, interpretability of machine learning models is as important as their prediction accuracy. Linear models are probably the most frequently used methods for assessing feature relevance, despite their relative inflexibility. However, in the past years effective estimators of feature relevance have been derived for highly complex or non-parametric models such as support vector machines and RandomForest (RF) models. Recently, it has been observed that RF models are biased in such a way that categorical variables with a large number of categories are preferred. In this work, we introduce a heuristic for normalizing feature importance measures that can correct the feature importance bias. The method is based on repeated permutations of the outcome vector for estimating the distribution of measured importance for each variable in a non-informative setting. The P-value of the observed importance provides a corrected measure of feature importance. We apply our method to simulated data and demonstrate that (i) non-informative predictors do not receive significant P-values, (ii) informative variables can successfully be recovered among non-informative variables and (iii) P-values computed with permutation importance (PIMP) are very helpful for deciding the significance of variables, and therefore improve model interpretability. Furthermore, PIMP was used to correct RF-based importance measures for two real-world case studies. We propose an improved RF model that uses the significant variables with respect to the PIMP measure and show that its prediction accuracy is superior to that of other existing models. R code for the method presented in this article is available at http://www.mpi-inf.mpg.de/~altmann/download/PIMP.R. Contact: altmann@mpi-inf.mpg.de, laura.tolosi@mpi-inf.mpg.de. Supplementary data are available at Bioinformatics online.
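
              The permutation procedure described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' R code. As assumptions made only for illustration, it replaces the RF importance with the absolute Pearson correlation between each feature and the outcome, and it uses an empirical P-value with an add-one correction. The names abs_correlation, importances, pimp_p_values and make_toy_data are hypothetical.

```python
import random
from statistics import mean


def abs_correlation(x, y):
    """Stand-in importance measure (assumption): |Pearson r| of one feature with the outcome."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def importances(columns, y):
    """Importance of every feature column with respect to the outcome y."""
    return [abs_correlation(col, y) for col in columns]


def pimp_p_values(columns, y, n_permutations=200, seed=0):
    """Empirical P-value of each observed importance under a permuted-outcome null."""
    rng = random.Random(seed)
    observed = importances(columns, y)
    exceed = [0] * len(columns)
    y_perm = list(y)
    for _ in range(n_permutations):
        # Permuting the outcome breaks any feature-outcome relationship,
        # so the importances below sample the non-informative setting.
        rng.shuffle(y_perm)
        null = importances(columns, y_perm)
        for j, (null_value, obs_value) in enumerate(zip(null, observed)):
            if null_value >= obs_value:
                exceed[j] += 1
    # Add-one correction keeps every P-value strictly positive.
    return [(count + 1) / (n_permutations + 1) for count in exceed]


def make_toy_data(n=200, seed=1):
    """Synthetic data: one informative feature, one pure-noise feature."""
    rng = random.Random(seed)
    informative = [rng.gauss(0, 1) for _ in range(n)]
    noise = [rng.gauss(0, 1) for _ in range(n)]
    y = [2.0 * v + rng.gauss(0, 0.5) for v in informative]
    return [informative, noise], y


if __name__ == "__main__":
    cols, y = make_toy_data()
    for name, p in zip(["informative", "noise"], pimp_p_values(cols, y)):
        print(f"{name}: P = {p:.4f}")
```

              With any importance measure substituted for abs_correlation, including one that refits a model on each permuted outcome, the loop structure stays the same; only the cost per permutation changes.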

                Author and article information

                Journal
                Journal of Chemometrics
                Wiley
                ISSN: 0886-9383 (print), 1099-128X (online)
                May 07 2021
                Affiliations
                [1] Knowledge Discovery, Know‐Center, Graz, Austria
                [2] NMR Centre, Ruđer Bošković Institute, Zagreb, Croatia
                [3] Department of Chemistry, National University of Singapore, Singapore
                [4] Institute of Interactive Systems and Data, Graz University of Technology, Graz, Austria
                Article
                DOI: 10.1002/cem.3349
                © 2021

                http://creativecommons.org/licenses/by/4.0/

                http://doi.wiley.com/10.1002/tdm_license_1.1
