      Random forest: a classification and regression tool for compound classification and QSAR modeling.

          Abstract

          A new classification and regression tool, Random Forest, is introduced and investigated for predicting a compound's quantitative or categorical biological activity based on a quantitative description of the compound's molecular structure. Random Forest is an ensemble of unpruned classification or regression trees created by using bootstrap samples of the training data and random feature selection in tree induction. Prediction is made by aggregating (majority vote or averaging) the predictions of the ensemble. We built predictive models for six cheminformatics data sets. Our analysis demonstrates that Random Forest is a powerful tool whose prediction accuracy is among the best of the methods available to date. We also present three additional features of Random Forest: built-in performance assessment, a measure of relative importance of descriptors, and a measure of compound similarity that is weighted by the relative importance of descriptors. It is the combination of relatively high prediction accuracy and this collection of desirable features that makes Random Forest uniquely suited for modeling in cheminformatics.
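          The procedure the abstract describes (bootstrap samples of the training data, random descriptor selection at each split, unpruned trees, aggregation by majority vote) can be illustrated with a minimal plain-Python sketch. This is not the authors' implementation, and several choices are assumptions: all function names (`build_tree`, `random_forest`, `oob_error`, `proximity`) are hypothetical, splits use Gini impurity, `mtry` (the number of descriptors tried per split) is a free parameter, and only classification is shown (for regression, leaves would hold mean activities and the ensemble would average them). `oob_error` illustrates the built-in performance assessment via out-of-bag compounds, and `proximity` is the standard shared-leaf fraction, not the importance-weighted similarity measure mentioned above. Descriptor importance is not shown.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, mtry, rng):
    """Grow an unpruned tree (no depth limit, no pruning). Each split
    considers only `mtry` randomly chosen descriptors."""
    if len(set(y)) == 1:
        return ("leaf", y[0])
    best = None  # (weighted impurity, descriptor index, threshold)
    for f in rng.sample(range(len(X[0])), mtry):
        # Candidate thresholds: every distinct value except the largest,
        # so both children are non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        # Sampled descriptors are constant here: stop with a majority-vote leaf.
        return ("leaf", Counter(y).most_common(1)[0][0])
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return ("node", f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], mtry, rng),
            build_tree([X[i] for i in ri], [y[i] for i in ri], mtry, rng))


def find_leaf(node, x):
    """Follow compound x down the tree and return its terminal node."""
    while node[0] == "node":
        _, f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def random_forest(X, y, n_trees=25, mtry=1, seed=0):
    """Grow each tree on a bootstrap sample (n draws with replacement),
    keeping each tree's in-bag indices for out-of-bag assessment."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        tree = build_tree([X[i] for i in idx], [y[i] for i in idx], mtry, rng)
        forest.append((tree, set(idx)))
    return forest


def predict(forest, x):
    """Aggregate the ensemble's predictions by majority vote."""
    votes = Counter(find_leaf(tree, x)[1] for tree, _ in forest)
    return votes.most_common(1)[0][0]


def oob_error(forest, X, y):
    """Out-of-bag error: each training compound is predicted only by
    trees whose bootstrap sample did not include it."""
    wrong = total = 0
    for i, (x, label) in enumerate(zip(X, y)):
        votes = Counter(find_leaf(t, x)[1] for t, inbag in forest if i not in inbag)
        if votes:
            total += 1
            wrong += votes.most_common(1)[0][0] != label
    return wrong / total if total else float("nan")


def proximity(forest, a, b):
    """Fraction of trees in which compounds a and b reach the same leaf."""
    same = sum(find_leaf(t, a) is find_leaf(t, b) for t, _ in forest)
    return same / len(forest)


# Toy data (hypothetical): two descriptors per compound, two activity classes.
X = [[0.1, 1.0], [0.2, 0.8], [0.3, 0.9], [0.4, 1.1],
     [0.9, 0.2], [1.0, 0.1], [0.8, 0.3], [0.7, 0.0]]
y = ["inactive"] * 4 + ["active"] * 4
forest = random_forest(X, y, n_trees=25, mtry=1, seed=0)
print(predict(forest, [0.05, 1.2]))   # inactive
print(predict(forest, [1.1, -0.1]))   # active
print(oob_error(forest, X, y))
```

          Keeping each tree's in-bag index set is what makes the out-of-bag estimate free: no separate test set is held out. For real data sets, a maintained library such as scikit-learn's `RandomForestClassifier` (with `oob_score=True`) is a better choice than this sketch.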

          Author and article information

          Journal: J Chem Inf Comput Sci (Journal of Chemical Information and Computer Sciences)
          Publisher: American Chemical Society (ACS)
          ISSN: 0095-2338
          Published: November 25 2003
          Volume: 43
          Issue: 6

          Affiliations
          [1] Biometrics Research, Merck Research Laboratories, PO Box 2000, Rahway, New Jersey 07065, USA. vladimir_svetnik@merck.com

          Article
          DOI: 10.1021/ci034160g
          PMID: 14632445
          Record ID: 9c77b9cb-341f-4ff7-9e6f-9b66faaf2c80