      Machine‐learning scoring functions to improve structure‐based binding affinity prediction and virtual screening

      review-article


          Abstract

          Docking tools to predict whether and how a small molecule binds to a target can be applied if a structural model of such target is available. The reliability of docking depends, however, on the accuracy of the adopted scoring function (SF). Despite intense research over the years, improving the accuracy of SFs for structure‐based binding affinity prediction or virtual screening has proven to be a challenging task for any class of method. New SFs based on modern machine‐learning regression models, which do not impose a predetermined functional form and thus are able to exploit effectively much larger amounts of experimental data, have recently been introduced. These machine‐learning SFs have been shown to outperform a wide range of classical SFs at both binding affinity prediction and virtual screening. The emerging picture from these studies is that the classical approach of using linear regression with a small number of expert‐selected structural features can be strongly improved by a machine‐learning approach based on nonlinear regression allied with comprehensive data‐driven feature selection. Furthermore, the performance of classical SFs does not grow with larger training datasets and hence this performance gap is expected to widen as more training data becomes available in the future. Other topics covered in this review include predicting the reliability of a SF on a particular target class, generating synthetic data to improve predictive performance and modeling guidelines for SF development. WIREs Comput Mol Sci 2015, 5:405–424. doi: 10.1002/wcms.1225
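The contrast the abstract draws between a predetermined linear form and a nonlinear, data-driven model can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the data are a synthetic sine curve rather than binding affinities, the single feature `x` is hypothetical, and k-nearest-neighbour regression stands in as a minimal nonparametric regressor; it is not one of the methods evaluated in the review.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (a fixed, predetermined form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def fit_knn(xs, ys, k=3):
    """k-nearest-neighbour regression: no functional form is imposed."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict

def rmse(model, xs, ys):
    """Root-mean-square error of a fitted model on held-out points."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Synthetic toy data (assumption): a nonlinear feature-to-target relationship.
train_x = [i * 0.1 for i in range(63)]
train_y = [math.sin(x) for x in train_x]
test_x = [i * 0.1 + 0.05 for i in range(62)]
test_y = [math.sin(x) for x in test_x]

linear_rmse = rmse(fit_line(train_x, train_y), test_x, test_y)
knn_rmse = rmse(fit_knn(train_x, train_y, k=3), test_x, test_y)
print(f"linear RMSE: {linear_rmse:.3f}  kNN RMSE: {knn_rmse:.3f}")
```

On this toy curve the fixed linear form cannot follow the data however many points are added, whereas the nonparametric model tracks it more closely as the sampling gets denser. This mirrors, in miniature only, the argument that models without a predetermined functional form can make better use of larger training sets.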

          For further resources related to this article, please visit the WIREs website.

          Most cited references (69)

          Benchmarking sets for molecular docking.

          Ligand enrichment among top-ranking hits is a key metric of molecular docking. To avoid bias, decoys should resemble ligands physically, so that enrichment is not simply a separation of gross features, yet be chemically distinct from them, so that they are unlikely to be binders. We have assembled a directory of useful decoys (DUD), with 2950 ligands for 40 different targets. Every ligand has 36 decoy molecules that are physically similar but topologically distinct, leading to a database of 98,266 compounds. For most targets, enrichment was at least half a log better with uncorrected databases such as the MDDR than with DUD, evidence of bias in the former. These calculations also allowed 40x40 cross-docking, where the enrichments of each ligand set could be compared for all 40 targets, enabling a specificity metric for the docking screens. DUD is freely available online as a benchmarking set for docking at http://blaster.docking.org/dud/.
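Ligand enrichment is commonly summarised as an enrichment factor: the fraction of actives among the top-ranked compounds divided by the fraction of actives in the whole library. The sketch below is a minimal illustration under stated assumptions, not the procedure used in the DUD paper: the function name `enrichment_factor` is hypothetical, a higher score is assumed to mean a better rank (many docking programs use the opposite sign), and the example scores are made up. Only the 1:36 ligand-to-decoy ratio is taken from the abstract.

```python
def enrichment_factor(scores, is_active, top_fraction=0.01):
    """Enrichment factor at a given top fraction of the ranked library.

    EF = (actives in top / size of top) / (all actives / library size).
    Assumption: a HIGHER score means a better-ranked compound.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    # Always inspect at least one compound, even for tiny libraries.
    n_top = max(1, round(n_total * top_fraction))
    ranked = sorted(zip(scores, is_active), key=lambda p: p[0], reverse=True)
    hits = sum(active for _, active in ranked[:n_top])
    return (hits / n_top) / (n_actives / n_total)

# Toy library (made-up scores): one ligand plus 36 decoys, as in DUD's ratio.
labels = [1] + [0] * 36
decoy_scores = [float(i) for i in range(36)]

ef_best = enrichment_factor([100.0] + decoy_scores, labels, top_fraction=0.1)
ef_worst = enrichment_factor([-1.0] + decoy_scores, labels, top_fraction=0.1)
print(ef_best, ef_worst)
```

An enrichment factor of 1 corresponds to random ranking, so values well above 1 indicate that the scoring function concentrates ligands near the top of the list.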

            A critical assessment of docking programs and scoring functions.

            Docking is a computational technique that samples conformations of small molecules in protein binding sites; scoring functions are used to assess which of these conformations best complements the protein binding site. An evaluation of 10 docking programs and 37 scoring functions was conducted against eight proteins of seven protein types for three tasks: binding mode prediction, virtual screening for lead identification, and rank-ordering by affinity for lead optimization. All of the docking programs were able to generate ligand conformations similar to crystallographically determined protein/ligand complex structures for at least one of the targets. However, scoring functions were less successful at distinguishing the crystallographic conformation from the set of docked poses. Docking programs identified active compounds from a pharmaceutically relevant pool of decoy compounds; however, no single program performed well for all of the targets. For prediction of compound affinity, none of the docking programs or scoring functions made a useful prediction of ligand binding affinity.

              A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking.

              Accurately predicting the binding affinities of large sets of diverse protein-ligand complexes is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for analysing the outputs of molecular docking, which in turn is an important technique for drug discovery, chemical biology and structural biology. Each scoring function assumes a predetermined theory-inspired functional form for the relationship between the variables that characterize the complex, which also include parameters fitted to experimental or simulation data, and its predicted binding affinity. The inherent problem of this rigid approach is that it leads to poor predictivity for those complexes that do not conform to the modelling assumptions. Moreover, resampling strategies, such as cross-validation or bootstrapping, are still not systematically used to guard against the overfitting of calibration data in parameter estimation for scoring functions. We propose a novel scoring function (RF-Score) that circumvents the need for problematic modelling assumptions via non-parametric machine learning. In particular, Random Forest was used to implicitly capture binding effects that are hard to model explicitly. RF-Score is compared with the state of the art on the demanding PDBbind benchmark. Results show that RF-Score is a very competitive scoring function. Importantly, RF-Score's performance was shown to improve dramatically with training set size and hence the future availability of more high-quality structural and interaction data is expected to lead to improved versions of RF-Score.

              Contact: pedro.ballester@ebi.ac.uk; jbom@st-andrews.ac.uk. Supplementary data are available at Bioinformatics online.
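The resampling the abstract calls for can be sketched as a plain k-fold cross-validation loop. This is a minimal illustration under stated assumptions, not the validation protocol of the RF-Score paper: `k_fold_indices`, `cross_validated_rmse` and `fit_mean` are hypothetical names, the mean predictor is a placeholder for a real scoring function, and the numbers are arbitrary toy values rather than measured affinities.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def cross_validated_rmse(xs, ys, fit, k=5, seed=0):
    """Refit the model on each training split and score it on the held-out fold."""
    squared_error = 0.0
    for train, test in k_fold_indices(len(xs), k, seed):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        squared_error += sum((model(xs[i]) - ys[i]) ** 2 for i in test)
    return (squared_error / len(xs)) ** 0.5

def fit_mean(train_x, train_y):
    """Placeholder model (assumption): always predicts the training mean."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y

# Arbitrary toy values, not real features or affinities.
toy_x = [float(i) for i in range(10)]
toy_y = [float(i) for i in range(10)]
cv_rmse = cross_validated_rmse(toy_x, toy_y, fit_mean, k=5)
print(f"cross-validated RMSE: {cv_rmse:.3f}")
```

Because every prediction is made for a sample the model did not see during fitting, the resulting error is a less optimistic estimate than the error on the calibration data itself.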

                Author and article information

                Journal
                Wiley Interdiscip Rev Comput Mol Sci
                10.1111/(ISSN)1759-0884
                WCMS
                Wiley Interdisciplinary Reviews. Computational Molecular Science
                John Wiley & Sons, Inc. (Hoboken, USA )
                1759-0876
                1759-0884
                28 August 2015
                Nov-Dec 2015
                Volume: 5
                Issue: 6 (doiID: 10.1002/wcms.2015.5.issue-6)
                Pages: 405-424
                Affiliations
                [1] Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, UK
                [2] Cavendish Laboratory, University of Cambridge, Cambridge, UK
                [3] Cancer Research Center of Marseille (INSERM U1068, Institut Paoli‐Calmettes, Aix‐Marseille Université, CNRS UMR7258), Marseille, France
                Author notes
                [*] Correspondence to: pedro.ballester@inserm.fr
                Article
                WCMS1225
                10.1002/wcms.1225
                4832270
                27110292
                ff9d7d28-631a-4e3b-91f4-4e9ce8eb98e4
                © 2015 The Authors. WIREs Computational Molecular Science published by John Wiley & Sons, Ltd.

                This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

                History
                : 03 June 2015
                : 17 July 2015
                : 18 July 2015
                Page count
                Pages: 20
                Funding
                Funded by: French Government «Investissements d'Avenir» program
                Award ID: n° ANR‐11‐IDEX‐0001‐02
                Funded by: UK Medical Research Council Methodology Research Fellowship
                Award ID: G0902106
                Categories
                Chemoinformatics
                Advanced Review
                Advanced Reviews