Target-based drug discovery must assess many drug-like compounds for potential activity. Focusing on low-molecular-weight compounds (fragments) can dramatically reduce the chemical search space. However, approaches for determining protein-fragment interactions have limitations. Experimental assays are time-consuming, expensive, and not always applicable. At the same time, computational approaches using physics-based methods have limited accuracy. With the growing amount of high-resolution structural data for protein-ligand complexes, there is now an opportunity for data-driven approaches to fragment binding prediction. We present FragFEATURE, a machine learning approach to predict small molecule fragments preferred by a target protein structure. We first create a knowledge base of protein structural environments annotated with the small molecule substructures they bind. These substructures have low molecular weight and serve as a proxy for fragments. FragFEATURE then compares the structural environments within a target protein to those in the knowledge base to retrieve statistically preferred fragments. It merges information across diverse ligands with shared substructures to generate predictions. Our results demonstrate FragFEATURE's ability to rediscover fragments corresponding to the bound ligand with 74% precision and 82% recall on average. For many protein targets, it identifies high-scoring fragments that are substructures of known inhibitors. FragFEATURE thus predicts fragments that can serve as inputs to fragment-based drug design or as refinement criteria for creating target-specific compound libraries for experimental or computational screening.
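To make the reported precision and recall concrete, the following is a minimal sketch of how the two quantities could be computed for a single binding site, treating predicted and observed fragments as sets. The function name, the fragment labels, and the set-based comparison are illustrative assumptions, not the evaluation protocol used in this study.

```python
def precision_recall(predicted, observed):
    """Return (precision, recall) for two collections of fragment labels.

    Assumption: a prediction counts as correct only if the same label
    appears among the fragments of the bound ligand.
    """
    predicted, observed = set(predicted), set(observed)
    true_positives = len(predicted & observed)
    # Precision: fraction of predicted fragments that are in the bound ligand.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of the bound ligand's fragments that were predicted.
    recall = true_positives / len(observed) if observed else 0.0
    return precision, recall


# Hypothetical fragment labels for one binding site.
predicted = {"benzene", "carboxylate", "amide", "pyridine"}
observed = {"benzene", "carboxylate", "amide", "sulfonamide"}
precision, recall = precision_recall(predicted, observed)
```

In this toy case three of the four predicted fragments occur in the bound ligand, and three of the ligand's four fragments are recovered, so both values are 0.75.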
In drug discovery, the goal is to identify new compounds that alter the behavior of a protein implicated in disease. Because the number of small molecules to test is very large, researchers have increasingly studied fragments (compounds with a small number of atoms): there are fewer possibilities to evaluate, and fragments can be used to identify larger compounds. Computational tools can efficiently assess whether a fragment will bind a protein target of interest. Given the large number of structures available for protein-small molecule complexes, we present in this study a data-driven computational method for fragment binding prediction called FragFEATURE. FragFEATURE predicts fragments preferred by a protein structure using a knowledge base of all previously observed protein-fragment interactions. Comparison to previous observations enables it to determine whether a query structure is likely to bind particular fragments. For numerous protein structures bound to small molecules, FragFEATURE predicted fragments matching the bound entity. For multiple proteins, it also predicted fragments matching drugs known to inhibit the proteins. These fragments can therefore lead us to promising drug-like compounds to study further using computational tools or experimental resources.
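The retrieval idea described above can be sketched in a few lines: look up knowledge-base environments that resemble a query environment, then keep the fragments that occur more often among those neighbors than in the knowledge base as a whole. Everything in this sketch is an assumption made for illustration: the two-dimensional feature vectors, the Euclidean distance, the `cutoff` and `min_enrichment` parameters, and the simple enrichment ratio, which stands in for whatever statistical test FragFEATURE actually applies.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def preferred_fragments(query, knowledge_base, cutoff=1.0, min_enrichment=1.5):
    """Score fragments enriched among environments similar to `query`.

    `knowledge_base` is a list of (feature_vector, fragment_set) pairs.
    Returns a dict mapping fragment label to enrichment ratio.
    """
    # Environments within `cutoff` of the query count as similar.
    neighbors = [frags for vec, frags in knowledge_base
                 if euclidean(query, vec) <= cutoff]
    if not neighbors:
        return {}

    # Fragment counts over the whole knowledge base and over the neighbors.
    background = Counter(f for _, frags in knowledge_base for f in frags)
    local = Counter(f for frags in neighbors for f in frags)

    scores = {}
    for frag, count in local.items():
        local_rate = count / len(neighbors)
        background_rate = background[frag] / len(knowledge_base)
        enrichment = local_rate / background_rate
        if enrichment >= min_enrichment:
            scores[frag] = enrichment
    return scores


# Toy knowledge base: hypothetical environments and fragment annotations.
TOY_KB = [
    ((0.0, 0.0), {"benzene", "amide"}),
    ((0.1, 0.2), {"benzene"}),
    ((0.2, 0.1), {"benzene", "carboxylate"}),
    ((5.0, 5.0), {"carboxylate"}),
    ((5.1, 4.9), {"amide"}),
    ((4.8, 5.2), {"carboxylate"}),
]

scores = preferred_fragments((0.1, 0.1), TOY_KB)
```

With this toy data, the three environments near the query all bind benzene, while only half of the knowledge base does, so benzene is the only fragment retained. A real system would need a principled significance test and a way to combine evidence from the many environments that make up one binding site.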