Daniel Scott Himmelstein 1 , 2 , Antoine Lizee 3 , 4 , Christine Hessler 3 , Leo Brueggeman 3 , 5 , Sabrina L Chen 3 , 6 , Dexter Hadley 7 , 8 , Ari Green 3 , Pouya Khankhanian 3 , 9 , Sergio E Baranzini , 1 , 3
22 September 2017
The ability to computationally predict whether a compound treats a disease would improve the economy and success rate of drug approval. This study describes Project Rephetio to systematically model drug efficacy based on 755 existing treatments. First, we constructed Hetionet ( neo4j.het.io), an integrative network encoding knowledge from millions of biomedical studies. Hetionet v1.0 consists of 47,031 nodes of 11 types and 2,250,197 relationships of 24 types. Data were integrated from 29 public resources to connect compounds, diseases, genes, anatomies, pathways, biological processes, molecular functions, cellular components, pharmacologic classes, side effects, and symptoms. Next, we identified network patterns that distinguish treatments from non-treatments. Then, we predicted the probability of treatment for 209,168 compound–disease pairs ( het.io/repurpose). Our predictions validated on two external sets of treatment and provided pharmacological insights on epilepsy, suggesting they will help prioritize drug repurposing candidates. This study was entirely open and received realtime feedback from 40 community members.
Of all the data in the world today, 90% was created in the last two years. However, taking advantage of this data in order to advance our knowledge is restricted by how quickly we can access it and analyze it in a proper context.
In biomedical research, data is largely fragmented and stored in databases that typically do not “talk” to each other, thus hampering progress. One particular problem in medicine today is that the process of making a new therapeutic drug from scratch is incredibly expensive and inefficient, making it a risky business. Given the low success rate in drug discovery, there is an economic incentive in trying to repurpose an existing drug that has already been shown to be safe and effective towards a new disease or condition.
Himmelstein et al. used a computational approach to analyze 50,000 data points – including drugs, diseases, genes and symptoms – from 19 different public databases. This approach made it possible to create more than two million relationships among the data points, which could be used to develop models that predict which drugs currently in use by doctors might be best suited to treat any of 136 common diseases. For example, Himmelstein et al. identified specific drugs currently used to treat depression and alcoholism that could be repurposed to treat smoking addition and epilepsy.
These findings provide a new and powerful way to study drug repurposing. While this work was exclusively performed with public data, an expanded and potentially stronger set of predictions could be obtained if data owned by pharmaceutical companies were incorporated. Additional studies will be needed to test the predictions made by the models.