Systematic integration of biomedical knowledge prioritizes drugs for repurposing

Abstract

The ability to computationally predict whether a compound treats a disease would improve the economy and success rate of drug approval. This study describes Project Rephetio to systematically model drug efficacy based on 755 existing treatments. First, we constructed Hetionet (neo4j.het.io), an integrative network encoding knowledge from millions of biomedical studies. Hetionet v1.0 consists of 47,031 nodes of 11 types and 2,250,197 relationships of 24 types. Data were integrated from 29 public resources to connect compounds, diseases, genes, anatomies, pathways, biological processes, molecular functions, cellular components, pharmacologic classes, side effects, and symptoms. Next, we identified network patterns that distinguish treatments from non-treatments. Then, we predicted the probability of treatment for 209,168 compound–disease pairs (het.io/repurpose). Our predictions validated on two external sets of treatment and provided pharmacological insights on epilepsy, suggesting they will help prioritize drug repurposing candidates. This study was entirely open and received realtime feedback from 40 community members.
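
As a rough illustration of what a "network pattern" between a compound and a disease could look like in a network with typed nodes and typed relationships, the toy sketch below counts walks that follow a fixed sequence of relationship types. The node names, the relationship types `binds` and `associates`, and the counting approach itself are assumptions made for this example; they are not Hetionet data and are not presented as the study's actual features or model.

```python
from collections import defaultdict

# Toy heterogeneous network: every node has a type, every edge has a type.
# All identifiers are hypothetical and exist only for this illustration.
NODES = {
    "compound_a": "Compound",
    "compound_b": "Compound",
    "gene_1": "Gene",
    "gene_2": "Gene",
    "disease_x": "Disease",
}

EDGES = [
    ("compound_a", "binds", "gene_1"),
    ("compound_a", "binds", "gene_2"),
    ("compound_b", "binds", "gene_2"),
    ("gene_1", "associates", "disease_x"),
    ("gene_2", "associates", "disease_x"),
]


def build_adjacency(edges):
    """Index neighbors by (node, edge type), treating each edge as undirected."""
    adjacency = defaultdict(set)
    for source, edge_type, target in edges:
        adjacency[(source, edge_type)].add(target)
        adjacency[(target, edge_type)].add(source)
    return adjacency


def count_paths(adjacency, start, end, pattern):
    """Count walks from start to end whose edge types follow `pattern` in order."""
    frontier = {start: 1}  # node -> number of walks that reach it so far
    for edge_type in pattern:
        next_frontier = defaultdict(int)
        for node, n_paths in frontier.items():
            for neighbor in adjacency.get((node, edge_type), ()):
                next_frontier[neighbor] += n_paths
        frontier = next_frontier
    return frontier.get(end, 0)


ADJACENCY = build_adjacency(EDGES)
PATTERN = ("binds", "associates")  # Compound -binds- Gene -associates- Disease

# One count per compound for the single toy disease.
path_counts = {
    compound: count_paths(ADJACENCY, compound, "disease_x", PATTERN)
    for compound, node_type in NODES.items()
    if node_type == "Compound"
}
print(path_counts)
```

In this toy network, `compound_a` reaches `disease_x` through two genes and `compound_b` through one. Counts of this kind, computed for many different patterns, are one plausible way to turn a typed network into numeric inputs for a classifier that separates treatments from non-treatments.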

eLife digest

Of all the data in the world today, 90% was created in the last two years. However, taking advantage of this data in order to advance our knowledge is restricted by how quickly we can access it and analyze it in a proper context.

In biomedical research, data is largely fragmented and stored in databases that typically do not “talk” to each other, thus hampering progress. One particular problem in medicine today is that the process of making a new therapeutic drug from scratch is incredibly expensive and inefficient, making it a risky business. Given the low success rate in drug discovery, there is an economic incentive in trying to repurpose an existing drug that has already been shown to be safe and effective towards a new disease or condition.

Himmelstein et al. used a computational approach to analyze 50,000 data points – including drugs, diseases, genes and symptoms – from 19 different public databases. This approach made it possible to create more than two million relationships among the data points, which could be used to develop models that predict which drugs currently in use by doctors might be best suited to treat any of 136 common diseases. For example, Himmelstein et al. identified specific drugs currently used to treat depression and alcoholism that could be repurposed to treat smoking addiction and epilepsy.

These findings provide a new and powerful way to study drug repurposing. While this work was exclusively performed with public data, an expanded and potentially stronger set of predictions could be obtained if data owned by pharmaceutical companies were incorporated. Additional studies will be needed to test the predictions made by the models.

Author and article information

Contributors

Daniel Scott Himmelstein:

ORCID: http://orcid.org/0000-0002-3012-7446

Antoine Lizee

Christine Hessler

Leo Brueggeman

Sabrina L Chen

Dexter Hadley

Ari Green

Pouya Khankhanian

Sergio E Baranzini:

ORCID: http://orcid.org/0000-0003-0067-194X

Alfonso Valencia: Role: Reviewing Editor

Journal

Journal ID (nlm-ta): eLife

Journal ID (iso-abbrev): Elife

Journal ID (publisher-id): eLife

Title: eLife

Publisher: eLife Sciences Publications, Ltd

ISSN (Electronic): 2050-084X

Publication date (Electronic, pub): 22 September 2017

Publication date Collection: 2017

Volume: 6

Electronic Location Identifier: e26726

Affiliations

[1] Biological and Medical Informatics Program, University of California, San Francisco, San Francisco, United States

[2] Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, United States

[3] Department of Neurology, University of California, San Francisco, San Francisco, United States

[4] ITUN-CRTI-UMR 1064 Inserm, University of Nantes, Nantes, France

[5] University of Iowa, Iowa City, United States

[6] Johns Hopkins University, Baltimore, United States

[7] Department of Pediatrics, University of California, San Francisco, San Francisco, United States

[8] Institute for Computational Health Sciences, University of California, San Francisco, San Francisco, United States

[9] Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, United States

Barcelona Supercomputing Center (BSC), Spain

Article

Publisher ID: 26726

DOI: 10.7554/eLife.26726

PMC ID: 5640425

PubMed ID: 28936969

SO-VID: 90c51931-3bcc-4b96-b854-d36fe3c89070

License:

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

History

Date received: 11 March 2017

Date accepted: 11 September 2017

Funding

Funded by: National Science Foundation (FundRef: http://dx.doi.org/10.13039/100000001)

Award ID: 1144247

Award Recipient: Daniel Scott Himmelstein

Funded by: Heidrich Family and Friends Foundation

Award Recipient: Sergio E Baranzini

Funded by: National Institutes of Health (FundRef: http://dx.doi.org/10.13039/100000002)

Award ID: 5R01NS088155

Award Recipient: Sergio E Baranzini

Funded by: National Cancer Institute (FundRef: http://dx.doi.org/10.13039/100000054)

Award ID: UH2CA203792

Award Recipient: Dexter Hadley

Funded by: U.S. National Library of Medicine (FundRef: http://dx.doi.org/10.13039/100000092)

Award ID: 1U01LM012675

Award Recipient: Dexter Hadley

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Custom metadata

Author impact statement: Project Rephetio combines data integration and systematic analysis to enable drug repurposing predictions on an unprecedented scale.

ScienceOpen disciplines: Life sciences

Keywords: drug repurposing, heterogeneous networks, machine learning, human
