      Systematic integration of biomedical knowledge prioritizes drugs for repurposing


          The ability to computationally predict whether a compound treats a disease would improve the economy and success rate of drug approval. This study describes Project Rephetio, which systematically models drug efficacy based on 755 existing treatments. First, we constructed Hetionet, an integrative network encoding knowledge from millions of biomedical studies. Hetionet v1.0 consists of 47,031 nodes of 11 types and 2,250,197 relationships of 24 types. Data were integrated from 29 public resources to connect compounds, diseases, genes, anatomies, pathways, biological processes, molecular functions, cellular components, pharmacologic classes, side effects, and symptoms. Next, we identified network patterns that distinguish treatments from non-treatments. Then, we predicted the probability of treatment for 209,168 compound–disease pairs. Our predictions validated on two external sets of treatments and provided pharmacological insights on epilepsy, suggesting they will help prioritize drug repurposing candidates. This study was entirely open and received real-time feedback from 40 community members.
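
          To make the idea of "network patterns" between a compound and a disease concrete, here is a minimal Python sketch of a network with typed nodes and typed relationships, plus a function that counts the paths following a given sequence of relationship types. It is not the authors' implementation. The class `HetNet`, the method `count_paths`, the placeholder node names, and the tiny example network are all hypothetical and carry no biomedical meaning; only the general notion of typed nodes and relationships comes from the abstract.

```python
from collections import defaultdict


class HetNet:
    """Toy heterogeneous network: typed nodes joined by typed, undirected edges.

    Hypothetical illustration only; not the Hetionet data model or API.
    """

    def __init__(self):
        self.node_type = {}          # node id -> node type
        self.adj = defaultdict(set)  # (node id, edge type) -> neighbor ids

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, source, etype, target):
        # Store both directions so a path can traverse the edge either way.
        self.adj[(source, etype)].add(target)
        self.adj[(target, etype)].add(source)

    def count_paths(self, start, end, pattern):
        """Count paths from start to end whose steps follow `pattern`,
        a list of (edge type, next node type) pairs.
        A path never visits the same node twice."""

        def walk(node, step, visited):
            if step == len(pattern):
                return 1 if node == end else 0
            etype, ntype = pattern[step]
            total = 0
            for nxt in self.adj.get((node, etype), ()):
                if nxt not in visited and self.node_type[nxt] == ntype:
                    total += walk(nxt, step + 1, visited | {nxt})
            return total

        return walk(start, 0, {start})


# Invented placeholder data (assumption: no real compounds, genes, or diseases).
net = HetNet()
for name, ntype in [
    ("compound_1", "Compound"),
    ("compound_2", "Compound"),
    ("gene_1", "Gene"),
    ("gene_2", "Gene"),
    ("disease_1", "Disease"),
]:
    net.add_node(name, ntype)

net.add_edge("compound_1", "binds", "gene_1")
net.add_edge("compound_1", "binds", "gene_2")
net.add_edge("compound_2", "binds", "gene_1")
net.add_edge("gene_1", "associates", "disease_1")
net.add_edge("gene_2", "associates", "disease_1")

# Pattern: Compound -binds-> Gene -associates-> Disease
pattern = [("binds", "Gene"), ("associates", "Disease")]
paths_1 = net.count_paths("compound_1", "disease_1", pattern)
paths_2 = net.count_paths("compound_2", "disease_1", pattern)
print(paths_1, paths_2)
```

          In this toy network, `compound_1` reaches `disease_1` through two genes and `compound_2` through one. Counts of this kind, computed for many patterns, are one plausible sort of feature a model could use to separate treatments from non-treatments; the abstract does not specify the features actually used.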

          eLife digest

          Of all the data in the world today, 90% was created in the last two years. However, taking advantage of this data in order to advance our knowledge is restricted by how quickly we can access it and analyze it in a proper context.

          In biomedical research, data is largely fragmented and stored in databases that typically do not “talk” to each other, thus hampering progress. One particular problem in medicine today is that the process of making a new therapeutic drug from scratch is incredibly expensive and inefficient, making it a risky business. Given the low success rate in drug discovery, there is an economic incentive in trying to repurpose an existing drug that has already been shown to be safe and effective towards a new disease or condition.

          Himmelstein et al. used a computational approach to analyze 50,000 data points – including drugs, diseases, genes and symptoms – from 19 different public databases. This approach made it possible to create more than two million relationships among the data points, which could be used to develop models that predict which drugs currently in use by doctors might be best suited to treat any of 136 common diseases. For example, Himmelstein et al. identified specific drugs currently used to treat depression and alcoholism that could be repurposed to treat smoking addiction and epilepsy.

          These findings provide a new and powerful way to study drug repurposing. While this work was exclusively performed with public data, an expanded and potentially stronger set of predictions could be obtained if data owned by pharmaceutical companies were incorporated. Additional studies will be needed to test the predictions made by the models.

                Author and article information

                Role: Reviewing Editor
                eLife Sciences Publications, Ltd
                22 September 2017
                Volume: 6
                [1] Biological and Medical Informatics Program, University of California, San Francisco, San Francisco, United States
                [2] Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, United States
                [3] Department of Neurology, University of California, San Francisco, San Francisco, United States
                [4] ITUN-CRTI-UMR 1064 Inserm, University of Nantes, Nantes, France
                [5] University of Iowa, Iowa City, United States
                [6] Johns Hopkins University, Baltimore, United States
                [7] Department of Pediatrics, University of California, San Francisco, San Francisco, United States
                [8] Institute for Computational Health Sciences, University of California, San Francisco, San Francisco, United States
                [9] Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, United States
                Barcelona Supercomputing Center (BSC) Spain
                © 2017, Himmelstein et al

                This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

                Funded by: National Science Foundation
                Award ID: 1144247
                Funded by: Heidrich Family and Friends Foundation
                Funded by: National Institutes of Health
                Award ID: 5R01NS088155
                Funded by: National Cancer Institute
                Award ID: UH2CA203792
                Funded by: U.S. National Library of Medicine
                Award ID: 1U01LM012675
                The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
                Research Article
                Computational and Systems Biology
                Custom metadata
                Project Rephetio combines data integration and systematic analysis to enable drug repurposing predictions on an unprecedented scale.

                Life sciences

                human, machine learning, heterogeneous networks, drug repurposing

