Integrating biomedical research and electronic health records to create knowledge-based biologically meaningful machine-readable embeddings

Abstract

To advance precision medicine, detailed clinical features ought to be described in a way that leverages current knowledge. Although data collected from biomedical research is expanding at an almost exponential rate, our ability to transform that information into patient care has not kept pace. A major barrier preventing this transformation is that multi-dimensional data collection and analysis is usually carried out without much understanding of the underlying knowledge structure. Here, in an effort to bridge this gap, Electronic Health Records (EHRs) of individual patients are connected to a heterogeneous knowledge network called Scalable Precision Medicine Oriented Knowledge Engine (SPOKE). Then an unsupervised machine-learning algorithm creates Propagated SPOKE Entry Vectors (PSEVs) that encode the importance of each SPOKE node for any code in the EHRs. We argue that these results, alongside the natural integration of PSEVs into any EHR machine-learning platform, provide a key step toward precision medicine.

Abstract

The Scalable Precision Medicine Oriented Knowledge Engine (SPOKE) is a heterogeneous knowledge network that integrates information from 29 public databases. Here, Nelson et al. extend SPOKE to embed clinical data from electronic health records to create medically meaningful barcodes for each medical variable.
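The abstract says that an unsupervised algorithm propagates signal from EHR codes across SPOKE to score every node, but it does not name the algorithm. As a rough illustration only, the sketch below uses a random walk with restart, a common propagation technique swapped in here as an assumption; it is not claimed to be the authors' method. The four-node graph, the node names, the restart probability, and the function name `propagate_entry_vector` are all hypothetical.

```python
import numpy as np


def propagate_entry_vector(adjacency, entry_weights, restart=0.33,
                           tol=1e-10, max_iter=1000):
    """Random walk with restart from weighted entry nodes (illustrative only).

    adjacency     : square symmetric 0/1 matrix of a toy knowledge graph
    entry_weights : non-negative weights marking where the walk restarts
    Returns a vector of per-node scores that sums to 1.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    col_sums = adjacency.sum(axis=0)
    if np.any(col_sums == 0):
        raise ValueError("every node needs at least one edge")
    # Column-normalise so each column holds transition probabilities.
    transition = adjacency / col_sums

    restart_vec = np.asarray(entry_weights, dtype=float)
    restart_vec = restart_vec / restart_vec.sum()

    scores = restart_vec.copy()
    for _ in range(max_iter):
        updated = (1 - restart) * transition @ scores + restart * restart_vec
        if np.abs(updated - scores).sum() < tol:
            return updated
        scores = updated
    return scores


# Hypothetical toy graph: Disease - GeneA - GeneB - Compound (a simple path).
nodes = ["Disease", "GeneA", "GeneB", "Compound"]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Assume an EHR code maps only to the "Disease" node (the entry point).
psev_like = propagate_entry_vector(adjacency, entry_weights=[1, 0, 0, 0])

for name, score in zip(nodes, psev_like):
    print(f"{name}: {score:.4f}")
```

In this toy setting the scores fall off with distance from the entry node, which is the intuition behind a vector that encodes how important each node is for a given code. A real knowledge network would have many node and edge types and far more entry points per code.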

Most cited references 13

DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants

Janet Piñero, Àlex Bravo, Núria Queralt-Rosinach … (2016)

The information about the genetic basis of human diseases lies at the heart of precision medicine and drug discovery. However, to realize its full potential to support these goals, several problems, such as fragmentation, heterogeneity, availability and different conceptualization of the data must be overcome. To provide the community with a resource free of these hurdles, we have developed DisGeNET (http://www.disgenet.org), one of the largest available collections of genes and variants involved in human diseases. DisGeNET integrates data from expert curated repositories, GWAS catalogues, animal models and the scientific literature. DisGeNET data are homogeneously annotated with controlled vocabularies and community-driven ontologies. Additionally, several original metrics are provided to assist the prioritization of genotype–phenotype relationships. The information is accessible through a web interface, a Cytoscape App, an RDF SPARQL endpoint, scripts in several programming languages and an R package. DisGeNET is a versatile platform that can be used for different research purposes including the investigation of the molecular underpinnings of specific human diseases and their comorbidities, the analysis of the properties of disease genes, the generation of hypothesis on drug therapeutic action and drug adverse effects, the validation of computationally predicted disease genes and the evaluation of text-mining methods performance.

DISEASES: text mining and data integration of disease-gene associations.

Sune Pletscher-Frankild, Albert Palleja, Kalliopi Tsafou … (2015)

Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease-gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download.

Systematic integration of biomedical knowledge prioritizes drugs for repurposing

Daniel Scott Himmelstein, Antoine Lizee, Christine Hessler … (2017)

The ability to computationally predict whether a compound treats a disease would improve the economy and success rate of drug approval. This study describes Project Rephetio to systematically model drug efficacy based on 755 existing treatments. First, we constructed Hetionet (neo4j.het.io), an integrative network encoding knowledge from millions of biomedical studies. Hetionet v1.0 consists of 47,031 nodes of 11 types and 2,250,197 relationships of 24 types. Data were integrated from 29 public resources to connect compounds, diseases, genes, anatomies, pathways, biological processes, molecular functions, cellular components, pharmacologic classes, side effects, and symptoms. Next, we identified network patterns that distinguish treatments from non-treatments. Then, we predicted the probability of treatment for 209,168 compound–disease pairs (het.io/repurpose). Our predictions validated on two external sets of treatment and provided pharmacological insights on epilepsy, suggesting they will help prioritize drug repurposing candidates. This study was entirely open and received realtime feedback from 40 community members.

Author and article information

Contributors

Sergio E. Baranzini:

ORCID: http://orcid.org/0000-0003-0067-194X

Sergio.Baranzini@ucsf.edu

Journal

Journal ID (nlm-ta): Nat Commun

Journal ID (iso-abbrev): Nat Commun

Title: Nature Communications

Publisher: Nature Publishing Group UK (London)

ISSN (Electronic): 2041-1723

Publication date (Electronic): 10 July 2019

Publication date PMC-release: 10 July 2019

Publication date Collection: 2019

Volume: 10

Electronic Location Identifier: 3045

Affiliations

[1] Integrated Program in Quantitative Biology, University of California San Francisco, San Francisco, CA, USA (ISNI 0000 0001 2297 6811; GRID grid.266102.1)

[2] Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA (ISNI 0000 0001 2297 6811; GRID grid.266102.1)

[3] Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA (ISNI 0000 0001 2297 6811; GRID grid.266102.1)

[4] Weill Institute for Neuroscience, Department of Neurology, University of California San Francisco, San Francisco, CA, USA (ISNI 0000 0001 2297 6811; GRID grid.266102.1)

Author information

Charlotte A. Nelson http://orcid.org/0000-0002-3687-1102

Atul J. Butte http://orcid.org/0000-0002-7433-2740

Sergio E. Baranzini http://orcid.org/0000-0003-0067-194X

Article

Publisher ID: 11069

DOI: 10.1038/s41467-019-11069-0

PMC ID: 6620318

PubMed ID: 31292438

SO-VID: 627d7ae9-3859-4e9b-b5f8-006e75aad617

License:

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

History

Date received: 15 January 2019

Date accepted: 18 June 2019

Custom metadata

ScienceOpen disciplines: Uncategorized

Keywords: computational platforms and environments, data integration, machine learning, predictive medicine
