      Integrating biomedical research and electronic health records to create knowledge-based biologically meaningful machine-readable embeddings

      research-article


          Abstract

          In order to advance precision medicine, detailed clinical features ought to be described in a way that leverages current knowledge. Although data collected from biomedical research is expanding at an almost exponential rate, our ability to transform that information into patient care has not kept pace. A major barrier preventing this transformation is that multi-dimensional data collection and analysis is usually carried out without much understanding of the underlying knowledge structure. Here, in an effort to bridge this gap, Electronic Health Records (EHRs) of individual patients are connected to a heterogeneous knowledge network called Scalable Precision Medicine Oriented Knowledge Engine (SPOKE). Then an unsupervised machine-learning algorithm creates Propagated SPOKE Entry Vectors (PSEVs) that encode the importance of each SPOKE node for any code in the EHRs. We argue that these results, alongside the natural integration of PSEVs into any EHR machine-learning platform, provide a key step toward precision medicine.
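
          The abstract does not spell out the algorithm behind PSEVs. As a rough illustration only, the sketch below assumes a random walk with restart (a personalized-PageRank-style propagation) over a toy graph: the walker keeps jumping back to the nodes linked to one EHR code, and the resulting visit probabilities give a vector that scores every node's importance for that code. The graph, the node names, the `restart_prob` value, and the function `propagate_entry_vector` are all hypothetical and are not taken from SPOKE or from the article.

```python
from typing import Dict, List


def propagate_entry_vector(
    graph: Dict[str, List[str]],
    entry_weights: Dict[str, float],
    restart_prob: float = 0.33,  # assumed value, chosen for illustration
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Random walk with restart on a small undirected graph.

    graph: adjacency lists; every edge is listed in both directions.
    entry_weights: non-negative weights on the nodes tied to one EHR code.
    Returns one score per node; the scores sum to 1.
    """
    nodes = sorted(graph)
    total = sum(entry_weights.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("entry_weights must put positive weight on graph nodes")

    # Restart distribution: where the walker jumps back to.
    restart = {n: entry_weights.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)

    for _ in range(max_iter):
        # Every step, a fraction of the mass returns to the entry nodes.
        nxt = {n: restart_prob * restart[n] for n in nodes}
        for n in nodes:
            neighbors = graph[n]
            if not neighbors:
                # Dangling node: send its mass back to the entry nodes.
                for m in nodes:
                    nxt[m] += (1 - restart_prob) * rank[n] * restart[m]
                continue
            share = (1 - restart_prob) * rank[n] / len(neighbors)
            for m in neighbors:
                nxt[m] += share
        delta = sum(abs(nxt[n] - rank[n]) for n in nodes)
        rank = nxt
        if delta < tol:
            break
    return rank


# Hypothetical toy network; node names are placeholders, not SPOKE content.
toy_graph = {
    "Disease:D1": ["Gene:G1", "Symptom:S1"],
    "Gene:G1": ["Disease:D1", "Pathway:P1"],
    "Symptom:S1": ["Disease:D1"],
    "Pathway:P1": ["Gene:G1"],
    "Compound:C1": [],  # disconnected from the entry node
}

# Suppose one EHR code maps onto the single entry node Disease:D1.
vector = propagate_entry_vector(toy_graph, {"Disease:D1": 1.0})
for node, score in sorted(vector.items(), key=lambda kv: -kv[1]):
    print(f"{node:14s} {score:.3f}")
```

          In this toy setting, nodes closer to the entry node receive higher scores and the disconnected node scores zero, which is the kind of graded "importance" the abstract describes. A real network would need a sparse-matrix implementation rather than Python dictionaries.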

          Abstract

          The Scalable Precision Medicine Oriented Knowledge Engine (SPOKE) is a heterogeneous knowledge network that integrates information from 29 public databases. Here, Nelson et al. extend SPOKE to embed clinical data from electronic health records to create medically meaningful barcodes for each medical variable.


                Author and article information

                Contributors
                Sergio.Baranzini@ucsf.edu
                Journal
                Nature Communications (Nat Commun)
                Publisher: Nature Publishing Group UK (London)
                ISSN: 2041-1723
                Published: 10 July 2019
                Volume: 10
                Article number: 3045
                Affiliations
                [1] Integrated Program in Quantitative Biology, University of California San Francisco, San Francisco, CA, USA (ISNI 0000 0001 2297 6811; GRID grid.266102.1)
                [2] Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA (ISNI 0000 0001 2297 6811; GRID grid.266102.1)
                [3] Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA (ISNI 0000 0001 2297 6811; GRID grid.266102.1)
                [4] Weill Institute for Neuroscience, Department of Neurology, University of California San Francisco, San Francisco, CA, USA (ISNI 0000 0001 2297 6811; GRID grid.266102.1)
                Author information
                http://orcid.org/0000-0002-3687-1102
                http://orcid.org/0000-0002-7433-2740
                http://orcid.org/0000-0003-0067-194X
                Article
                Article ID: 11069
                DOI: 10.1038/s41467-019-11069-0
                PMCID: PMC6620318
                PMID: 31292438
                627d7ae9-3859-4e9b-b5f8-006e75aad617
                © The Author(s) 2019

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 15 January 2019
                Accepted: 18 June 2019
                Categories
                Article

                Keywords: computational platforms and environments, data integration, machine learning, predictive medicine
