
      Embedding electronic health records onto a knowledge network recognizes prodromal features of multiple sclerosis and predicts diagnosis

      research-article


          Abstract

          Objective

          Early identification of chronic diseases is a pillar of precision medicine as it can lead to improved outcomes, reduction of disease burden, and lower healthcare costs. Predictions of a patient’s health trajectory have been improved through the application of machine learning approaches to electronic health records (EHRs). However, these methods have traditionally relied on “black box” algorithms that can process large amounts of data but are unable to incorporate domain knowledge, thus limiting their predictive and explanatory power. Here, we present a method for incorporating domain knowledge into clinical classifications by embedding individual patient data into a biomedical knowledge graph.

          Materials and Methods

          A modified version of the PageRank algorithm was implemented to embed millions of deidentified EHRs into a biomedical knowledge graph (SPOKE). This produced high-dimensional, knowledge-guided patient health signatures (ie, SPOKEsigs), which were then used as features in a random forest classifier to identify patients at risk of developing a chronic disease.
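The embedding step can be illustrated with a random walk with restart (personalized PageRank), a common way to spread a set of seed nodes across a graph. The abstract says only that a "modified version" of PageRank was used, so the sketch below shows the general technique under stated assumptions and is not the authors' implementation. The toy graph and its node names (`Disease:X`, `Gene:A`, and so on) are hypothetical stand-ins for SPOKE, `restart_prob = 0.15` is an arbitrary illustrative value, and `personalized_pagerank` and `patient_signature` are invented for this example. A patient's EHR-derived concepts act as restart (seed) nodes, and the stationary distribution over all nodes, read out in a fixed order, serves as that patient's signature vector.

```python
from typing import Dict, List, Set

# Hypothetical toy graph standing in for a biomedical knowledge graph.
# Edges are listed in both directions, so the graph is effectively undirected.
TOY_GRAPH: Dict[str, List[str]] = {
    "Disease:X": ["Gene:A", "Symptom:fatigue"],
    "Gene:A": ["Disease:X", "Compound:Y"],
    "Compound:Y": ["Gene:A"],
    "Symptom:fatigue": ["Disease:X", "Symptom:pain"],
    "Symptom:pain": ["Symptom:fatigue"],
}


def personalized_pagerank(
    graph: Dict[str, List[str]],
    seeds: Set[str],
    restart_prob: float = 0.15,  # illustrative value, not taken from the paper
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Random walk with restart. At each step the walker jumps back to a
    uniformly chosen seed with probability restart_prob; otherwise it follows
    a uniformly chosen outgoing edge. Returns the stationary visit distribution."""
    nodes = sorted(set(graph) | {m for nbrs in graph.values() for m in nbrs})
    if not seeds or not seeds <= set(nodes):
        raise ValueError("seeds must be a non-empty subset of the graph's nodes")
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(max_iter):
        new = {n: restart_prob * restart[n] for n in nodes}
        for n in nodes:
            out = graph.get(n, [])
            mass = (1.0 - restart_prob) * rank[n]
            if out:
                # Split the walking mass evenly across outgoing edges.
                for m in out:
                    new[m] += mass / len(out)
            else:
                # Dangling node: return its mass to the seeds (one common convention).
                for s in seeds:
                    new[s] += mass / len(seeds)
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def patient_signature(graph: Dict[str, List[str]], concepts: List[str]) -> List[float]:
    """Hypothetical helper: seed the walk with a patient's concepts and return
    the scores of every node in sorted node order, as a fixed-length vector."""
    rank = personalized_pagerank(graph, set(concepts))
    return [rank[n] for n in sorted(rank)]


if __name__ == "__main__":
    # Two hypothetical patients described by different EHR-derived concepts.
    for concepts in (["Symptom:fatigue"], ["Compound:Y", "Symptom:pain"]):
        print(concepts, [round(v, 3) for v in patient_signature(TOY_GRAPH, concepts)])
```

Every signature has one entry per graph node, so patients with different EHR concepts map to vectors of the same length, which a classifier such as a random forest could take as features. Returning the mass of dangling nodes to the seeds keeps each signature a probability distribution; other conventions (for example, spreading it uniformly over all nodes) are equally possible.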

          Results

          Our model predicted the disease status of 5752 subjects 3 years before they were diagnosed with multiple sclerosis (MS) (AUC = 0.83). SPOKEsigs outperformed predictions based on EHR data alone, and the biological drivers of the classifiers provided insight into the underpinnings of prodromal MS.

          Conclusion

          Using EHR data as input, SPOKEsigs describe patients at both the clinical and the biological level. We provide a clinical use case for detecting MS in patients up to 5 years before their documented diagnosis in the clinic and illustrate the biological features that distinguish the prodromal MS state.


                Author and article information

                Journal: Journal of the American Medical Informatics Association : JAMIA (J Am Med Inform Assoc), Oxford University Press
                ISSN: 1067-5027; 1527-974X
                Issue date: March 2022
                Published online: 16 December 2021
                Volume 29, Issue 3, Pages 424-434
                Affiliations
                Integrated Program in Quantitative Biology, University of California San Francisco, San Francisco, California, USA
                Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, California, USA
                Department of Neurology, UCSF Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California, USA
                Department of Pediatrics, University of California San Francisco, San Francisco, California, USA
                Author notes
                Corresponding Author: Sergio E. Baranzini, PhD, Department of Neurology, UCSF Weill Institute for Neurosciences, University of California San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94143, USA; sergio.baranzini@ucsf.edu
                Author information
                ORCID: https://orcid.org/0000-0002-2034-8800
                ORCID: https://orcid.org/0000-0003-0067-194X
                Article ID: ocab270
                DOI: 10.1093/jamia/ocab270
                PMCID: PMC8800523
                PMID: 34915552
                Identifier: 99e4356c-245e-4761-8deb-a04d3791f266
                © The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License ( https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History: 15 June 2021; 22 October 2021; 26 November 2021; 16 December 2021
                Page count: 11 pages
                Funding
                US National Science Foundation (Convergence Accelerator), Award ID: NSF_1937160
                Bakar Family Foundation and the Bakar Computational Health Sciences Institute
                Categories
                Research and Applications
                AcademicSubjects/MED00580; AcademicSubjects/SCI01060; AcademicSubjects/SCI01530
                Bioinformatics & Computational biology
                Keywords: knowledge graph, electronic health records, multiple sclerosis, preventative medicine
