      Inferring multimodal latent topics from electronic health records

      research-article

          Abstract

          Electronic health records (EHRs) are rich heterogeneous collections of patient health information, whose broad adoption provides clinicians and researchers unprecedented opportunities for health informatics, disease-risk prediction, actionable clinical recommendations, and precision medicine. However, EHRs present several modeling challenges, including highly sparse data matrices, noisy irregular clinical notes, arbitrary biases in billing code assignment, diagnosis-driven lab tests, and heterogeneous data types. To address these challenges, we present MixEHR, a multi-view Bayesian topic model. We demonstrate MixEHR on MIMIC-III, Mayo Clinic Bipolar Disorder, and Quebec Congenital Heart Disease EHR datasets. Qualitatively, MixEHR disease topics reveal meaningful combinations of clinical features across heterogeneous data types. Quantitatively, we observe superior prediction accuracy of diagnostic codes and lab test imputations compared to state-of-the-art methods. We leverage the inferred patient topic mixtures to classify target diseases and predict mortality of patients in critical condition. In all comparisons, MixEHR achieves competitive performance and reveals meaningful disease-related topics.

          Abstract

          Electronic Health Records (EHR) are subject to noise, biases and missing data. Here, the authors present MixEHR, a multi-view Bayesian framework related to collaborative filtering and latent topic models for EHR data integration and modeling.
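
          As a rough illustration of what "multi-view" can mean for a latent topic model, the sketch below samples synthetic patients whose data types (views) each have their own topic-by-feature distributions but share a single per-patient topic mixture. It is a toy generative sampler only, not the MixEHR model or its inference procedure; the view names, vocabulary sizes, and Dirichlet hyperparameters are assumptions chosen for illustration.

```python
import numpy as np


def sample_multiview_patients(n_patients=5, n_topics=3, view_sizes=None,
                              tokens_per_view=20, alpha=0.5, beta=0.5, seed=0):
    """Sample toy multi-view count data from a shared-topic-mixture model."""
    if view_sizes is None:
        # Hypothetical views and vocabulary sizes, for illustration only.
        view_sizes = {"icd_codes": 50, "lab_tests": 30, "note_words": 200}
    rng = np.random.default_rng(seed)

    # One topic-by-feature matrix per view; each row is a distribution.
    phi = {view: rng.dirichlet(np.full(size, beta), size=n_topics)
           for view, size in view_sizes.items()}

    # One topic mixture per patient, shared across all of that patient's views.
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_patients)

    counts = {view: np.zeros((n_patients, size), dtype=int)
              for view, size in view_sizes.items()}
    for p in range(n_patients):
        for view, size in view_sizes.items():
            # Draw a topic for each token, then a feature from that topic.
            topics = rng.choice(n_topics, size=tokens_per_view, p=theta[p])
            for k in topics:
                feature = rng.choice(size, p=phi[view][k])
                counts[view][p, feature] += 1
    return theta, phi, counts


if __name__ == "__main__":
    theta, phi, counts = sample_multiview_patients()
    for view, matrix in counts.items():
        print(view, matrix.shape, "tokens per patient:", matrix.sum(axis=1))
```

          Sharing `theta` across views is what ties the data types together in this sketch: a patient who loads heavily on one topic will tend to show that topic's characteristic features in every view.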

                Author and article information

                Contributors
                yueli@cs.mcgill.ca
                manoli@mit.edu
                Journal
                Nat Commun
                Nature Communications
                Nature Publishing Group UK (London)
                ISSN: 2041-1723
                Published: 21 May 2020
                Volume: 11
                Article number: 2536
                Affiliations
                [1] School of Computer Science and McGill Centre for Bioinformatics, McGill University, Montreal, Quebec H3A0E9, Canada (ISNI 0000 0004 1936 8649; GRID grid.14709.3b)
                [2] Department of Physiology and Biomedical Engineering and Division of Gastroenterology and Hepatology, Department of Medicine, and Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA
                [3] Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA (ISNI 0000 0004 0459 167X; GRID grid.66875.3a)
                [4] Department of Psychiatry and Psychology, Mayo Clinic, Rochester, MN, USA (ISNI 0000 0004 0459 167X; GRID grid.66875.3a)
                [5] McGill Adult Unit for Congenital Heart Disease Excellence (MAUDE Unit), Montreal, QC H4A 3J1, Canada
                [6] Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, MA 02139, USA (ISNI 0000 0001 2341 2786; GRID grid.116068.8)
                [7] The Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, MA 02142, USA (GRID grid.66859.34)
                Author information
                http://orcid.org/0000-0002-1163-3634
                http://orcid.org/0000-0002-3940-7284
                http://orcid.org/0000-0001-9350-4440
                http://orcid.org/0000-0003-4944-7789
                http://orcid.org/0000-0001-7113-9630
                Article
                Publisher ID: 16378
                DOI: 10.1038/s41467-020-16378-3
                PMC ID: 7242436
                PubMed ID: 32439869
                ScienceOpen record ID: 2c8330be-6a4e-4c8c-b88f-fc1450fcb549
                © The Author(s) 2020

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 23 July 2019
                Accepted: 23 April 2020
                Funding
                Funded by: FundRef https://doi.org/10.13039/501100002790, Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada (NSERC Canadian Network for Research and Innovation in Machining Technology);
                Award ID: RGPIN-2019-0621
                Funded by: FundRef https://doi.org/10.13039/501100003151, Fonds de Recherche du Québec - Nature et Technologies (Quebec Fund for Research in Nature and Technology);
                Award ID: NC-268592
                Funded by: FundRef https://doi.org/10.13039/501100010785, Canada First Research Excellence Fund (Fonds d'excellence en recherche Apogée Canada);
                Award ID: G249591
                Funded by: FundRef https://doi.org/10.13039/501100000024, Gouvernement du Canada | Canadian Institutes of Health Research (Instituts de Recherche en Santé du Canada);
                Award ID: 35223
                Categories
                Article
                Keywords
                machine learning, health care, computational science
