Predicting early psychiatric readmission with natural language processing of narrative discharge summaries

Abstract

The ability to predict psychiatric readmission, a major driver of psychiatric health-care costs, would facilitate the development of interventions to reduce this risk. The symptoms or illness-course characteristics needed to develop reliable predictors are not available in coded billing data, but may be present in narrative electronic health record (EHR) discharge summaries. We identified a cohort of individuals admitted to a psychiatric inpatient unit between 1994 and 2012 with a principal diagnosis of major depressive disorder, and extracted their inpatient psychiatric discharge narrative notes. Using these data, we trained a 75-topic Latent Dirichlet Allocation (LDA) model, a form of natural language processing that identifies groups of words associated with topics discussed in a document collection. The cohort was randomly split into training (70%) and testing (30%) data sets, and we trained separate support vector machine models on baseline clinical features alone, on baseline features plus common individual words, and on both of these plus the topics identified by the 75-topic LDA model. Of 4687 patients with inpatient discharge summaries, 470 were readmitted within 30 days. The 75-topic LDA model included topics linked to psychiatric symptoms (suicide, severe depression, anxiety, trauma, eating/weight and panic) and to major depressive disorder comorbidities (infection, postpartum, brain tumor, diarrhea and pulmonary disease). Including LDA topics improved prediction of readmission, as measured by the area under the receiver-operating characteristic curve (AUC) in the testing data set, from baseline (AUC 0.618) to baseline+1000 words (0.682) to baseline+75 topics (0.784). Inclusion of topics derived from narrative notes allows more accurate discrimination of individuals at high risk for psychiatric readmission in this cohort.
Topic modeling and related approaches offer the potential to improve prediction using EHRs, if generalizability can be established in other clinical cohorts.
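
The model comparison above rests on the area under the receiver-operating characteristic curve. With 470 of 4687 patients readmitted (about 10%), a model that never predicts readmission would already be about 90% accurate, so a threshold-free ranking measure is the more informative yardstick. As a minimal sketch, assuming only held-out 0/1 readmission labels and a real-valued risk score per patient (for example, an SVM decision value), the AUC can be computed as the Mann-Whitney probability that a readmitted patient is scored above a non-readmitted one. The function name `roc_auc` and the toy inputs are hypothetical; this is not the authors' evaluation code.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 = readmitted within 30 days, 0 = not readmitted.
    scores: higher means higher predicted risk (e.g. an SVM decision value).
    Tied scores receive their average rank, so a tie counts as half a win.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Assign 1-based ranks by ascending score, averaging over runs of ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1

    # U = (sum of positive ranks) - n_pos(n_pos + 1)/2; AUC = U / (n_pos * n_neg).
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy example: 3 of the 4 positive/negative pairs are correctly ordered.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In practice, scikit-learn's `roc_auc_score(y_true, y_score)` computes the same quantity; the rank formulation is shown here because it makes explicit that the AUC depends only on how patients are ordered, not on any decision threshold.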

Related collections

Radiology and Natural Language Processing

Most cited references 12

Finding scientific topics.

Thomas L Griffiths, Mark Steyvers (2004)

A first step in identifying the content of a document is determining which topics that document addresses. We describe a generative model for documents, introduced by Blei, Ng, and Jordan [Blei, D. M., Ng, A. Y. & Jordan, M. I. (2003) J. Machine Learn. Res. 3, 993-1022], in which each document is generated by choosing a distribution over topics and then choosing each word in the document from a topic selected according to this distribution. We then present a Markov chain Monte Carlo algorithm for inference in this model. We use this algorithm to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics. We show that the extracted topics capture meaningful structure in the data, consistent with the class designations provided by the authors of the articles, and outline further applications of this analysis, including identifying "hot topics" by examining temporal dynamics and tagging abstracts to illustrate semantic content.
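
The Markov chain Monte Carlo algorithm referred to here is collapsed Gibbs sampling: each word token's topic assignment is resampled in turn from its full conditional, which is proportional to (n_{j,w} + beta) / (n_j + W*beta) * (n_{d,j} + alpha), where the counts exclude the current token, W is the vocabulary size, and alpha and beta are symmetric Dirichlet hyperparameters. Below is a minimal pure-Python sketch of that sampler over integer word ids. It is illustrative only: the function names, the defaults alpha = 0.1 and beta = 0.01, and the use of smoothed per-document topic proportions as downstream features are assumptions, not details taken from this abstract or from the readmission study's implementation.

```python
import random
from typing import List, Tuple


def lda_gibbs(
    docs: List[List[int]],
    n_topics: int,
    vocab_size: int,
    alpha: float = 0.1,  # illustrative default, not from the source
    beta: float = 0.01,  # illustrative default, not from the source
    n_iter: int = 200,
    seed: int = 0,
) -> Tuple[List[List[int]], List[List[int]]]:
    """Collapsed Gibbs sampling for LDA.

    docs: each document is a list of word ids in range(vocab_size).
    Returns (doc_topic, topic_word) count matrices after the last sweep.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # n_{d,j}
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n_{j,w}
    topic_total = [0] * n_topics                              # n_{j,.}
    topics = range(n_topics)

    # Start from a uniformly random topic for every token.
    assignments: List[List[int]] = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            j = rng.randrange(n_topics)
            z_doc.append(j)
            doc_topic[d][j] += 1
            topic_word[j][w] += 1
            topic_total[j] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the current token out of the counts (the "-i" terms).
                j = assignments[d][i]
                doc_topic[d][j] -= 1
                topic_word[j][w] -= 1
                topic_total[j] -= 1
                # Unnormalized full conditional; the document-length
                # denominator is the same for every topic, so it is omitted.
                weights = [
                    (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
                    * (doc_topic[d][k] + alpha)
                    for k in topics
                ]
                j = rng.choices(topics, weights=weights)[0]
                # Put the token back under its newly sampled topic.
                assignments[d][i] = j
                doc_topic[d][j] += 1
                topic_word[j][w] += 1
                topic_total[j] += 1
    return doc_topic, topic_word


def topic_proportions(doc_topic: List[List[int]], alpha: float = 0.1) -> List[List[float]]:
    """Smoothed per-document topic proportions, (n_{d,j} + alpha) / (n_d + T*alpha)."""
    n_topics = len(doc_topic[0])
    return [
        [(count + alpha) / (sum(row) + n_topics * alpha) for count in row]
        for row in doc_topic
    ]


if __name__ == "__main__":
    # Toy corpus: word ids 0-2 and 3-5 never co-occur in a document.
    docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [2, 1, 0, 2], [5, 3, 4, 5]]
    doc_topic, _ = lda_gibbs(docs, n_topics=2, vocab_size=6, n_iter=100)
    for row in topic_proportions(doc_topic):
        print([round(x, 2) for x in row])
```

Pure Python is far too slow for thousands of notes and dozens of topics; at that scale an optimized implementation would be the practical choice, but the update rule is the same.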

Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2).

Shawn N Murphy, Griffin Weber, Michael Mendis … (2010)

Informatics for Integrating Biology and the Bedside (i2b2) is one of seven projects sponsored by the NIH Roadmap National Centers for Biomedical Computing (http://www.ncbcs.org). Its mission is to provide clinical investigators with the tools necessary to integrate medical record and clinical research data in the genomics age: a software suite to construct and integrate the modern clinical research chart. i2b2 software may be used by an enterprise's research community to find sets of interesting patients from electronic patient medical record data, while preserving patient privacy through a query tool interface. Project-specific mini-databases ("data marts") can be created from these sets to make highly detailed data available on these specific patients to the investigators on the i2b2 platform, as reviewed and restricted by the Institutional Review Board. The current version of this software has been released into the public domain and is available at the URL: http://www.i2b2.org/software.

Automated identification of postoperative complications within an electronic medical record using natural language processing.

P Elkin, Theodore Speroff, Emery Brown … (2011)

Currently, most automated methods to identify patient safety occurrences rely on administrative data codes; however, free-text searches of electronic medical records could represent an additional surveillance approach. The study aimed to evaluate a natural language processing search approach to identify postoperative surgical complications within a comprehensive electronic medical record. It was a cross-sectional study involving 2974 patients undergoing inpatient surgical procedures at 6 Veterans Health Administration (VHA) medical centers from 1999 to 2006. The outcomes were postoperative occurrences of acute renal failure requiring dialysis, deep vein thrombosis, pulmonary embolism, sepsis, pneumonia, or myocardial infarction, identified through medical record review as part of the VA Surgical Quality Improvement Program. We determined the sensitivity and specificity of the natural language processing approach to identify these complications and compared its performance with patient safety indicators that use discharge coding information. The proportion of postoperative events for each sample was 2% (39 of 1924) for acute renal failure requiring dialysis, 0.7% (18 of 2327) for pulmonary embolism, 1% (29 of 2327) for deep vein thrombosis, 7% (61 of 866) for sepsis, 16% (222 of 1405) for pneumonia, and 2% (35 of 1822) for myocardial infarction. Natural language processing correctly identified 82% (95% confidence interval [CI], 67%-91%) of acute renal failure cases compared with 38% (95% CI, 25%-54%) for patient safety indicators. Similar results were obtained for venous thromboembolism (59%, 95% CI, 44%-72% vs 46%, 95% CI, 32%-60%), pneumonia (64%, 95% CI, 58%-70% vs 5%, 95% CI, 3%-9%), sepsis (89%, 95% CI, 78%-94% vs 34%, 95% CI, 24%-47%), and postoperative myocardial infarction (91%, 95% CI, 78%-97% vs 89%, 95% CI, 74%-96%). Both natural language processing and patient safety indicators were highly specific for these diagnoses.
Among patients undergoing inpatient surgical procedures at VA medical centers, natural language processing analysis of electronic medical records to identify postoperative complications had higher sensitivity and lower specificity compared with patient safety indicators based on discharge coding.
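
The sensitivities above are reported with 95% confidence intervals, but the abstract does not name the interval method. As a worked check, assume the Wilson score interval and the only whole-number counts consistent with the reported percentages for the 39 acute renal failure cases: 32 of 39 (82%) detected by natural language processing and 15 of 39 (38%) by patient safety indicators. The helper name `wilson_interval` is hypothetical, and the choice of the Wilson interval is an assumption of this sketch.

```python
from math import sqrt
from typing import Tuple

# Two-sided 95% standard normal quantile.
Z_95 = 1.959963984540054


def wilson_interval(successes: int, n: int, z: float = Z_95) -> Tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Assumed counts: the only integers out of 39 that round to 82% and 38%.
    for label, detected in [("NLP", 32), ("PSI", 15)]:
        low, high = wilson_interval(detected, 39)
        print(f"{label}: {detected / 39:.0%} (95% CI {low:.0%}-{high:.0%})")
    # NLP: 82% (95% CI 67%-91%)
    # PSI: 38% (95% CI 25%-54%)
```

Under these assumptions the sketch reproduces both reported intervals, which is consistent with, though not proof of, a Wilson-type interval having been used.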

Author and article information

Journal

Journal ID (nlm-ta): Transl Psychiatry

Journal ID (iso-abbrev): Transl Psychiatry

Title: Translational Psychiatry

Publisher: Nature Publishing Group

ISSN (Electronic): 2158-3188

Publication date (Print): October 2016

Publication date (Electronic): 18 October 2016

Publication date PMC-release: 1 October 2016

Volume: 6

Issue: 10

Page: e921

Affiliations

[1] MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA

[2] Department of Computer Science, University of Massachusetts Lowell, Lowell, MA, USA

[3] Center for Experimental Drugs and Diagnostics, Massachusetts General Hospital, Boston, MA, USA

[4] Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA

[5] Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA

[6] Partners Research Information Systems and Computing, Partners HealthCare System, Boston, MA, USA

Author notes

[*] Department of Psychiatry, Massachusetts General Hospital, Simches Research Building, 185 Cambridge Street, 6th Floor, Boston, MA 02114, USA. E-mail: rperlis@partners.org

Article

Publisher Item ID: tp2015182

DOI: 10.1038/tp.2015.182

PMC ID: 5315537

PubMed ID: 27754482

SO-VID: b204cfcb-6043-4853-8bec-47d0ed66cf8f

License:

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/

History

Date received: 01 May 2015

Date revision received: 14 August 2015

Date accepted: 06 September 2015