Using Electronic Patient Records to Discover Disease Correlations and Stratify Patient Cohorts

Abstract

Electronic patient records remain a largely unexplored but potentially rich data source for discovering correlations between diseases. We describe a general approach for gathering phenotypic descriptions of patients from medical records in a systematic, cohort-independent manner. By extracting phenotype information from the free text in such records, we demonstrate that we can extend the information contained in the structured record data and use it to produce fine-grained patient stratification and disease co-occurrence statistics. The approach uses a dictionary based on the International Classification of Diseases ontology and is therefore, in principle, language independent. As a use case, we show how records from a Danish psychiatric hospital lead to the identification of disease correlations, which can subsequently be mapped to systems biology frameworks.
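
To make the dictionary-based idea concrete, the following is a minimal sketch of matching free-text terms against a term-to-code dictionary. It is not the authors' pipeline: the dictionary entries, the function name `extract_codes`, and the whole-word matching rule are assumptions chosen for illustration, and the sketch ignores negation, spelling variants, and inflection.

```python
import re
from typing import Set

# Hypothetical toy dictionary (lower-cased term -> ICD10 code).
# A real dictionary would be derived from the ICD10 ontology in the
# language of the records; these three entries are only for illustration.
TERM_TO_ICD10 = {
    "schizophrenia": "F20",
    "depressive episode": "F32",
    "essential hypertension": "I10",
}

def extract_codes(note: str) -> Set[str]:
    """Return the ICD10 codes whose dictionary term occurs in the note."""
    text = note.lower()
    found = set()
    for term, code in TERM_TO_ICD10.items():
        # Match the term only on word boundaries to avoid substring hits.
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            found.add(code)
    return found
```

Because only the dictionary is language specific, swapping in terms from another language's ICD10 translation would leave the matching code unchanged, which is the sense in which such an approach can be language independent.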

Author Summary

Text mining and information extraction can be seen as the challenge of converting information hidden in text into manageable data. We have used text mining to automatically extract clinically relevant terms from 5543 psychiatric patient records and map these to disease codes in the International Classification of Diseases ontology (ICD10). Mined codes were supplemented by existing coded data. For each patient we constructed a phenotypic profile of associated ICD10 codes. This allowed us to cluster patients based on the similarity of their profiles. The result is a patient stratification based on profiles that are more complete than the primary diagnosis, which is typically used. Similarly, we investigated comorbidities by looking for pairs of disease codes co-occurring in patients more often than expected. Our high-ranking pairs were manually curated by a medical doctor, who flagged 93 candidates as interesting. For a number of these we were able to find genes and proteins known to be associated with the diseases using the OMIM database. The disease-associated proteins allowed us to construct protein networks suspected to be involved in each of the phenotypes. Proteins shared between two associated diseases might provide insight into the disease comorbidity.
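
The two analyses described above can be sketched on toy data as follows. This summary does not state which similarity measure or co-occurrence statistic was used, so both choices here are assumptions: Jaccard similarity between code sets for comparing profiles, and the ratio of observed to expected co-occurrence under independence for ranking pairs. The function names and the example profiles are hypothetical.

```python
from collections import Counter
from itertools import combinations

def jaccard(p, q):
    """Similarity of two code sets: shared codes over all codes."""
    union = p | q
    return len(p & q) / len(union) if union else 0.0

def cooccurrence_ratios(profiles):
    """Map each code pair seen together to observed / expected count.

    `profiles` maps a patient id to a set of ICD10 codes. The expected
    count assumes the two codes occur independently across patients.
    """
    n = len(profiles)
    single = Counter()
    pair = Counter()
    for codes in profiles.values():
        single.update(codes)
        # Sort so that each pair has one canonical orientation.
        pair.update(combinations(sorted(codes), 2))
    return {
        (a, b): observed / (single[a] * single[b] / n)
        for (a, b), observed in pair.items()
    }

# Hypothetical profiles for four patients.
profiles = {
    "p1": {"F20", "E11"},
    "p2": {"F20", "E11"},
    "p3": {"F32"},
    "p4": {"F20"},
}
ratios = cooccurrence_ratios(profiles)
```

A ratio above 1 marks a pair that appears together more often than independence would predict. On real data a significance test and correction for multiple comparisons would be needed before any pair is passed on for manual curation.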

Author and article information

Contributors

: Role: Editor

Journal

Journal ID (nlm-ta): PLoS Comput Biol

Journal ID (publisher-id): plos

Journal ID (pmc): ploscomp

Title: PLoS Computational Biology

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Print): 1553-734X

ISSN (Electronic): 1553-7358

Publication date Collection: August 2011

Publication date (Print): August 2011

Publication date (Electronic): 25 August 2011

Volume: 7

Issue: 8

Electronic Location Identifier: e1002141

Affiliations

[1] Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark

[2] NNF Center for Protein Research, University of Copenhagen, Copenhagen, Denmark

[3] Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Copenhagen University Hospital, Roskilde, Denmark

[4] Department of Growth and Reproduction GR, Rigshospitalet, Copenhagen, Denmark

[5] Department of Clinical Biochemistry, Hvidovre Hospital, Copenhagen University Hospital, Hvidovre, Denmark

[6] Psychiatry Region Sealand, Ringsted, Denmark

Vanderbilt University, United States of America

Author notes

* E-mail: brunak@cbs.dtu.dk

Conceived and designed the experiments: F. Roque, P. Jensen, S. Bredkjær, L. Jensen, S. Brunak. Performed the experiments: F. Roque, P. Jensen, H. Schmock, M. Andreatta. Analyzed the data: F. Roque, P. Jensen, H. Schmock, M. Dalgaard, M. Andreatta, T. Hansen, K. Søeby, A. Juul, T. Werge, S. Brunak. Wrote the paper: F. Roque, P. Jensen.

Article

Publisher ID: PCOMPBIOL-D-11-00196

DOI: 10.1371/journal.pcbi.1002141

PMC ID: 3161904

PubMed ID: 21901084

SO-VID: c617cf3a-8a72-4a61-a4b2-4359673ec9be

Copyright © Roque et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 11 February 2011

Date accepted: 13 June 2011

Page count

Pages: 10
