      Is Open Access

      Using Electronic Patient Records to Discover Disease Correlations and Stratify Patient Cohorts

      research-article

          Abstract

          Electronic patient records remain a largely unexplored but potentially rich data source for discovering correlations between diseases. We describe a general approach for gathering phenotypic descriptions of patients from medical records in a systematic, cohort-independent manner. By extracting phenotype information from the free text in such records, we demonstrate that we can extend the information contained in the structured record data and use it to produce fine-grained patient stratification and disease co-occurrence statistics. The approach uses a dictionary based on the International Classification of Diseases ontology and is therefore, in principle, language-independent. As a use case, we show how records from a Danish psychiatric hospital lead to the identification of disease correlations, which can subsequently be mapped to systems biology frameworks.
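The dictionary-based extraction described above can be illustrated with plain term lookup. This is a minimal sketch under stated assumptions, not the authors' pipeline: the toy `TERM_TO_ICD10` dictionary and the helpers `build_pattern`, `mine_codes` and `build_profiles` are hypothetical, and the sketch ignores negation, spelling variants and the Danish-language specifics of the real records. Text-mined codes are unioned with any existing structured codes to give one phenotypic profile per patient.

```python
import re
from collections import defaultdict

# HYPOTHETICAL toy dictionary: lower-cased surface terms -> ICD-10 codes.
# A real dictionary would be generated from the ICD-10 ontology (plus
# synonyms) in the language of the records.
TERM_TO_ICD10 = {
    "schizophrenia": "F20",
    "paranoid schizophrenia": "F20.0",
    "depression": "F32",
    "alcohol dependence": "F10.2",
    "type 2 diabetes": "E11",
    "obesity": "E66",
}


def build_pattern(dictionary):
    """Compile one case-insensitive regex that matches any dictionary term."""
    # Longer terms come first in the alternation, so at a given start
    # position the longest dictionary term is preferred.
    terms = sorted(dictionary, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


PATTERN = build_pattern(TERM_TO_ICD10)


def mine_codes(note):
    """Return the ICD-10 codes whose terms occur in one free-text note."""
    # No negation handling: "no depression" would still yield F32.
    return {TERM_TO_ICD10[m.group(0).lower()] for m in PATTERN.finditer(note)}


def build_profiles(notes_by_patient, coded_by_patient):
    """Union text-mined codes with existing structured codes per patient."""
    profiles = defaultdict(set)
    for patient, notes in notes_by_patient.items():
        for note in notes:
            profiles[patient] |= mine_codes(note)
    for patient, codes in coded_by_patient.items():
        profiles[patient] |= set(codes)
    return dict(profiles)
```

For example, the note "Known paranoid schizophrenia; also obesity." yields {"F20.0", "E66"}; if the structured record already holds F20, the patient's profile becomes {"F20.0", "E66", "F20"}.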

          Author Summary

          Text mining and information extraction can be seen as the challenge of converting information hidden in text into manageable data. We used text mining to automatically extract clinically relevant terms from 5543 psychiatric patient records and map them to disease codes in the International Classification of Diseases ontology (ICD10). The mined codes were supplemented with existing coded data. For each patient we constructed a phenotypic profile of associated ICD10 codes, which allowed us to cluster patients based on the similarity of their profiles. The result is a patient stratification based on profiles that are more complete than the primary diagnosis typically used. Similarly, we investigated comorbidities by looking for pairs of disease codes co-occurring in patients more often than expected. The highest-ranking pairs were manually curated by a medical doctor, who flagged 93 candidates as interesting. For a number of these we were able to find genes and proteins known to be associated with the diseases using the OMIM database. These disease-associated proteins allowed us to construct protein networks suspected to be involved in each of the phenotypes. Proteins shared between two associated diseases may provide insight into their comorbidity.
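The co-occurrence and stratification steps summarized above can be sketched over per-patient code profiles. The summary does not name the statistic used, so this sketch assumes one common choice: compare each pair's observed count with the count expected if the two codes occurred independently, and score over-representation with a one-sided hypergeometric tail probability. Jaccard similarity is likewise an assumed profile-similarity measure that a clustering method could consume. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """One-sided P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible terms contribute nothing.
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def cooccurrence_table(profiles):
    """Score every co-occurring code pair; smallest p-value first.

    ASSUMPTION: observed-vs-expected counts under independence with a
    hypergeometric test; not necessarily the statistic used in the paper.
    """
    n_patients = len(profiles)
    single = Counter(code for codes in profiles.values() for code in set(codes))
    pairs = Counter(
        pair
        for codes in profiles.values()
        for pair in combinations(sorted(set(codes)), 2)
    )
    rows = []
    for (a, b), observed in pairs.items():
        expected = single[a] * single[b] / n_patients  # independence baseline
        p_value = hypergeom_sf(observed, n_patients, single[a], single[b])
        rows.append((a, b, observed, expected, observed / expected, p_value))
    rows.sort(key=lambda row: row[5])
    return rows


def jaccard(x, y):
    """ASSUMED profile similarity: shared codes / all codes of two patients."""
    x, y = set(x), set(y)
    union = x | y
    return len(x & y) / len(union) if union else 0.0
```

For the four toy profiles {E66, F20}, {E66, F20}, {F32} and {F20}, the pair (E66, F20) is observed twice against 1.5 expected, giving p = 0.5. With many codes, a multiple-testing correction (for example Bonferroni or Benjamini-Hochberg) would normally be applied before ranking candidates for manual curation.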

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Comput Biol (PLoS Computational Biology)
                Public Library of Science (San Francisco, USA)
                ISSN: 1553-734X (print), 1553-7358 (electronic)
                Published: 25 August 2011
                Volume: 7
                Issue: 8
                Article: e1002141
                Affiliations
                [1 ]Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
                [2 ]NNF Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
                [3 ]Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Copenhagen University Hospital, Roskilde, Denmark
                [4 ]Department of Growth and Reproduction GR, Rigshospitalet, Copenhagen, Denmark
                [5 ]Department of Clinical Biochemistry, Hvidovre Hospital, Copenhagen University Hospital, Hvidovre, Denmark
                [6 ]Psychiatry Region Sealand, Ringsted, Denmark
                Vanderbilt University, United States of America
                Author notes

                Conceived and designed the experiments: F. Roque, P. Jensen, S. Bredkjær, L. Jensen, S. Brunak. Performed the experiments: F. Roque, P. Jensen, H. Schmock, M. Andreatta. Analyzed the data: F. Roque, P. Jensen, H. Schmock, M. Dalgaard, M. Andreatta, T. Hansen, K. Søeby, A. Juul, T. Werge, S. Brunak. Wrote the paper: F. Roque, P. Jensen.

                Article
                Publisher ID: PCOMPBIOL-D-11-00196
                DOI: 10.1371/journal.pcbi.1002141
                PMCID: PMC3161904
                PMID: 21901084
                Roque et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                History
                Received: 11 February 2011
                Accepted: 13 June 2011
                Page count
                Pages: 10
                Categories
                Research Article
                Biology
                Computational Biology
                Systems Biology
                Computer Science
                Text Mining
                Medicine
                Diagnostic Medicine
                Mental Health

                Quantitative & Systems biology
