
      Using Electronic Patient Records to Discover Disease Correlations and Stratify Patient Cohorts



          Electronic patient records remain a rather unexplored but potentially rich data source for discovering correlations between diseases. We describe a general approach for gathering phenotypic descriptions of patients from medical records in a systematic, non-cohort-dependent manner. By extracting phenotype information from the free text in such records, we demonstrate that we can extend the information contained in the structured record data and use it to produce fine-grained patient stratification and disease co-occurrence statistics. The approach uses a dictionary based on the International Classification of Diseases ontology and is therefore, in principle, language independent. As a use case, we show how records from a Danish psychiatric hospital lead to the identification of disease correlations, which can subsequently be mapped to systems biology frameworks.
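The dictionary-based extraction step can be illustrated with a minimal sketch. It assumes a hypothetical, hand-made English dictionary (`TERM_TO_ICD10`) mapping surface terms to ICD-10 codes; the study itself worked on Danish records, and its actual dictionary and matching rules may differ. Terms are matched case-insensitively on word boundaries, and the mined codes are merged with the codes already present in the structured record to form one profile per patient. The helper names (`build_matcher`, `extract_codes`, `patient_profile`) are illustrative, and negated mentions such as "no signs of depression" are not handled.

```python
import re
from typing import Dict, Iterable, Set

# Hypothetical toy dictionary (illustrative only): lower-case surface term -> ICD-10 code.
# A real dictionary would be built from ICD-10 term lists in the records' own language.
TERM_TO_ICD10: Dict[str, str] = {
    "schizophrenia": "F20",
    "depressive episode": "F32",
    "depression": "F32",
    "type 2 diabetes": "E11",
    "obesity": "E66",
}


def build_matcher(dictionary: Dict[str, str]) -> re.Pattern:
    """Compile one case-insensitive regex matching any dictionary term as whole words."""
    # Python tries alternatives left to right, so listing longer terms first makes the
    # longest term starting at a given position win over a shorter prefix of it.
    terms = sorted(dictionary, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def extract_codes(text: str, matcher: re.Pattern, dictionary: Dict[str, str]) -> Set[str]:
    """Return the ICD-10 codes whose terms occur in one free-text note."""
    # Dictionary keys are lower-case, so normalise the matched span before lookup.
    return {dictionary[match.group(0).lower()] for match in matcher.finditer(text)}


def patient_profile(notes: Iterable[str], structured_codes: Iterable[str],
                    matcher: re.Pattern, dictionary: Dict[str, str]) -> Set[str]:
    """Union of the coded diagnoses already in the record and the codes mined from free text."""
    profile = set(structured_codes)
    for note in notes:
        profile |= extract_codes(note, matcher, dictionary)
    return profile


if __name__ == "__main__":
    matcher = build_matcher(TERM_TO_ICD10)
    notes = ["Patient has a history of Schizophrenia and obesity.", "Type 2 diabetes noted."]
    print(sorted(patient_profile(notes, ["F20"], matcher, TERM_TO_ICD10)))  # ['E11', 'E66', 'F20']
```

Here the structured record contributes only F20, while the free text adds E66 and E11; this is the sense in which mining the text extends the structured data.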

          Author Summary

          Text mining and information extraction can be seen as the challenge of converting information hidden in text into manageable data. We used text mining to automatically extract clinically relevant terms from 5543 psychiatric patient records and map them to disease codes in the International Classification of Diseases, 10th revision (ICD-10). The mined codes were supplemented with existing coded data. For each patient we constructed a phenotypic profile of associated ICD-10 codes, which allowed us to cluster patients based on the similarity of their profiles. The result is a patient stratification based on more complete profiles than the primary diagnosis alone, which is what is typically used. Similarly, we investigated comorbidities by looking for pairs of disease codes co-occurring in patients more often than expected by chance. The highest-ranking pairs were manually curated by a medical doctor, who flagged 93 candidates as interesting. For a number of these, we were able to find genes/proteins known to be associated with the diseases using the OMIM database. These disease-associated proteins allowed us to construct protein networks suspected to be involved in each of the phenotypes. Proteins shared between two associated diseases might provide insight into the disease comorbidity.
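One common way to make "more often than expected" concrete, shown here as an assumption rather than as the statistic the authors used, is to compare the observed number of patients carrying both codes A and B with the number expected if the codes were independent, n_A * n_B / N for N patients, and to score the excess with a one-sided hypergeometric (Fisher's exact) tail probability. The sketch below does this over per-patient code sets and also includes a Jaccard similarity between two profiles as one possible input to patient clustering; the clustering method is not specified here. All names are hypothetical, and a real analysis would also correct the p-values for the number of pairs tested.

```python
from collections import Counter
from itertools import combinations
from math import comb
from typing import List, Set, Tuple

# (code_a, code_b, observed, expected, observed/expected, one-sided p-value)
PairStat = Tuple[str, str, int, float, float, float]


def hypergeom_upper_tail(k: int, n_total: int, n_a: int, n_b: int) -> float:
    """P(X >= k), where X counts patients with both codes when the n_b code-B patients
    are treated as a random draw (without replacement) from n_total patients, n_a of whom carry A."""
    numerator = sum(
        comb(n_a, i) * comb(n_total - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return numerator / comb(n_total, n_b)


def cooccurrence_stats(profiles: List[Set[str]]) -> List[PairStat]:
    """Score every pair of codes seen together in at least one patient, most significant first."""
    n_total = len(profiles)
    single: Counter = Counter()
    pairs: Counter = Counter()
    for codes in profiles:
        single.update(codes)
        # Sorting gives each unordered pair one canonical key, e.g. ("E66", "F20").
        pairs.update(combinations(sorted(codes), 2))

    results: List[PairStat] = []
    for (a, b), observed in pairs.items():
        expected = single[a] * single[b] / n_total  # expected count under independence
        p_value = hypergeom_upper_tail(observed, n_total, single[a], single[b])
        results.append((a, b, observed, expected, observed / expected, p_value))
    results.sort(key=lambda row: row[5])
    return results


def jaccard(p: Set[str], q: Set[str]) -> float:
    """Share of codes two patients have in common (1.0 for two empty profiles)."""
    if not p and not q:
        return 1.0
    return len(p & q) / len(p | q)


if __name__ == "__main__":
    toy = [{"F20", "E66"}, {"F20", "E66"}, {"F20", "E66"}, {"F32"}, {"F32", "E11"}, {"E11"}]
    for row in cooccurrence_stats(toy):
        print(row)
    print(jaccard({"F20", "E66"}, {"F20"}))  # 0.5
```

In the toy data, E66 and F20 each occur in 3 of 6 patients, so 1.5 joint occurrences are expected under independence; all 3 carriers overlap, giving a ratio of 2.0 and a tail probability of 1/20. In a real cohort, ranked pairs like these would still need expert review, as the curation step above describes.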

                Author and article information

                Journal: PLoS Computational Biology (PLoS Comput Biol)
                Publisher: Public Library of Science (San Francisco, USA)
                Published: 25 August 2011
                Issue: Volume 7, Issue 8 (August 2011)
                Affiliations:
                [1] Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
                [2] NNF Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
                [3] Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Copenhagen University Hospital, Roskilde, Denmark
                [4] Department of Growth and Reproduction GR, Rigshospitalet, Copenhagen, Denmark
                [5] Department of Clinical Biochemistry, Hvidovre Hospital, Copenhagen University Hospital, Hvidovre, Denmark
                [6] Psychiatry Region Sealand, Ringsted, Denmark
                Editor affiliation: Vanderbilt University, United States of America
                Author notes

                Conceived and designed the experiments: F. Roque, P. Jensen, S. Bredkjær, L. Jensen, S. Brunak. Performed the experiments: F. Roque, P. Jensen, H. Schmock, M. Andreatta. Analyzed the data: F. Roque, P. Jensen, H. Schmock, M. Dalgaard, M. Andreatta, T. Hansen, K. Søeby, A. Juul, T. Werge, S. Brunak. Wrote the paper: F. Roque, P. Jensen.

                Copyright: Roque et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                Page count: 10
                Article type: Research Article
                Subjects: Computational Biology; Systems Biology; Computer Science; Text Mining; Diagnostic Medicine; Mental Health; Quantitative & Systems Biology

