
      Chapter 13: Mining Electronic Health Records in the Genomics Era


      PLoS Computational Biology

      Public Library of Science


          Abstract

          The combination of improved genomic analysis methods, decreasing genotyping costs, and increasing computing resources has led to an explosion of clinical genomic knowledge in the last decade. Similarly, healthcare systems are increasingly adopting robust electronic health record (EHR) systems that not only can improve health care but also contain a vast repository of disease and treatment data that could be mined for genomic research. Indeed, institutions are creating EHR-linked DNA biobanks to enable genomic and pharmacogenomic research, using EHR data for phenotypic information. However, EHRs are designed primarily for clinical care, not research, so reuse of clinical EHR data for research purposes can be challenging. Difficulties in using EHR data include data availability, missing data, incorrect data, and vast quantities of unstructured narrative text. Structured information includes billing codes, most laboratory reports, and other variables such as physiologic measurements and demographic information. Significant information, however, remains locked within EHR narrative text documents, including clinical notes and certain categories of test results, such as pathology and radiology reports. For relatively rare observations, combinations of simple free-text searches and billing codes may prove adequate when followed by manual chart review. However, to extract the large cohorts necessary for genome-wide association studies, natural language processing methods to process narrative text data may be needed. Combinations of structured and unstructured textual data can be mined to generate high-validity collections of cases and controls for a given condition. Once high-quality cases and controls are identified, EHR-derived cases can be used for genomic discovery and validation. Since EHR data includes a broad sampling of clinically relevant phenotypic information, it may enable multiple genomic investigations upon a single set of genotyped individuals. This chapter reviews several examples of phenotype extraction and their application to genetic research, demonstrating a viable future for genomic discovery using EHR-linked data.
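
          The idea of combining billing codes with narrative text to label cases and controls can be illustrated with a minimal Python sketch. It is not the chapter's algorithm: the `Patient` record, the code set, the keyword, the negation cue list, and the `min_code_count` threshold are all hypothetical choices made for illustration, and the negation check is a crude stand-in for real natural language processing. Patients who match neither rule are flagged for manual chart review.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Patient:
    """Hypothetical, simplified EHR record used only for this sketch."""
    patient_id: str
    billing_codes: list = field(default_factory=list)  # structured data
    notes: list = field(default_factory=list)          # narrative text

# Assumed negation cues; real NLP systems use far richer rules or models.
NEGATION_CUE = re.compile(
    r"\b(?:no|not|denies|without|negative for|ruled out)\b", re.IGNORECASE
)


def has_affirmed_mention(note: str, keyword: str) -> bool:
    """True if the keyword occurs with no negation cue earlier in its sentence."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(note):
        # Text from the previous period (or start of note) up to the mention.
        sentence_start = note.rfind(".", 0, match.start()) + 1
        preceding = note[sentence_start:match.start()]
        if not NEGATION_CUE.search(preceding):
            return True
    return False


def classify(patient: Patient, case_codes: set, keyword: str,
             min_code_count: int = 2) -> str:
    """Label a patient 'case', 'control', or 'review' (ambiguous evidence)."""
    code_hits = sum(1 for code in patient.billing_codes if code in case_codes)
    text_hit = any(has_affirmed_mention(n, keyword) for n in patient.notes)
    if code_hits >= min_code_count and text_hit:
        return "case"      # structured and unstructured evidence agree
    if code_hits == 0 and not text_hit:
        return "control"   # no evidence of the condition in either source
    return "review"        # conflicting or weak evidence: manual chart review
```

          Requiring agreement between the two data sources trades recall for precision, which suits the goal of a high-validity case set; loosening `min_code_count` would enlarge the cohort at the cost of more false positives.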


          Most cited references 54


          Comorbidity measures for use with administrative data.

          This study attempts to develop a comprehensive set of comorbidity measures for use with large administrative inpatient datasets. The study involved clinical and empirical review of comorbidity measures, development of a framework that attempts to segregate comorbidities from other aspects of the patient's condition, development of a comorbidity algorithm, and testing on heterogeneous and homogeneous patient groups. Data were drawn from all adult, nonmaternal inpatients from 438 acute care hospitals in California in 1992 (n = 1,779,167). Outcome measures were those commonly available in administrative data: length of stay, hospital charges, and in-hospital death. A comprehensive set of 30 comorbidity measures was developed. The comorbidities were associated with substantial increases in length of stay, hospital charges, and mortality both for heterogeneous and homogeneous disease groups. Several comorbidities are described that are important predictors of outcomes, yet commonly are not measured. These include mental disorders, drug and alcohol abuse, obesity, coagulopathy, weight loss, and fluid and electrolyte disorders. The comorbidities had independent effects on outcomes and probably should not be simplified as an index because they affect outcomes differently among different patient groups. The present method addresses some of the limitations of previous measures. It is based on a comprehensive approach to identifying comorbidities and separates them from the primary reason for hospitalization, resulting in an expanded set of comorbidities that easily is applied without further refinement to administrative data for a wide range of diseases.

            Population stratification and spurious allelic association.

            Great efforts and expense have been expended in attempts to detect genetic polymorphisms contributing to susceptibility to complex human disease. Concomitantly, technology for detection and scoring of single nucleotide polymorphisms (SNPs) has undergone rapid development, extensive catalogues of SNPs across the genome have been constructed, and SNPs have been increasingly used as a means for investigation of the genetic causes of complex human diseases. For many diseases, population-based studies of unrelated individuals--in which case-control and cohort studies serve as standard designs for genetic association analysis--can be the most practical and powerful approach. However, extensive debate has arisen about optimum study design, and considerable concern has been expressed that these approaches are prone to population stratification, which can lead to biased or spurious results. Over the past decade, a great shift has been noted, away from case-control and cohort studies, towards family-based association designs. These designs have fewer problems with population stratification but have greater genotyping and sampling requirements, and data can be difficult or impossible to gather. We discuss past evidence for population stratification on genotype-phenotype association studies, review methods to detect and account for it, and present suggestions for future study design and analysis.

              Association of three genetic loci with uric acid concentration and risk of gout: a genome-wide association study.

              Hyperuricaemia, a highly heritable trait, is a key risk factor for gout. We aimed to identify novel genes associated with serum uric acid concentration and gout. Genome-wide association studies were done for serum uric acid in 7699 participants in the Framingham cohort and in 4148 participants in the Rotterdam cohort. Genome-wide significant single nucleotide polymorphisms (SNPs) were replicated in white (n=11 024) and black (n=3843) individuals who took part in the study of Atherosclerosis Risk in Communities (ARIC). The SNPs that reached genome-wide significant association with uric acid in either the Framingham cohort (p<5.0 × 10⁻⁸) or the Rotterdam cohort (p<1.0 × 10⁻⁷) were evaluated with gout. The results obtained in white participants were combined using meta-analysis. Three loci in the Framingham cohort and two in the Rotterdam cohort showed genome-wide association with uric acid. Top SNPs in each locus were: missense rs16890979 in SLC2A9 (p=7.0 × 10⁻¹⁶⁸ and 2.9 × 10⁻¹⁸ for white and black participants, respectively); missense rs2231142 in ABCG2 (p=2.5 × 10⁻⁶⁰ and 9.8 × 10⁻⁴), and rs1165205 in SLC17A3 (p=3.3 × 10⁻²⁶ and 0.33). All SNPs were direction-consistent with gout in white participants: rs16890979 (OR 0.59 per T allele, 95% CI 0.52-0.68, p=7.0 × 10⁻¹⁴), rs2231142 (1.74, 1.51-1.99, p=3.3 × 10⁻¹⁵), and rs1165205 (0.85, 0.77-0.94, p=0.002). In black participants of the ARIC study, rs2231142 was direction-consistent with gout (1.71, 1.06-2.77, p=0.028). An additive genetic risk score of high-risk alleles at the three loci showed graded associations with uric acid (272-351 μmol/L in the Framingham cohort, 269-386 μmol/L in the Rotterdam cohort, and 303-426 μmol/L in white participants of the ARIC study) and gout (frequency 2-13% in the Framingham cohort, 2-8% in the Rotterdam cohort, and 1-18% in white participants in the ARIC study). We identified three genetic loci associated with uric acid concentration and gout. A score based on genes with a putative role in renal urate handling showed a substantial risk for gout.

                Author and article information

                Journal
                PLoS Computational Biology (PLoS Comput Biol)
                Public Library of Science (San Francisco, USA)
                ISSN: 1553-734X, 1553-7358
                Published: 27 December 2012
                Volume 8, Issue 12
                Affiliations
                Departments of Biomedical Informatics and Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
                Whitehead Institute, United States of America
                University of Maryland, Baltimore County, United States of America
                Author notes

                The author has declared that no competing interests exist.

                Article
                Manuscript ID: PCOMPBIOL-D-12-01458
                DOI: 10.1371/journal.pcbi.1002823
                PMCID: 3531280
                PMID: 23300414

                Denny. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                Page count
                Pages: 15
                Funding
                This article was supported in part by grants from the National Library of Medicine R01 LM 010685 and the National Human Genome Research Institute U01 HG004603. The funders had no role in the preparation of the manuscript.
                Categories
                Education
                Biology
                Genomics
                Medicine
                Epidemiology
                Disease Informatics
                Non-Clinical Medicine
                Health Informatics

                Quantitative & Systems biology
