
      PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene–disease associations



          Motivation: Emergence of genetic data coupled to longitudinal electronic medical records (EMRs) offers the possibility of phenome-wide association scans (PheWAS) for disease–gene associations. We propose a novel method to scan phenomic data for genetic associations using International Classification of Diseases, Ninth Revision (ICD9) billing codes, which are available in most EMR systems. We have developed a code translation table to automatically define 776 different disease populations and their controls using prevalent ICD9 codes derived from EMR data. As a proof of concept of this algorithm, we genotyped the first 6005 European–Americans accrued into BioVU, Vanderbilt's DNA biobank, at five single nucleotide polymorphisms (SNPs) with previously reported disease associations: atrial fibrillation, Crohn's disease, carotid artery stenosis, coronary artery disease, multiple sclerosis, systemic lupus erythematosus and rheumatoid arthritis. The PheWAS software generated case and control populations across all ICD9 code groups for each of these five SNPs, and disease–SNP associations were analyzed. The primary outcome of this study was replication of seven previously known SNP–disease associations for these SNPs.
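
The scan described above can be sketched roughly as follows. This is a minimal illustration, not the PheWAS software itself: the three-entry `ICD9_TO_GROUP` table, the function names, the patient record layout, and the choice of an allelic chi-square test are all assumptions made for the example. In particular, controls here are simply all non-cases, which is likely simpler than the control definitions in the actual code translation table.

```python
from math import erfc, sqrt

# Hypothetical three-entry excerpt of a code translation table
# (ICD9 code prefix -> phenotype group). The real table defines 776 groups.
ICD9_TO_GROUP = {
    "427.31": "atrial fibrillation",
    "555": "Crohn's disease",
    "340": "multiple sclerosis",
}


def phenotype_groups(icd9_codes):
    """Return the set of phenotype groups matched by a patient's ICD9 codes."""
    groups = set()
    for code in icd9_codes:
        for prefix, group in ICD9_TO_GROUP.items():
            if code.startswith(prefix):
                groups.add(group)
    return groups


def allelic_chi2(case_minor, case_major, ctrl_minor, ctrl_major):
    """Chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p_value). For 1 df, p = erfc(sqrt(chi2 / 2)).
    """
    n = case_minor + case_major + ctrl_minor + ctrl_major
    den = ((case_minor + case_major) * (ctrl_minor + ctrl_major)
           * (case_minor + ctrl_minor) * (case_major + ctrl_major))
    if den == 0:
        return 0.0, 1.0  # degenerate table: no evidence either way
    chi2 = n * (case_minor * ctrl_major - case_major * ctrl_minor) ** 2 / den
    return chi2, erfc(sqrt(chi2 / 2))


def phewas_scan(patients):
    """Test one SNP against every phenotype group.

    patients: list of dicts with 'codes' (list of ICD9 strings) and
    'minor_alleles' (0, 1 or 2 copies of the minor allele).
    Simplification: every non-case is treated as a control.
    """
    results = {}
    for group in sorted(set(ICD9_TO_GROUP.values())):
        counts = {True: [0, 0], False: [0, 0]}  # is_case -> [minor, major]
        for patient in patients:
            is_case = group in phenotype_groups(patient["codes"])
            counts[is_case][0] += patient["minor_alleles"]
            counts[is_case][1] += 2 - patient["minor_alleles"]
        results[group] = allelic_chi2(*counts[True], *counts[False])
    return results
```

Running `phewas_scan` once per genotyped SNP yields one (chi-square, P-value) pair per phenotype group, which is the shape of output a phenome-wide scan needs before any significance threshold is applied.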

          Results: The PheWAS algorithm replicated four of seven known SNP–disease associations, with P-values between 2.8 × 10⁻⁶ and 0.011. It also identified 19 previously unknown statistical associations between these SNPs and diseases at P < 0.01. This study indicates that PheWAS analysis is a feasible method to investigate SNP–disease associations. Further evaluation is needed to determine the validity of these associations and the appropriate statistical thresholds for clinical significance.
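
The abstract leaves the choice of significance threshold open. As one conservative illustration, and not the authors' stated method, a Bonferroni correction divides the family-wise error rate by the number of tests. The sketch below assumes 776 phenotype groups tested against each of 5 SNPs and a family-wise alpha of 0.05; both the alpha and the test count are assumptions for the example.

```python
N_PHENOTYPES = 776   # disease populations defined by the code translation table
N_SNPS = 5           # SNPs genotyped in the proof of concept
ALPHA = 0.05         # assumed family-wise error rate


def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value threshold under a Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_PHENOTYPES * N_SNPS)

# Endpoints of the P-value range reported for the replicated associations.
reported_range = (2.8e-6, 0.011)
passes = [p < threshold for p in reported_range]
```

Under these assumptions the threshold is about 1.3 × 10⁻⁵, so the strongest replicated association would pass and the weakest would not, which is one way to see why the choice of threshold needs further evaluation.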

          Availability: The PheWAS software and code translation table are freely available at http://knowledgemap.mc.vanderbilt.edu/research.

          Contact: josh.denny@vanderbilt.edu


                Author and article information

                Oxford University Press
                1 May 2010
                24 March 2010
                24 March 2010
                Volume: 26
                Issue: 9
                Pages: 1205-1210
                1 Department of Biomedical Informatics, 2 Department of Medicine, Vanderbilt University and 3 Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University School of Medicine, Nashville, TN, USA
                Author notes
                * To whom correspondence should be addressed.

                Associate Editor: Jeffery Barrett

                © The Author(s) 2010. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

                Original Papers
                Genetics and Population Analysis

                Bioinformatics & Computational biology

