
      Simultaneous Analysis of All SNPs in Genome-Wide and Re-Sequencing Association Studies


          Abstract

          Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants, which is a plausible scenario for many complex diseases. We show that simultaneous analysis of the entire set of SNPs from a genome-wide study to identify the subset that best predicts disease outcome is now feasible, thanks to developments in stochastic search methods. We used a Bayesian-inspired penalised maximum likelihood approach in which every SNP can be considered for additive, dominant, and recessive contributions to disease risk. Posterior mode estimates were obtained for regression coefficients that were each assigned a prior with a sharp mode at zero. A non-zero coefficient estimate was interpreted as corresponding to a significant SNP. We investigated two prior distributions and show that the normal-exponential-gamma prior leads to improved SNP selection in comparison with single-SNP tests. We also derived an explicit approximation for type-I error that avoids the need to use permutation procedures. As well as genome-wide analyses, our method is well-suited to fine mapping with very dense SNP sets obtained from re-sequencing and/or imputation. It can accommodate quantitative as well as case-control phenotypes and covariate adjustment, and it can be extended to search for interactions. Here, we demonstrate the power and empirical type-I error of our approach using simulated case-control data sets of up to 500 K SNPs, a real genome-wide data set of 300 K SNPs, and a sequence-based data set, each of which can be analysed in a few hours on a desktop workstation.
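
As a rough illustration of the idea of posterior-mode estimation under a prior with a sharp mode at zero, the sketch below fits a logistic regression to all SNPs at once and reads off the SNPs whose coefficients are non-zero. It is not the authors' implementation. It assumes a Laplace (double-exponential) prior, which is equivalent to an L1 penalty, as a simple stand-in for the normal-exponential-gamma prior, it uses additive genotype coding only, and it uses proximal gradient descent rather than the authors' search algorithm. The function names, the penalty value `lam`, and the simulated data are all hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the L1 penalty: shrink towards zero, set small values to exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fit_l1_logistic(X, y, lam, n_iter=500):
    """Penalised maximum likelihood (posterior mode under a Laplace prior).

    Minimises mean logistic loss + lam * sum(|beta_j|) by proximal gradient
    descent. The intercept is left unpenalised.
    """
    n, p = X.shape
    # Step size 1/L, where L = ||[1, X]||_2^2 / (4n) bounds the curvature of the loss.
    A = np.column_stack([np.ones(n), X])
    step = 4.0 * n / np.linalg.norm(A, 2) ** 2
    intercept, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(intercept + X @ beta)))
        resid = prob - y
        intercept -= step * resid.mean()
        beta = soft_threshold(beta - step * (X.T @ resid) / n, step * lam)
    return intercept, beta

def simulate_binary_trait(n=2000, p=200, maf=0.3, causal=(10, 50), effect=1.0, seed=0):
    """Toy data (hypothetical): independent SNPs coded 0/1/2, two causal SNPs."""
    rng = np.random.default_rng(seed)
    G = rng.binomial(2, maf, size=(n, p)).astype(float)
    # Centre the linear predictor so that roughly half the samples are cases.
    eta = -2.0 * maf * effect * len(causal) + effect * G[:, list(causal)].sum(axis=1)
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)
    X = (G - G.mean(axis=0)) / G.std(axis=0)  # standardise each SNP
    return X, y

X, y = simulate_binary_trait()
_, beta = fit_l1_logistic(X, y, lam=0.05)
selected = np.flatnonzero(beta)  # SNPs with a non-zero coefficient estimate
print("selected SNP indices:", selected)
```

The penalty strength plays the role that the prior's shape plays in the abstract: a larger `lam` forces more coefficients to exactly zero and so selects fewer SNPs. The value used here was picked by hand for the toy data and does not reflect the type-I error calibration described above.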

          Author Summary

          Tests of association with disease status are normally conducted one SNP at a time, ignoring the effects of all other genotyped SNPs. We developed a computationally efficient method to simultaneously analyse all SNPs, either in a genome-wide association (GWA) study, or a fine-mapping study based on re-sequencing and/or imputation. The method selects a subset of SNPs that best predicts disease status, while controlling the type-I error of the selected SNPs. This brings many advantages over standard single-SNP approaches, because the signal from a particular SNP can be more clearly assessed when other SNPs associated with disease status are already included in the model. Thus, in comparison with single-SNP analyses, power is increased and the false positive rate is reduced because of reduced residual variation. Localisation is also greatly improved. We demonstrate these advantages over the widely used single-SNP Armitage Trend Test using GWA simulation studies, a real GWA dataset, and a sequence-based fine-mapping simulation study.
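
For reference, the single-SNP comparison method named above, the Armitage trend test, can be sketched in a few lines. This is the standard textbook form of the Cochran-Armitage test with additive weights 0, 1, 2 and a normal approximation for the p-value. The function name and the example counts are made up for illustration and are not taken from the article.

```python
import math

def armitage_trend_test(cases, controls, weights=(0.0, 1.0, 2.0)):
    """Cochran-Armitage trend test for one SNP.

    cases, controls: counts per genotype (0, 1, 2 copies of the minor allele).
    Returns (chi2, p_value) with 1 degree of freedom.
    Raises ZeroDivisionError if the SNP is monomorphic (zero variance).
    """
    n_geno = [r + s for r, s in zip(cases, controls)]
    R, S = sum(cases), sum(controls)
    N = R + S
    sum_tr = sum(t * r for t, r in zip(weights, cases))
    sum_tn = sum(t * m for t, m in zip(weights, n_geno))
    sum_t2n = sum(t * t * m for t, m in zip(weights, n_geno))
    # Trend statistic and its variance under the null of no association.
    T = N * sum_tr - R * sum_tn
    var_T = (R * S / N) * (N * sum_t2n - sum_tn ** 2)
    z = T / math.sqrt(var_T)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z * z, p_value

# Hypothetical counts: the minor allele is more common in cases.
chi2, p_value = armitage_trend_test(cases=[10, 20, 30], controls=[30, 20, 10])
print(round(chi2, 6), p_value)
```

Because each SNP is tested on its own table, the statistic ignores every other SNP, which is the limitation the simultaneous analysis is meant to address.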


                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Genetics (PLoS Genet)
                Public Library of Science (San Francisco, USA)
                ISSN: 1553-7390, 1553-7404
                Published: 25 July 2008
                Volume 4, issue 7, e1000130
                Affiliations
                [1] Department of Epidemiology and Public Health, Imperial College, London, United Kingdom
                [2] Non-Communicable Disease Epidemiology Unit, London School of Hygiene and Tropical Medicine, London, United Kingdom
                Queensland Institute of Medical Research, Australia
                Author notes

                Conceived and designed the experiments: CJH JCW MDI DJB. Performed the experiments: CJH. Analyzed the data: CJH. Wrote the paper: CJH DJB.

                Article
                08-PLGE-RA-0259R2
                DOI: 10.1371/journal.pgen.1000130
                PMC: 2464715
                PMID: 18654633
                e767323a-7887-483e-88ee-6cb9531bc3b8
                Hoggart et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                History
                Received: 5 March 2008
                Accepted: 17 June 2008
                Page count
                Pages: 8
                Categories
                Research Article
                Genetics and Genomics/Genetics of Disease
                Genetics and Genomics/Population Genetics
                Mathematics/Statistics

                Genetics
