Simultaneous Analysis of All SNPs in Genome-Wide and Re-Sequencing Association Studies

Abstract

Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants, which is a plausible scenario for many complex diseases. We show that simultaneous analysis of the entire set of SNPs from a genome-wide study to identify the subset that best predicts disease outcome is now feasible, thanks to developments in stochastic search methods. We used a Bayesian-inspired penalised maximum likelihood approach in which every SNP can be considered for additive, dominant, and recessive contributions to disease risk. Posterior mode estimates were obtained for regression coefficients that were each assigned a prior with a sharp mode at zero. A non-zero coefficient estimate was interpreted as corresponding to a significant SNP. We investigated two prior distributions and show that the normal-exponential-gamma prior leads to improved SNP selection in comparison with single-SNP tests. We also derived an explicit approximation for type-I error that avoids the need to use permutation procedures. As well as genome-wide analyses, our method is well-suited to fine mapping with very dense SNP sets obtained from re-sequencing and/or imputation. It can accommodate quantitative as well as case-control phenotypes, covariate adjustment, and can be extended to search for interactions. Here, we demonstrate the power and empirical type-I error of our approach using simulated case-control data sets of up to 500 K SNPs, a real genome-wide data set of 300 K SNPs, and a sequence-based dataset, each of which can be analysed in a few hours on a desktop workstation.
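The idea of obtaining posterior mode estimates under a prior with a sharp mode at zero can be illustrated with a small sketch. The code below is not the authors' algorithm and does not use the normal-exponential-gamma prior: it swaps in a plain L1 penalty (which corresponds to a Laplace-type prior) and fits a logistic model by proximal gradient descent. The function names, the toy genotype data, the `penalty` value, and the step size are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_sparse_logistic(genotypes, status, penalty, step=0.1, iterations=2000):
    """Penalised maximum likelihood fit by proximal gradient descent.

    genotypes: one row per individual, each a list of 0/1/2 allele counts.
    status: 0 (control) or 1 (case) per individual.
    penalty: hypothetical L1 penalty weight standing in for the prior.
    """
    n = len(genotypes)
    p = len(genotypes[0])
    intercept = 0.0
    coefs = [0.0] * p
    for _ in range(iterations):
        # Residuals of the logistic model: fitted probability minus outcome.
        residuals = [
            sigmoid(intercept + sum(b * g for b, g in zip(coefs, row))) - y
            for row, y in zip(genotypes, status)
        ]
        grad_intercept = sum(residuals) / n
        grads = [
            sum(r * row[j] for r, row in zip(residuals, genotypes)) / n
            for j in range(p)
        ]
        intercept -= step * grad_intercept  # the intercept is not penalised
        # Gradient step followed by shrinkage; small effects are set to exactly 0.
        coefs = [
            soft_threshold(b - step * g, step * penalty)
            for b, g in zip(coefs, grads)
        ]
    return intercept, coefs


# Toy data (assumed): SNP 0 is enriched in cases, SNP 1 is distributed
# identically in cases and controls.
cases = [[2, i % 2] for i in range(15)] + [[1, i % 2] for i in range(15, 20)]
controls = [[0, i % 2] for i in range(15)] + [[1, i % 2] for i in range(15, 20)]
genotypes = cases + controls
status = [1] * 20 + [0] * 20

_, coefs = fit_sparse_logistic(genotypes, status, penalty=0.05)
selected = [j for j, b in enumerate(coefs) if b != 0.0]
print("coefficients:", coefs)
print("selected SNPs:", selected)
```

In this sketch, a SNP counts as selected when its coefficient is non-zero after shrinkage, mirroring the interpretation given in the abstract. Only additive (allele-count) coding is shown; dominant and recessive terms would enter as further columns.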

Author Summary

Tests of association with disease status are normally conducted one SNP at a time, ignoring the effects of all other genotyped SNPs. We developed a computationally efficient method to simultaneously analyse all SNPs, either in a genome-wide association (GWA) study, or a fine-mapping study based on re-sequencing and/or imputation. The method selects a subset of SNPs that best predicts disease status, while controlling the type-I error of the selected SNPs. This brings many advantages over standard single-SNP approaches, because the signal from a particular SNP can be more clearly assessed when other SNPs associated with disease status are already included in the model. Thus, in comparison with single-SNP analyses, power is increased and the false positive rate is reduced because of reduced residual variation. Localisation is also greatly improved. We demonstrate these advantages over the widely used single-SNP Armitage Trend Test using GWA simulation studies, a real GWA dataset, and a sequence-based fine-mapping simulation study.
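For reference, the single-SNP Armitage Trend Test used as the comparison can be sketched as follows. This is the standard Cochran-Armitage statistic for a 2 x 3 table of genotype counts with additive weights (0, 1, 2), compared against a chi-square distribution on one degree of freedom; the function name and the example counts are assumptions, not taken from the article.

```python
import math


def armitage_trend_test(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    case_counts / control_counts: numbers of individuals carrying 0, 1 and 2
    copies of the allele. Returns (chi-square statistic on 1 df, p-value).
    """
    totals = [r + s for r, s in zip(case_counts, control_counts)]
    n = sum(totals)
    n_cases = sum(case_counts)
    n_controls = sum(control_counts)
    sum_tr = sum(t * r for t, r in zip(weights, case_counts))
    sum_tn = sum(t * m for t, m in zip(weights, totals))
    sum_t2n = sum(t * t * m for t, m in zip(weights, totals))
    # Trend statistic: weighted case count minus its expectation under no association.
    trend = n * sum_tr - n_cases * sum_tn
    denom = n_cases * n_controls * (n * sum_t2n - sum_tn ** 2)
    if denom == 0:
        # No variation in genotype or in status: the test is uninformative.
        return 0.0, 1.0
    chi2 = n * trend ** 2 / denom
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts for one SNP (0, 1, 2 copies of the risk allele).
chi2, p_value = armitage_trend_test((30, 50, 20), (45, 45, 10))
print("chi-square:", chi2, "p-value:", p_value)
```

Each SNP is tested in isolation here, which is exactly the limitation the simultaneous analysis is meant to address: the statistic for one SNP takes no account of signal already explained by other SNPs.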

Author and article information

Contributors

: Role: Editor

Journal

Journal ID (nlm-ta): PLoS Genet

Journal ID (publisher-id): plos

Journal ID (pmc): plosgen

Title: PLoS Genetics

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Print): 1553-7390

ISSN (Electronic): 1553-7404

Publication date Collection: July 2008

Publication date (Print): July 2008

Publication date (Electronic): 25 July 2008

Volume: 4

Issue: 7

Electronic Location Identifier: e1000130

Affiliations

[1] Department of Epidemiology and Public Health, Imperial College, London, United Kingdom

[2] Non-Communicable Disease Epidemiology Unit, London School of Hygiene and Tropical Medicine, London, United Kingdom

Queensland Institute of Medical Research, Australia

Author notes

* E-mail: c.hoggart@ic.ac.uk

Conceived and designed the experiments: CJH JCW MDI DJB. Performed the experiments: CJH. Analyzed the data: CJH. Wrote the paper: CJH DJB.

Article

Publisher ID: 08-PLGE-RA-0259R2

DOI: 10.1371/journal.pgen.1000130

PMC ID: 2464715

PubMed ID: 18654633

SO-VID: e767323a-7887-483e-88ee-6cb9531bc3b8

Copyright © Hoggart et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 5 March 2008

Date accepted: 17 June 2008

Page count

Pages: 8
