A second generation human haplotype map of over 3.1 million SNPs

Abstract

We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.
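The "average maximum r²" figure quoted in the abstract can be illustrated with a minimal sketch. It assumes phased haplotypes coded as 0/1 lists, one list per biallelic SNP, and uses the standard definition r² = D² / (p_A(1 − p_A) p_B(1 − p_B)) with D = p_AB − p_A p_B. The function names `r_squared` and `mean_max_r2` are hypothetical and are not taken from the paper or from any HapMap software.

```python
def r_squared(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic SNPs.

    hap_a, hap_b: equal-length lists of 0/1 alleles, one entry per
    phased haplotype (assumed input format for this sketch).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and of equal length")
    p_a = sum(hap_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    return d * d / denom


def mean_max_r2(untyped, typed):
    """For each untyped SNP take its best r^2 against any typed SNP,
    then average those maxima over all untyped SNPs."""
    best = [max(r_squared(u, t) for t in typed) for u in untyped]
    return sum(best) / len(best)


if __name__ == "__main__":
    typed_snps = [[0, 0, 1, 1]]
    untyped_snps = [[0, 0, 1, 1], [0, 1, 0, 1]]
    print(mean_max_r2(untyped_snps, typed_snps))
```

An untyped SNP counts as well captured when at least one typed SNP has r² close to 1 with it. Averaging the per-SNP maxima gives a single coverage summary of the kind reported per population.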

Most cited references (57)

PLINK: a tool set for whole-genome association and population-based linkage analyses.

Shaun Purcell, Benjamin M. Neale, Kathe Todd-Brown … (2007)

Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
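As a rough illustration of the identity-by-state (IBS) information mentioned above, the sketch below computes the proportion of alleles two individuals share across a set of SNPs. It assumes genotypes coded as minor-allele counts (0, 1 or 2) with `None` for missing calls. The function name `ibs_proportion` is hypothetical, and this is not PLINK's implementation: its identity-by-descent estimation is a separate step and is not reproduced here.

```python
def ibs_proportion(geno_1, geno_2):
    """Proportion of alleles shared identical by state between two individuals.

    geno_1, geno_2: equal-length lists of genotypes coded 0, 1 or 2
    (copies of one allele), with None marking a missing call.
    """
    if len(geno_1) != len(geno_2):
        raise ValueError("genotype lists must have equal length")
    shared = 0
    total = 0
    for g1, g2 in zip(geno_1, geno_2):
        if g1 is None or g2 is None:
            continue                      # skip SNPs missing in either individual
        shared += 2 - abs(g1 - g2)        # 2, 1 or 0 alleles shared at this SNP
        total += 2
    if total == 0:
        raise ValueError("no SNPs with calls in both individuals")
    return shared / total


if __name__ == "__main__":
    print(ibs_proportion([0, 1, 2, 2], [0, 1, 0, 1]))
```

Pairs of individuals with unusually high IBS across the genome are candidates for close relatedness, while systematic differences in average IBS between groups can indicate population stratification.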

Cited 5191 times

Principal components analysis corrects for stratification in genome-wide association studies.

Alkes L. Price, Nick Patterson, Robert Plenge … (2006)

Population stratification (allele frequency differences between cases and controls due to systematic ancestry differences) can cause spurious associations in disease studies. We describe a method that enables explicit detection and correction of population stratification on a genome-wide scale. Our method uses principal components analysis to explicitly model ancestry differences between cases and controls. The resulting correction is specific to a candidate marker's variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. Our simple, efficient approach can easily be applied to disease studies with hundreds of thousands of markers.

Cited 1275 times

Global variation in copy number in the human genome.

Richard Redon, Shumpei Ishikawa, Karen R. Fitch … (2006)

Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single-nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations. These CNVRs contained hundreds of genes, disease loci, functional elements and segmental duplications. Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal marked variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.

Cited 1208 times

Author and article information

Journal

Title: Nature

Abbreviated Title: Nature

Publisher: Springer Science and Business Media LLC

ISSN (Print): 0028-0836

ISSN (Electronic): 1476-4687

Publication date (Created): October 2007

Publication date (Print): October 2007

Volume: 449

Issue: 7164

Pages: 851-861

Article

DOI: 10.1038/nature06258

PMC ID: 2689609

PubMed ID: 17943122

SO-VID: 5b9a67ff-8763-4f04-a2ea-0b889ab4a9a1

License:

http://www.springer.com/tdm

Cited by 1,328
