Population Structure and Eigenanalysis


Abstract

Current methods for inferring population structure from genetic data do not provide formal significance tests for population differentiation. We discuss an approach to studying population structure (principal components analysis) that was first applied to genetic data by Cavalli-Sforza and colleagues. We place the method on a solid statistical footing, using results from modern statistics to develop formal significance tests. We also uncover a general "phase change" phenomenon in the ability to detect structure in genetic data. It emerges from the statistical theory we use and has an important practical implication: for a fixed but large dataset size, divergence between two populations (as measured, for example, by a statistic like F_ST) below a threshold is essentially undetectable, but a little above the threshold, detection is easy. This means that we can predict the dataset size needed to detect structure.
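The core procedure described in the abstract (principal components plus a formal significance test on the top eigenvalue) can be sketched in a few lines of Python with NumPy. This is a minimal sketch, not the authors' implementation. Several choices here are assumptions of this example: the normalization (centering each marker and dividing by sqrt(p(1-p)) with a smoothed allele frequency), the use of n - 1 as the effective number of samples after centering, Johnstone's (2001) centering and scaling constants for the largest eigenvalue of a white Wishart matrix, and the Balding–Nichols-style simulation. The helper names `normalize_genotypes`, `tw_statistic` and `simulate` are hypothetical. Turning the statistic into a p-value needs a Tracy–Widom (TW1) table, which NumPy does not provide; the 95% point of TW1 is approximately 0.98.

```python
import numpy as np


def normalize_genotypes(g):
    """Center each marker (column) and scale it by sqrt(p(1-p)).

    g: samples x markers array of 0/1/2 allele counts. The smoothed
    frequency p = (1 + sum) / (2 + 2n) is an assumption of this sketch; it
    keeps the scale finite for markers that are monomorphic in the sample.
    """
    n = g.shape[0]
    p = (1.0 + g.sum(axis=0)) / (2.0 + 2.0 * n)
    return (g - g.mean(axis=0)) / np.sqrt(p * (1.0 - p))


def tw_statistic(g):
    """Centered, scaled top eigenvalue; roughly TW1-distributed without structure."""
    x = normalize_genotypes(np.asarray(g, dtype=float))
    n, m = x.shape
    eig = np.linalg.eigvalsh(x @ x.T)  # eigenvalues in ascending order
    n_eff = n - 1  # centering each marker removes one sample dimension
    # Rescale so the nonzero eigenvalues average m, as for a unit-variance Wishart.
    lam1 = eig[-1] * n_eff * m / eig.sum()
    a, b = np.sqrt(m - 1.0), np.sqrt(n_eff)
    mu = (a + b) ** 2  # Johnstone (2001) centering constant
    sigma = (a + b) * (1.0 / a + 1.0 / b) ** (1.0 / 3.0)  # Johnstone scaling constant
    return float((lam1 - mu) / sigma)


def simulate(n_per_pop, m, fst, rng):
    """Two populations drawn from a common ancestor (assumed Balding-Nichols model)."""
    anc = rng.uniform(0.1, 0.9, size=m)  # ancestral allele frequencies
    if fst == 0:
        freqs = [anc, anc]
    else:
        scale = (1.0 - fst) / fst  # Beta draws with variance fst * p * (1 - p)
        freqs = [rng.beta(anc * scale, (1.0 - anc) * scale) for _ in range(2)]
    pops = [rng.binomial(2, f, size=(n_per_pop, m)) for f in freqs]
    return np.vstack(pops)


rng = np.random.default_rng(0)
null_stat = tw_statistic(simulate(50, 2000, 0.0, rng))
structured_stat = tw_statistic(simulate(50, 2000, 0.01, rng))
# Values above about 0.98 (approximate 95% point of TW1) suggest structure.
print(f"no structure: {null_stat:.2f}, F_ST = 0.01: {structured_stat:.2f}")
```

With 100 samples and 2,000 unlinked markers, F_ST = 0.01 is several times larger than the rough threshold 1/sqrt(nm) ≈ 0.0022, so the structured statistic should sit far above the null one. The exact values depend on the random seed.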

Synopsis

When analyzing genetic data, one often wishes to determine whether the samples come from a population that has structure. Can the samples be regarded as randomly chosen from a homogeneous population, or do the data imply that the population is not genetically homogeneous? Patterson, Price, and Reich show that an old method (principal components) and modern statistics (Tracy–Widom theory) can be combined to give a fast and effective answer to this question. The technique is simple, is practical even on the largest datasets, and can be applied both to biallelic genetic markers and to highly polymorphic markers such as microsatellites. The theory also allows the authors to estimate the dataset size needed to detect structure if the samples in fact come from two populations with a given, but small, level of differentiation.
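A back-of-the-envelope version of the dataset-size calculation can be written in Python. This sketch rests on assumptions of ours, not necessarily the paper's exact derivation: two equal-sized populations, n samples in total, m unlinked markers, and the spiked-covariance (BBP) approximation. Under those assumptions the population signal multiplies one eigenvalue of the sample-by-sample covariance by roughly 1 + F_ST * n, which separates from the noise bulk only when F_ST * n > sqrt(n/m), that is, F_ST > 1/sqrt(nm). The function names are hypothetical.

```python
import math


def fst_threshold(n_samples: int, n_markers: int) -> float:
    """Approximate F_ST below which two equal-sized populations are undetectable.

    Assumed spiked-covariance approximation: structure is detectable
    only if F_ST * n > sqrt(n / m), i.e. F_ST > 1 / sqrt(n * m).
    """
    return 1.0 / math.sqrt(n_samples * n_markers)


def markers_needed(n_samples: int, fst: float) -> int:
    """Smallest number of unlinked markers m with n * m * F_ST**2 >= 1."""
    return math.ceil(1.0 / (n_samples * fst ** 2))


print(fst_threshold(100, 10_000))  # 0.001
print(markers_needed(300, 0.004))  # 209
```

Because the threshold scales as 1/sqrt(nm), halving F_ST requires roughly four times as many samples or markers. Linkage between markers lowers the effective m, so real datasets would need more markers than this estimate.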


Author and article information

Contributors

Role: Editor

Journal

Journal ID (nlm-ta): PLoS Genet

Journal ID (publisher-id): pgen

Title: PLoS Genetics

Publisher: Public Library of Science (San Francisco, USA )

ISSN (Print): 1553-7390

ISSN (Electronic): 1553-7404

Publication date (Print): December 2006

Publication date (Electronic): 22 December 2006

Volume: 2

Issue: 12

Electronic Location Identifier: e190

Affiliations

[1 ] Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America

[2 ] Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America

University of Alabama at Birmingham, United States of America

Author notes

* To whom correspondence should be addressed. E-mail: nickp@broad.mit.edu

Article

Publisher ID: 06-PLGE-RA-0101R3

Serial Item and Contribution ID: plge-02-12-13

DOI: 10.1371/journal.pgen.0020190

PMC ID: 1713260

PubMed ID: 17194218

SO-VID: 17a7d1b4-0e1d-4420-b478-284eb1f28841

Copyright: © 2006 Patterson et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received : 23 March 2006

Date accepted : 27 September 2006

Page count

Pages: 20

Custom metadata

Citation: Patterson N, Price AL, Reich D (2006) Population structure and eigenanalysis. PLoS Genet 2(12): e190. doi: 10.1371/journal.pgen.0020190

ScienceOpen disciplines: Genetics


Cited by 1,964
