322
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: not found

      Population Structure and Eigenanalysis

      research-article
      1 , * , 1 , 2 , 1 , 2
      PLoS Genetics
      Public Library of Science

      Read this article at

      ScienceOpenPublisherPMC
      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Current methods for inferring population structure from genetic data do not provide formal significance tests for population differentiation. We discuss an approach to studying population structure (principal components analysis) that was first applied to genetic data by Cavalli-Sforza and colleagues. We place the method on a solid statistical footing, using results from modern statistics to develop formal significance tests. We also uncover a general “phase change” phenomenon about the ability to detect structure in genetic data, which emerges from the statistical theory we use, and has an important implication for the ability to discover structure in genetic data: for a fixed but large dataset size, divergence between two populations (as measured, for example, by a statistic like F ST ) below a threshold is essentially undetectable, but a little above threshold, detection will be easy. This means that we can predict the dataset size needed to detect structure.

          Synopsis

          When analyzing genetic data, one often wishes to determine if the samples are from a population that has structure. Can the samples be regarded as randomly chosen from a homogeneous population, or does the data imply that the population is not genetically homogeneous? Patterson, Price, and Reich show that an old method (principal components) together with modern statistics (Tracy–Widom theory) can be combined to yield a fast and effective answer to this question. The technique is simple and practical on the largest datasets, and can be applied both to genetic markers that are biallelic and to markers that are highly polymorphic such as microsatellites. The theory also allows the authors to estimate the data size needed to detect structure if their samples are in fact from two populations that have a given, but small level of differentiation.

          Related collections

          Most cited references35

          • Record: found
          • Abstract: found
          • Article: not found

          High-resolution haplotype structure in the human genome.

          Linkage disequilibrium (LD) analysis is traditionally based on individual genetic markers and often yields an erratic, non-monotonic picture, because the power to detect allelic associations depends on specific properties of each marker, such as frequency and population history. Ideally, LD analysis should be based directly on the underlying haplotype structure of the human genome, but this structure has remained poorly understood. Here we report a high-resolution analysis of the haplotype structure across 500 kilobases on chromosome 5q31 using 103 single-nucleotide polymorphisms (SNPs) in a European-derived population. The results show a picture of discrete haplotype blocks (of tens to hundreds of kilobases), each with limited diversity punctuated by apparent sites of recombination. In addition, we develop an analytical model for LD mapping based on such haplotype blocks. If our observed structure is general (and published data suggest that it may be), it offers a coherent framework for creating a haplotype map of the human genome.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Calibrating a coalescent simulation of human genome sequence variation.

            Population genetic models play an important role in human genetic research, connecting empirical observations about sequence variation with hypotheses about underlying historical and biological causes. More specifically, models are used to compare empirical measures of sequence variation, linkage disequilibrium (LD), and selection to expectations under a "null" distribution. In the absence of detailed information about human demographic history, and about variation in mutation and recombination rates, simulations have of necessity used arbitrary models, usually simple ones. With the advent of large empirical data sets, it is now possible to calibrate population genetic models with genome-wide data, permitting for the first time the generation of data that are consistent with empirical data across a wide range of characteristics. We present here the first such calibrated model and show that, while still arbitrary, it successfully generates simulated data (for three populations) that closely resemble empirical data in allele frequency, linkage disequilibrium, and population differentiation. No assertion is made about the accuracy of the proposed historical and recombination model, but its ability to generate realistic data meets a long-standing need among geneticists. We anticipate that this model, for which software is publicly available, and others like it will have numerous applications in empirical studies of human genetics.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Assessing the impact of population stratification on genetic association studies.

              Population stratification refers to differences in allele frequencies between cases and controls due to systematic differences in ancestry rather than association of genes with disease. It has been proposed that false positive associations due to stratification can be controlled by genotyping a few dozen unlinked genetic markers. To assess stratification empirically, we analyzed data from 11 case-control and case-cohort association studies. We did not detect statistically significant evidence for stratification but did observe that assessments based on a few dozen markers lack power to rule out moderate levels of stratification that could cause false positive associations in studies designed to detect modest genetic risk factors. After increasing the number of markers and samples in a case-cohort study (the design most immune to stratification), we found that stratification was in fact present. Our results suggest that modest amounts of stratification can exist even in well designed studies.
                Bookmark

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Genet
                pgen
                PLoS Genetics
                Public Library of Science (San Francisco, USA )
                1553-7390
                1553-7404
                December 2006
                22 December 2006
                : 2
                : 12
                : e190
                Affiliations
                [1 ] Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
                [2 ] Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
                University of Alabama at Birmingham, United States of America
                Author notes
                * To whom correspondence should be addressed. E-mail: nickp@ 123456broad.mit.edu
                Article
                06-PLGE-RA-0101R3 plge-02-12-13
                10.1371/journal.pgen.0020190
                1713260
                17194218
                17a7d1b4-0e1d-4420-b478-284eb1f28841
                Copyright: © 2006 Patterson et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                History
                : 23 March 2006
                : 27 September 2006
                Page count
                Pages: 20
                Categories
                Research Article
                Genetics and Genomics
                Eukaryotes
                Custom metadata
                Patterson N, Price AL, Reich D (2006) Population structure and eigenanalysis. PLoS Genet 2(12): e190. doi: 10.1371/journal.pgen.0020190

                Genetics
                Genetics

                Comments

                Comment on this article