      fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data Sets

          Abstract

          Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework. Variational methods pose the problem of computing relevant posterior distributions as an optimization problem, allowing us to build on recent advances in optimization theory to develop fast inference tools. In addition, we propose useful heuristic scores to identify the number of populations represented in a data set and a new hierarchical prior to detect weak population structure in the data. We test the variational algorithms on simulated data and illustrate using genotype data from the CEPH–Human Genome Diversity Panel. The variational algorithms are almost two orders of magnitude faster than STRUCTURE and achieve accuracies comparable to those of ADMIXTURE. Furthermore, our results show that the heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations represented in the data, with minimal bias toward detecting structure when it is very weak. Our algorithm, fastSTRUCTURE, is freely available online at http://pritchardlab.stanford.edu/structure.html.
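The statement that variational methods recast posterior computation as an optimization problem can be made concrete with the standard evidence decomposition. The notation below is an assumption introduced for illustration and is not taken from the article: G stands for the observed genotypes, Z for all latent quantities of the model (for example admixture proportions and allele frequencies), and q(Z) for a distribution chosen from a tractable family.

```latex
\log p(G)
  = \underbrace{\mathbb{E}_{q(Z)}\!\left[\log p(G, Z)\right]
      - \mathbb{E}_{q(Z)}\!\left[\log q(Z)\right]}_{\text{lower bound } \mathcal{L}(q)}
  + \mathrm{KL}\!\left(q(Z)\,\middle\|\,p(Z \mid G)\right),
\qquad
\mathrm{KL}\!\left(q \,\middle\|\, p\right) \ge 0 .
```

Because log p(G) does not depend on q and the KL term is non-negative, maximizing the lower bound over q is equivalent to minimizing the divergence between q(Z) and the true posterior p(Z | G). This is the generic variational Bayes setup; the specific family and update rules used by fastSTRUCTURE are described in the article itself and are not reproduced here.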

                Author and article information

                Journal: Genetics
                Publisher: Genetics Society of America
                ISSN: 0016-6731, 1943-2631
                Dates: June 2014; 2 April 2014
                Volume: 197
                Issue: 2
                Pages: 573-589
                Affiliations
                Department of Genetics, Stanford University, Stanford, California 94305
                Departments of Statistics and Human Genetics, University of Chicago, Chicago, Illinois 60637
                Department of Biology, Howard Hughes Medical Institute, Stanford University, Stanford, California 94305
                Author notes

                Available freely online through the author-supported open access option.

                Corresponding author: Stanford University, 300 Pasteur Dr., Alway Bldg., M337, Stanford, CA 94305. E-mail: rajanil@stanford.edu
                Article ID: 164350
                DOI: 10.1534/genetics.114.164350
                PMCID: PMC4063916
                PMID: 24700103
                Copyright © 2014 by the Genetics Society of America

                Pages: 17
                Categories: Investigations; Methods, Technology, and Resources

                Keywords: variational inference, population structure
