fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data Sets

Raj, Anil; Stephens, Matthew; Pritchard, Jonathan K.

doi:10.1534/genetics.114.164350

Author(s): Anil Raj*,1, Matthew Stephens†, Jonathan K. Pritchard*,‡

Publication date (Electronic): 2 April 2014

Journal: Genetics

Publisher: Genetics Society of America

Keywords: variational inference, population structure

Abstract

Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework. Variational methods pose the problem of computing relevant posterior distributions as an optimization problem, allowing us to build on recent advances in optimization theory to develop fast inference tools. In addition, we propose useful heuristic scores to identify the number of populations represented in a data set and a new hierarchical prior to detect weak population structure in the data. We test the variational algorithms on simulated data and illustrate using genotype data from the CEPH–Human Genome Diversity Panel. The variational algorithms are almost two orders of magnitude faster than STRUCTURE and achieve accuracies comparable to those of ADMIXTURE. Furthermore, our results show that the heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations represented in the data, with minimal bias toward detecting structure when it is very weak. Our algorithm, fastSTRUCTURE, is freely available online at http://pritchardlab.stanford.edu/structure.html.
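The abstract describes the model underlying STRUCTURE only in words. The Python below is a minimal sketch under stated assumptions: biallelic SNPs coded as 0/1/2 copies of one allele, and the standard admixture likelihood in which each allele copy of individual n is drawn from population k with probability Q[n][k]. It computes that likelihood and fits Q and P by plain expectation-maximization. This gives a maximum-likelihood point estimate. It is not fastSTRUCTURE's variational Bayesian algorithm, which instead optimizes a lower bound on the marginal likelihood over approximate posterior distributions for the same quantities. The function names (`genotype_loglik`, `em_step`, `simulate_genotypes`) and the simulation settings are hypothetical.

```python
import math
import random


def genotype_loglik(G, Q, P):
    """Log-likelihood of genotypes G under the admixture model (illustrative sketch).

    G[n][l] in {0, 1, 2}: copies of the counted allele in individual n at SNP l.
    Q[n][k]: admixture proportion of individual n from population k (rows sum to 1).
    P[l][k]: frequency of the counted allele at SNP l in population k.
    Each allele copy independently picks a population k with probability Q[n][k],
    so it carries the counted allele with probability theta = sum_k Q[n][k] * P[l][k],
    and the genotype is Binomial(2, theta).
    """
    ll = 0.0
    for n, row in enumerate(G):
        for l, g in enumerate(row):
            theta = sum(q * p for q, p in zip(Q[n], P[l]))
            ll += math.log(math.comb(2, g))
            if g > 0:  # skip 0 * log(0) terms at monomorphic SNPs
                ll += g * math.log(theta)
            if g < 2:
                ll += (2 - g) * math.log(1.0 - theta)
    return ll


def em_step(G, Q, P):
    """One EM update of (Q, P); the latent variable is each allele copy's population."""
    N, L, K = len(G), len(G[0]), len(Q[0])
    q_acc = [[0.0] * K for _ in range(N)]  # expected allele copies of n assigned to k
    p_hit = [[0.0] * K for _ in range(L)]  # expected counted-allele copies from k at l
    p_tot = [[0.0] * K for _ in range(L)]  # expected total copies from k at l
    for n in range(N):
        for l in range(L):
            g = G[n][l]
            theta = sum(Q[n][k] * P[l][k] for k in range(K))
            for k in range(K):
                # E-step: expected number of copies from k, split by the allele carried.
                ea = g * Q[n][k] * P[l][k] / theta if g > 0 else 0.0
                eb = (2 - g) * Q[n][k] * (1.0 - P[l][k]) / (1.0 - theta) if g < 2 else 0.0
                q_acc[n][k] += ea + eb
                p_hit[l][k] += ea
                p_tot[l][k] += ea + eb
    # M-step: each individual has 2L allele copies; frequencies are expected ratios.
    new_Q = [[x / (2 * L) for x in row] for row in q_acc]
    new_P = [[h / t if t > 0 else P[l][k] for k, (h, t) in enumerate(zip(p_hit[l], p_tot[l]))]
             for l in range(L)]
    return new_Q, new_P


def simulate_genotypes(N, L, K, seed=0):
    """Hypothetical helper: Q ~ Dirichlet(1, ..., 1), P ~ Uniform(0.05, 0.95), then G."""
    rng = random.Random(seed)
    Q = []
    for _ in range(N):
        w = [rng.gammavariate(1.0, 1.0) for _ in range(K)]
        s = sum(w)
        Q.append([x / s for x in w])
    P = [[rng.uniform(0.05, 0.95) for _ in range(K)] for _ in range(L)]
    G = []
    for n in range(N):
        row = []
        for l in range(L):
            theta = sum(q * p for q, p in zip(Q[n], P[l]))
            row.append(sum(rng.random() < theta for _ in range(2)))  # two allele copies
        G.append(row)
    return G, Q, P


if __name__ == "__main__":
    G, _, _ = simulate_genotypes(N=30, L=50, K=2, seed=1)
    rng = random.Random(2)
    Q = [[x, 1.0 - x] for x in (rng.uniform(0.2, 0.8) for _ in range(30))]
    P = [[rng.uniform(0.2, 0.8) for _ in range(2)] for _ in range(50)]
    for it in range(10):
        Q, P = em_step(G, Q, P)
        print(it, round(genotype_loglik(G, Q, P), 3))
```

No EM iteration can decrease the log-likelihood, which makes a convenient sanity check. Like any local optimizer, the procedure should be restarted from several initial values. The number of populations K is also fixed in advance here rather than chosen by a heuristic score.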

Author and article information

Journal

Title: Genetics

ISSN (Print): 0016-6731

ISSN (Electronic): 1943-2631

Publication date (Print): June 2014

Publication date PMC-release: 2 April 2014

Volume: 197

Issue: 2

Pages: 573-589

Affiliations

* Department of Genetics, Stanford University, Stanford, California 94305

† Departments of Statistics and Human Genetics, University of Chicago, Chicago, Illinois 60637

‡ Department of Biology, Howard Hughes Medical Institute, Stanford University, Stanford, California 94305

Author notes

Available freely online through the author-supported open access option.

Supporting information is available online at http://www.genetics.org/lookup/suppl/doi:10.1534/genetics.114.164350/-/DC1.

1 Corresponding author: Stanford University, 300 Pasteur Dr., Alway Bldg., M337, Stanford, CA 94305. E-mail: rajanil@stanford.edu

Article

Publisher ID: 164350

PMC ID: 4063916

PubMed ID: 24700103

History

Date received : 02 December 2013

Date accepted : 25 March 2014

Page count

Pages: 17
