The influence of a priori grouping on inference of genetic clusters: simulation study and literature review of the DAPC method

Abstract

Inference of genetic clusters is a key aim of population genetics and has sparked the development of numerous analytical methods. Among these, there is a conceptual divide between finding de novo structure and assessing a priori groups. The recently developed Discriminant Analysis of Principal Components (DAPC) combines discriminant analysis (DA) with principal component (PC) analysis. When applying DAPC, the groups used in the DA (specified a priori or described de novo) need to be carefully assessed. While DAPC has rapidly become a core technique, the sensitivity of the method to misspecification of groups, and how it is being applied empirically, are unknown. To address this, we conducted a simulation study examining the influence of a priori versus de novo group designations, and a literature review of how DAPC is being applied. We found that with a priori groupings, distance between genetic clusters reflected underlying F_ST. However, when migration rates were high and groups were described de novo, there was considerable inaccuracy, both in the number of genetic clusters suggested and in the placement of individuals into those clusters. Nearly all (90.1%) of 224 studies surveyed used DAPC to find de novo clusters, and for the majority (62.5%) the stated goal matched the results. However, most studies (52.3%) omitted key run parameters, preventing repeatability and transparency. Therefore, we present recommendations for standard reporting of parameters used in DAPC analyses. The influence of groupings in genetic clustering is not unique to DAPC, and researchers need to consider their goal and which methods will be most appropriate.
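The distinction between a priori and de novo groups can be made concrete with a small sketch. The code below is illustrative only and is not the adegenet implementation or the authors' simulation design: it assumes NumPy is available, simulates two diverged populations with arbitrary parameters, reduces the genotypes to a few principal components, and then runs a discriminant step twice, once with the sampled population labels (a priori) and once with groups found by a simple k-means on the PC scores (de novo). All function names and parameter values are hypothetical.

```python
import numpy as np


def simulate_two_populations(n_per_pop=50, n_loci=200, divergence=0.3, seed=1):
    """Simulate biallelic genotypes (0/1/2) for two populations (arbitrary toy settings)."""
    rng = np.random.default_rng(seed)
    p1 = rng.uniform(0.1, 0.9, n_loci)
    p2 = np.clip(p1 + rng.normal(0.0, divergence, n_loci), 0.05, 0.95)
    g1 = rng.binomial(2, p1, size=(n_per_pop, n_loci))
    g2 = rng.binomial(2, p2, size=(n_per_pop, n_loci))
    return np.vstack([g1, g2]).astype(float), np.repeat([0, 1], n_per_pop)


def pca_scores(genotypes, n_pcs):
    """Project centred genotypes onto the first n_pcs principal components."""
    centred = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def kmeans_groups(scores, k, n_iter=100):
    """De novo groups: Lloyd's k-means with deterministic farthest-first starting centres."""
    centres = [scores[np.argmax(((scores - scores.mean(axis=0)) ** 2).sum(axis=1))]]
    while len(centres) < k:
        dist = np.min([((scores - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(scores[np.argmax(dist)])
    centres = np.array(centres)
    for _ in range(n_iter):
        dist = ((scores[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            scores[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(updated, centres):
            break
        centres = updated
    return labels


def discriminant_coords(scores, groups):
    """Discriminant step: axes maximising between-group relative to within-group scatter."""
    grand_mean = scores.mean(axis=0)
    n_dims = scores.shape[1]
    within = np.zeros((n_dims, n_dims))
    between = np.zeros((n_dims, n_dims))
    unique_groups = np.unique(groups)
    for g in unique_groups:
        members = scores[groups == g]
        mean_g = members.mean(axis=0)
        within += (members - mean_g).T @ (members - mean_g)
        diff = (mean_g - grand_mean)[:, None]
        between += len(members) * (diff @ diff.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(within, between))
    order = np.argsort(eigvals.real)[::-1]
    n_axes = min(n_dims, len(unique_groups) - 1)
    return (scores - grand_mean) @ eigvecs.real[:, order[:n_axes]]


genotypes, sampled_pop = simulate_two_populations()
scores = pca_scores(genotypes, n_pcs=5)

# A priori: groups are the sampled populations.
a_priori_axes = discriminant_coords(scores, sampled_pop)

# De novo: groups are inferred from the data before the discriminant step.
de_novo = kmeans_groups(scores, k=2)
de_novo_axes = discriminant_coords(scores, de_novo)

# Cluster labels are arbitrary, so agreement is scored up to a label swap.
match = np.mean(de_novo == sampled_pop)
agreement = max(match, 1.0 - match)
print(f"Agreement between de novo and a priori groups: {agreement:.2f}")
```

In this toy setting the two populations are strongly diverged, so the de novo groups should largely coincide with the sampled populations. Lowering `divergence` is a crude stand-in for higher migration and is one way to explore how de novo assignment can degrade; the number of retained PCs and the number of clusters are exactly the kind of run parameters that the abstract notes often go unreported.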

Author and article information

Contributors

Joshua M. Miller:

ORCID: http://orcid.org/0000-0002-4019-7675

jmm1@ualberta.ca

Journal

Journal ID (nlm-ta): Heredity (Edinb)

Journal ID (iso-abbrev): Heredity (Edinb)

Title: Heredity

Publisher: Springer International Publishing (Cham)

ISSN (Print): 0018-067X

ISSN (Electronic): 1365-2540

Publication date (Electronic): 4 August 2020

Publication date PMC-release: 4 August 2020

Publication date (Print): November 2020

Volume: 125

Issue: 5

Pages: 269-280

Affiliations

[1] GRID grid.17089.37, Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada

[2] GRID grid.34428.39, ISNI 0000 0004 1936 893X, Department of Biology, Carleton University, Ottawa, ON, Canada

Article

Publisher ID: 348

DOI: 10.1038/s41437-020-0348-2

PMC ID: 7553915

PubMed ID: 32753664

SO-VID: f1998c35-7ac9-494c-a102-ca8f67327171

License:

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

History

Date received: 8 December 2019

Date revision received: 19 July 2020

Date accepted: 20 July 2020

Custom metadata

ScienceOpen disciplines: Human biology

Keywords: genetic variation, population genetics
