      A Likelihood-Free Estimator of Population Structure Bridging Admixture Models and Principal Components Analysis

      research-article


          Abstract

          Characterizing genetic variation in humans is an important task in statistical genetics, enabling disease-gene mapping in genome-wide association studies (GWAS) and informing studies of human evolutionary history. A common approach to quantifying genetic variation...

          Abstract

          We introduce a simple and computationally efficient method for fitting the admixture model of genetic population structure, called ALStructure. The strategy of ALStructure is to first estimate the low-dimensional linear subspace of the population admixture components, and then search for a model within this subspace that is consistent with the admixture model’s natural probabilistic constraints. Central to this strategy is the observation that all models belonging to this constrained space of solutions are risk-minimizing and have equal likelihood, rendering any additional optimization unnecessary. The low-dimensional linear subspace is estimated through a recently introduced principal components analysis method that is appropriate for genotype data, thereby providing a solution that has both principal components and probabilistic admixture interpretations. Our approach differs fundamentally from other existing methods for estimating admixture, which aim to fit the admixture model directly by searching for parameters that maximize the likelihood function or the posterior probability. We observe that ALStructure typically outperforms existing methods both in accuracy and computational speed under a wide array of simulated and real human genotype datasets. Throughout this work, we emphasize that the admixture model is a special case of a much broader class of models for which algorithms similar to ALStructure may be successfully employed.
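The two ideas in the abstract, a low-dimensional linear subspace and a set of equally likely solutions inside it, can be illustrated with a small simulation. The sketch below is not the authors' ALStructure implementation. It assumes the usual formulation of the admixture model, in which individual-specific allele frequencies factor as F = PQ, genotypes are Binomial(2, F) draws, the entries of P lie in [0, 1], and each column of Q sums to 1. The function names, matrix sizes, and the mixing matrix S are hypothetical choices made for illustration only.

```python
import numpy as np

def simulate_factors(m=200, n=50, d=3, seed=0):
    """Draw toy admixture factors (all sizes are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    # P: m x d allele frequencies, kept away from 0 and 1 so logs stay finite.
    P = rng.uniform(0.05, 0.95, size=(m, d))
    # Q: d x n admixture proportions; each column is a point on the simplex.
    Q = rng.dirichlet(np.ones(d), size=n).T
    return rng, P, Q

def binomial_loglik(X, F):
    """Binomial(2, F) log-likelihood, dropping the constant binomial coefficients."""
    return float(np.sum(X * np.log(F) + (2 - X) * np.log1p(-F)))

rng, P, Q = simulate_factors()
d = P.shape[1]

# A doubly stochastic mixing matrix (rows and columns sum to 1).
# Multiplying by it preserves both sets of admixture constraints.
S = 0.7 * np.eye(d) + (0.3 / d) * np.ones((d, d))

# Individual-specific allele frequencies and simulated genotypes in {0, 1, 2}.
F = P @ S @ Q
X = rng.binomial(2, F)

# Two different factorizations of the same F, both satisfying the constraints.
P_a, Q_a = P @ S, Q
P_b, Q_b = P, S @ Q

ll_a = binomial_loglik(X, P_a @ Q_a)
ll_b = binomial_loglik(X, P_b @ Q_b)

# F has rank at most d, so its rows span a d-dimensional linear subspace.
rank = np.linalg.matrix_rank(F)

print("rank of F:", rank)
print("log-likelihoods match:", np.isclose(ll_a, ll_b))
print("factorizations differ:", not np.allclose(Q_a, Q_b))
```

Because the likelihood depends on the factors only through their product F, both factorizations score identically even though their admixture proportions differ. This is the sense in which further likelihood optimization within the constrained solution set cannot separate them.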


                Author and article information

                Journal: Genetics
                Publisher: Genetics Society of America
                ISSN: 0016-6731, 1943-2631
                Issue date: August 2019 (published online 26 April 2019)
                Volume: 212
                Issue: 4
                Pages: 1009-1029
                Affiliations
                Program in Applied and Computational Mathematics, Princeton University, New Jersey 08544
                Lewis-Sigler Institute for Integrative Genomics, Princeton University, New Jersey 08544
                Author notes
                Corresponding author: Princeton University, Princeton, NJ 08544. E-mail: jstorey@princeton.edu
                Author information
                http://orcid.org/0000-0003-3854-0486
                http://orcid.org/0000-0001-5992-402X
                Article
                Publisher ID: 302159
                DOI: 10.1534/genetics.119.302159
                PMCID: PMC6707457
                PMID: 31028112
                4c3bee34-9e9d-485f-ae12-b962484ee785
                Copyright © 2019 Cabreros and Storey

                Available freely online through the author-supported open access option.

                This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 07 February 2019
                Accepted: 08 April 2019
                Page count
                Figures: 12, Tables: 5, Equations: 53, References: 44, Pages: 21
                Categories
                Investigations
                Statistical Genetics and Genomics

                Subject: Genetics
                Keywords: admixture, genetic structure, logistic factor analysis, method of moments, nonparametric, PCA, population stratification, population structure, unifying
