      Enhancements to the ADMIXTURE algorithm for individual ancestry estimation

      BMC Bioinformatics
      BioMed Central

          Abstract

          Background

          Estimating individual ancestry from genetic data has become essential to applied population genetics and genetic epidemiology, and software for computing ancestry estimates is now a standard part of the geneticist's analytic arsenal.

          Results

          Here we describe four enhancements to ADMIXTURE, a high-performance tool for estimating individual ancestries and population allele frequencies from SNP (single nucleotide polymorphism) data. First, ADMIXTURE can be used to estimate the number of underlying populations through cross-validation. Second, individuals of known ancestry can be exploited in supervised learning to yield more precise ancestry estimates. Third, by penalizing small admixture coefficients for each individual, one can encourage model parsimony, often yielding more interpretable results for small datasets or datasets with large numbers of ancestral populations. Finally, by exploiting multiple processors, large datasets can be analyzed even more rapidly.
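The penalization idea in the third enhancement can be illustrated with a minimal sketch. It assumes the standard binomial admixture likelihood, in which individual i carries allele 1 at SNP j with probability sum_k q_ik f_kj, and it assumes a log-type sparsity penalty that equals 0 at q = 0 and 1 at q = 1. The abstract does not give the penalty's exact form, so the function names, the penalty shape, and the parameters `lam` and `eps` below are illustrative assumptions, not ADMIXTURE's actual implementation.

```python
import numpy as np


def log_likelihood(G, Q, F):
    """Binomial log-likelihood of genotype counts.

    G: (n, m) array of allele-1 counts in {0, 1, 2}
    Q: (n, K) ancestry fractions, each row summing to 1
    F: (K, m) population allele-1 frequencies
    """
    P = Q @ F                            # allele-1 probability per individual and SNP
    P = np.clip(P, 1e-12, 1 - 1e-12)     # guard against log(0)
    return float(np.sum(G * np.log(P) + (2 - G) * np.log(1 - P)))


def sparsity_penalty(Q, eps=0.1):
    """Assumed log-type penalty: 0 when q = 0, 1 when q = 1.

    Smaller eps makes the penalty rise more steeply near zero, so small
    nonzero admixture coefficients are discouraged more strongly.
    """
    return float(np.sum(np.log1p(Q / eps) / np.log1p(1.0 / eps)))


def penalized_objective(G, Q, F, lam=1.0, eps=0.1):
    """Objective to maximize: log-likelihood minus lam times the penalty."""
    return log_likelihood(G, Q, F) - lam * sparsity_penalty(Q, eps)


# Toy data (hypothetical): one individual, two SNPs, two ancestral populations.
G = np.array([[2, 0]])
F = np.array([[0.5, 0.5],
              [0.1, 0.9]])
Q_pure = np.array([[1.0, 0.0]])    # all ancestry from population 1
Q_mixed = np.array([[0.9, 0.1]])   # small contribution from population 2

print(penalized_objective(G, Q_pure, F, lam=2.0))
print(penalized_objective(G, Q_mixed, F, lam=2.0))
```

With `lam = 0` the objective reduces to the plain log-likelihood. Larger `lam` favors solutions in which each individual draws on fewer ancestral populations, which is the parsimony effect the abstract describes.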

          Conclusions

          The enhancements we have described make ADMIXTURE a more accurate, efficient, and versatile tool for ancestry estimation.

                Author and article information

                Journal
                BMC Bioinformatics (BioMed Central)
                ISSN: 1471-2105
                Published: 18 June 2011
                Volume 12, article 246
                Affiliations
                [1] Department of Biomathematics, UCLA, Los Angeles, California, USA
                [2] Department of Human Genetics, UCLA, Los Angeles, California, USA
                [3] Department of Statistics, UCLA, Los Angeles, California, USA
                Article
                Publisher ID: 1471-2105-12-246
                DOI: 10.1186/1471-2105-12-246
                PMCID: PMC3146885
                PMID: 21682921
                Record ID: 5a8bf3d1-295e-40ad-81cb-a7036b95bc91
                Copyright ©2011 Alexander and Lange; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                2 February 2011
                18 June 2011
                Categories
                Software

                Bioinformatics & Computational biology
