Sensitive Detection of Chromosomal Segments of Distinct Ancestry in Admixed Populations

Abstract

Identifying the ancestry of chromosomal segments in admixed genomes has a wide range of applications, from disease mapping to learning about history. Most methods require the use of unlinked markers; however, using all markers from genome-wide scanning arrays, it should in principle be possible to infer the ancestry of even very small segments with exquisite accuracy. We describe a method, HAPMIX, which employs an explicit population genetic model to perform such local ancestry inference based on fine-scale variation data. We show that HAPMIX outperforms other methods, and we explore its utility for inferring ancestry, learning about ancestral populations, and inferring dates of admixture. We validate the method empirically by applying it to populations that have experienced recent and ancient admixture: 935 African Americans from the United States and 29 Mozabites from North Africa. HAPMIX will be of particular utility for mapping disease genes in recently admixed populations, as its accurate estimates of local ancestry permit admixture and case-control association signals to be combined, enabling more powerful tests of association than with either signal alone.

Author Summary

The genomes of individuals from admixed populations consist of chromosomal segments of distinct ancestry. For example, the genomes of African American individuals contain segments of both African and European ancestry, so that a specific location in the genome may inherit 0, 1, or 2 copies of European ancestry. Inferring an individual's local ancestry, their number of copies of each ancestry at each location in the genome, has important applications in disease mapping and in understanding human history. Here we describe HAPMIX, a method that analyzes data from dense genotyping chips to infer local ancestry with very high precision. An important feature of HAPMIX is that it makes use of data from haplotypes (blocks of nearby markers), which are more informative for ancestry than individual markers. Our simulations demonstrate the utility of HAPMIX for local ancestry inference, and empirical applications to African American and Mozabite data sets uncover important aspects of the history of these populations.
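To make the definition of local ancestry concrete, the following is a minimal Python sketch of the bookkeeping only. It is not HAPMIX and performs no inference: it assumes the ancestry of each haplotype at each site is already known. The function names, the "AFR"/"EUR" labels, and the toy data are all invented for illustration.

```python
from itertools import groupby


def local_ancestry_counts(hap_a, hap_b, ancestry="EUR"):
    """Return the number of copies (0, 1, or 2) of `ancestry` at each site."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same sites")
    # Each comparison is a bool; adding two bools gives 0, 1, or 2.
    return [(a == ancestry) + (b == ancestry) for a, b in zip(hap_a, hap_b)]


def segments(counts):
    """Collapse per-site counts into (start, end, copies) runs; end is exclusive."""
    runs, start = [], 0
    for copies, group in groupby(counts):
        length = len(list(group))
        runs.append((start, start + length, copies))
        start += length
    return runs


# Hypothetical per-site ancestry labels for one individual's two haplotypes.
hap_a = ["AFR", "AFR", "AFR", "EUR", "EUR", "EUR", "AFR", "AFR"]
hap_b = ["AFR", "AFR", "EUR", "EUR", "EUR", "AFR", "AFR", "AFR"]

counts = local_ancestry_counts(hap_a, hap_b)
runs = segments(counts)
# Genome-wide ancestry proportion: European copies over total haplotype sites.
european_fraction = sum(counts) / (2 * len(counts))

print(counts)
print(runs)
print(european_fraction)
```

In practice the per-haplotype labels are exactly what is unknown; a method such as HAPMIX estimates them from genotype data, and this sketch only shows how the 0, 1, or 2 copy count and the resulting segments follow once such labels are available.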

Author and article information

Contributors

Role: Editor

Journal

Journal ID (nlm-ta): PLoS Genet

Journal ID (publisher-id): plos

Journal ID (pmc): plosgen

Title: PLoS Genetics

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Print): 1553-7390

ISSN (Electronic): 1553-7404

Publication date Collection: June 2009

Publication date (Print): June 2009

Publication date (Electronic): 19 June 2009

Volume: 5

Issue: 6

Electronic Location Identifier: e1000519

Affiliations

[1] Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America

[2] Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America

[3] Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America

[4] Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America

[5] Johns Hopkins Allergy and Asthma Center, Division of Clinical Immunology, Department of Medicine, School of Medicine, Baltimore, Maryland, United States of America

[6] Department of Biostatistics, Johns Hopkins School of Public Health, Baltimore, Maryland, United States of America

[7] Inherited Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore, Maryland, United States of America

[8] Department of Statistics, Oxford University, Oxford, United Kingdom

[9] Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom

University of Chicago, United States of America

Author notes

* E-mail: reich@genetics.med.harvard.edu (DR); myers@stats.ox.ac.uk (SM)

Conceived and designed the experiments: ALP NP DR SM. Performed the experiments: ALP AT NP KCB NR IR THB RM DR SM. Analyzed the data: ALP AT NP DR SM. Contributed reagents/materials/analysis tools: ALP NP KCB NR IR THB RM DR SM. Wrote the paper: ALP AT NP KCB NR IR THB RM DR SM.

Article

Publisher ID: 08-PLGE-RA-1438R2

DOI: 10.1371/journal.pgen.1000519

PMC ID: 2689842

PubMed ID: 19543370

SO-VID: 245bda82-9665-486f-be06-28c5e1aac9b1

Copyright © This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.

History

Date received: 27 October 2008

Date accepted: 15 May 2009

Page count

Pages: 18
