Abstract
A SNP in the gene encoding lactase (LCT) (C/T-13910) is associated with the ability
to digest milk as adults (lactase persistence) in Europeans, but the genetic basis
of lactase persistence in Africans was previously unknown. We conducted a genotype-phenotype
association study in 470 Tanzanians, Kenyans and Sudanese and identified three SNPs
(G/C-14010, T/G-13915 and C/G-13907) that are associated with lactase persistence
and that have derived alleles that significantly enhance transcription from the LCT
promoter in vitro. These SNPs originated on different haplotype backgrounds from the
European C/T-13910 SNP and from each other. Genotyping across a 3-Mb region demonstrated
haplotype homozygosity extending >2.0 Mb on chromosomes carrying C-14010, consistent
with a selective sweep over the past approximately 7,000 years. These data provide
a marked example of convergent evolution due to strong selective pressure resulting
from shared cultural traits: animal domestication and adult milk consumption.
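Haplotype homozygosity of the kind reported around C-14010 is often summarized with the extended haplotype homozygosity (EHH) statistic. Among chromosomes carrying a core allele, EHH is the probability that two of them, drawn at random, are identical at every marker from the core out to a given position. The abstract does not name the exact statistic, so the sketch below illustrates the idea and is not the authors' pipeline. The function name `ehh` and the input layout (each haplotype a string or list of alleles, one entry per marker) are assumptions.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, end_index):
    """Extended haplotype homozygosity for chromosomes carrying core_allele
    at core_index, evaluated over markers core_index..end_index (inclusive).
    end_index may lie on either side of the core marker."""
    # Keep only chromosomes that carry the core allele.
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes carrying the core allele")
    lo, hi = sorted((core_index, end_index))
    # Count each distinct extended haplotype over the core-to-end interval.
    counts = Counter(tuple(h[lo:hi + 1]) for h in carriers)
    # Fraction of carrier pairs that are identical over the whole interval.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)
```

For example, with `haps = ["0000", "0000", "0001", "1011"]`, `ehh(haps, 0, "0", 2)` is 1.0, and it drops to 1/3 at marker 3, where one carrier diverges. Under a recent selective sweep, EHH around the selected allele is expected to stay high over long distances, which is the pattern the abstract describes for chromosomes carrying C-14010.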
We present a statistical model for patterns of genetic variation in samples of unrelated individuals from natural populations. This model is based on the idea that, over short regions, haplotypes in a population tend to cluster into groups of similar haplotypes. Because recombination makes this clustering local in nature, our model allows cluster memberships to change continuously along the chromosome according to a hidden Markov model. This approach is flexible, allowing for both "block-like" patterns of linkage disequilibrium (LD) and gradual decline in LD with distance. The resulting model is also fast and is therefore practicable for large data sets (e.g., thousands of individuals typed at hundreds of thousands of markers). We illustrate the utility of the model by applying it to dense single-nucleotide-polymorphism genotype data for the tasks of imputing missing genotypes and estimating haplotypic phase. For imputing missing genotypes, methods based on this model are as accurate as, or more accurate than, existing methods. For haplotype estimation, the point estimates are slightly less accurate than those from the best existing methods (e.g., for unrelated Centre d'Etude du Polymorphisme Humain individuals from the HapMap project, switch error was 0.055 for our method vs. 0.051 for PHASE) but require a small fraction of the computational cost. In addition, we demonstrate that the model accurately reflects uncertainty in its estimates, in that probabilities computed using the model are approximately well calibrated. The methods described in this article are implemented in a software package, fastPHASE, which is available from the Stephens Lab Web site.
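To make the "cluster memberships change along the chromosome" idea concrete, here is a minimal forward-algorithm sketch for a single, already-phased haplotype under a cluster hidden Markov model. It assumes K clusters. Between markers, the haplotype keeps its cluster with probability `exp(-r[m])`; otherwise it jumps to a cluster drawn from weights `alpha[m]`. Cluster k carries allele 1 at marker m with probability `theta[m][k]`. The parameter names are illustrative, and `r[m]` is assumed to already fold in the distance between markers. This is not the fastPHASE implementation, which works on unphased genotypes (pairs of clusters) and fits parameters by EM.

```python
import math


def haplotype_log_likelihood(h, alpha, theta, r):
    """Scaled forward algorithm for a cluster HMM on one haplotype.

    h[m]        : allele (0 or 1) at marker m
    alpha[m][k] : weight of cluster k when a jump occurs at marker m (rows sum to 1)
    theta[m][k] : probability that cluster k carries allele 1 at marker m
    r[m]        : jump rate between markers m-1 and m (r[0] is unused)
    """
    M, K = len(h), len(alpha[0])

    def emit(m, k):
        # Bernoulli emission of the observed allele from cluster k.
        return theta[m][k] if h[m] == 1 else 1.0 - theta[m][k]

    # Initialization: the first cluster is drawn from alpha[0].
    f = [alpha[0][k] * emit(0, k) for k in range(K)]
    scale = sum(f)
    log_lik = math.log(scale)
    f = [x / scale for x in f]  # rescale to avoid underflow on long chromosomes

    for m in range(1, M):
        stay = math.exp(-r[m])  # probability that no jump occurs
        total = sum(f)          # equals 1 after rescaling
        f = [
            emit(m, k) * (stay * f[k] + (1.0 - stay) * alpha[m][k] * total)
            for k in range(K)
        ]
        scale = sum(f)
        log_lik += math.log(scale)
        f = [x / scale for x in f]
    return log_lik
```

The two limits show how one model covers both LD patterns mentioned above. With `r` near 0, the haplotype stays in one cluster across the region, giving strong, block-like LD. With very large `r`, every marker gets a fresh cluster draw, so the markers become independent. Intermediate rates give LD that decays gradually with distance.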