Although many algorithms exist for estimating haplotypes from genotype data, none
of them takes full account of both the decay of linkage disequilibrium (LD) with distance
and the order and spacing of genotyped markers. Here, we describe an algorithm that
does take these factors into account, using a flexible model for the decay of LD with
distance that can handle both "blocklike" and "nonblocklike" patterns of LD. We compare
the accuracy of this approach with that of a range of other available algorithms in three
ways: for reconstruction of randomly paired, molecularly determined male X chromosome
haplotypes; for reconstruction of haplotypes obtained from trios in an autosomal region;
and for estimation of missing genotypes in 50 autosomal genes that have been completely
resequenced in 24 African Americans and 23 individuals of European descent. For the
autosomal data sets, our new approach clearly outperforms the best available methods,
whereas its accuracy in inferring the X chromosome haplotypes is only slightly superior.
For estimation of missing genotypes, our method performed slightly better when the
two subsamples were combined than when they were analyzed separately, which illustrates
its robustness to population stratification. Our method is implemented in the software
package PHASE (v2.1.1), available from the Stephens Lab Web site.
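
The first comparison described above relies on haplotypes whose phase is already known: pairing them at random yields pseudo-diploid individuals whose unphased genotypes can be given to any phasing algorithm, and the inferred haplotypes can then be scored against the truth. The following Python sketch illustrates that evaluation design only. It is not part of PHASE and does not call it. The function names, the 0/1 coding of biallelic sites, and the use of switch error as the accuracy measure are all assumptions made for illustration; the abstract does not state which error measure was used.

```python
import random


def pair_haplotypes(haplotypes, seed=0):
    """Randomly pair known haplotypes into pseudo-diploid individuals.

    Hypothetical helper: `haplotypes` is a list of equal-length lists of
    0/1 alleles. If the count is odd, the last shuffled haplotype is dropped.
    """
    rng = random.Random(seed)
    shuffled = list(haplotypes)
    rng.shuffle(shuffled)
    return [(shuffled[i], shuffled[i + 1])
            for i in range(0, len(shuffled) - 1, 2)]


def unphased_genotypes(pair):
    """Per-site allele counts (0, 1, or 2); phase information is discarded."""
    h1, h2 = pair
    return [a + b for a, b in zip(h1, h2)]


def switch_error_rate(true_pair, inferred_pair):
    """Fraction of adjacent heterozygous sites with wrong relative phase.

    Assumed metric, chosen for illustration. The result does not depend on
    which of the two inferred haplotypes is listed first.
    """
    t1, t2 = true_pair
    i1, _ = inferred_pair
    het = [k for k in range(len(t1)) if t1[k] != t2[k]]
    if len(het) < 2:
        return 0.0  # phase is unambiguous with fewer than two het sites
    # At each heterozygous site, does inferred haplotype 1 carry the
    # allele of true haplotype 1?
    match = [i1[k] == t1[k] for k in het]
    switches = sum(match[j] != match[j + 1] for j in range(len(match) - 1))
    return switches / (len(het) - 1)
```

In use, each pair returned by `pair_haplotypes` would be reduced to genotypes with `unphased_genotypes`, phased by the method under test, and compared with the original pair through `switch_error_rate`.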