      Alignment-Free Population Genomics: An Efficient Estimator of Sequence Diversity

          Comparative sequencing contributes critically to the functional annotation of genomes. One prerequisite for successful analysis of the increasingly abundant comparative sequencing data is the availability of efficient computational tools. We present here a strategy for comparing unaligned genomes based on a coalescent approach combined with advanced algorithms for indexing sequences. These algorithms are particularly efficient when analyzing large genomes, as their run time ideally grows only linearly with sequence length. Using this approach, we have derived and implemented a maximum-likelihood estimator of the average number of mismatches per site between two closely related sequences, π. By allowing for fluctuating coalescent times, we are able to improve a previously published alignment-free estimator of π. We show through simulation that our new estimator is fast and accurate even with moderate recombination ( ρπ). To demonstrate its applicability to real data, we compare the unaligned genomes of Drosophila persimilis and D. pseudoobscura. In agreement with previous studies, our sliding window analysis locates the global divergence minimum between these two genomes to the pericentromeric region of chromosome 3.
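          As a point of reference for the quantities named in the abstract, the following minimal Python sketch shows (i) π computed directly as the fraction of mismatched sites between two already aligned sequences and (ii) a naive computation of match lengths between two unaligned sequences. This is an illustration only, not the authors' maximum-likelihood estimator: the function names `mismatches_per_site` and `match_lengths` are hypothetical, the definition of a match length used here (the longest prefix of each query suffix that occurs anywhere in the subject) is an assumption made for the sketch, and the quadratic substring search stands in for the linear-time sequence index the abstract refers to.

```python
def mismatches_per_site(a, b):
    """Fraction of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def match_lengths(query, subject):
    """For each query position, length of the longest match found in subject.

    Naive illustration (assumed definition): at position i, extend the
    substring query[i:i+k] for as long as it still occurs somewhere in
    subject. An index structure such as a suffix tree would replace this
    brute-force search in practice.
    """
    lengths = []
    for i in range(len(query)):
        k = 0
        while i + k < len(query) and query[i:i + k + 1] in subject:
            k += 1
        lengths.append(k)
    return lengths


if __name__ == "__main__":
    s1, s2 = "ACGT", "ACGA"
    print(mismatches_per_site(s1, s2))  # one mismatch in four sites
    print(match_lengths(s1, s2))
```

          Intuitively, the more two sequences have diverged, the shorter such matches tend to be, which is why the distribution of match lengths carries information about π without requiring an alignment.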

                Author and article information

                Journal: G3: Genes|Genomes|Genetics (G3 (Bethesda))
                Publisher: Genetics Society of America
                Published: 1 August 2012
                Volume: 2
                Issue: 8
                Pages: 883-889
                [*] Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, 24306 Plön, Germany
                Mathematical Stochastics, Mathematical Institute, Albert-Ludwigs University, 79085 Freiburg, Germany
                Author notes
                [1] Corresponding author: Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, 24306 Plön, Germany. E-mail: haubold@evolbio.mpg.de
                Copyright © 2012 Haubold, Pfaffelhuber

                This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                Keywords: genetic diversity, maximum-likelihood, alignment-free, match length distribution, Drosophila


