
Robust Demographic Inference from Genomic and SNP Data



      We introduce a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets. We show that our composite-likelihood approach allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods. For simple scenarios, our approach compares favorably in terms of accuracy and speed with the diffusion-based method that is the current reference in the field, while showing better convergence properties for complex models. We first apply our methodology to non-coding genomic SNP data from four human populations. To infer their demographic history, we compare neutral evolutionary models of increasing complexity, including unsampled populations. We further show the versatility of our framework by extending it to the inference of demographic parameters from SNP chips with known ascertainment, such as that recently released by Affymetrix to study human origins. Whereas previous ways of handling ascertained SNPs were either restricted to a single population or only allowed the inference of divergence time between a pair of populations, our framework can correctly infer parameters of more complex models including the divergence of several populations, bottlenecks and migration. We apply this approach to the reconstruction of African demography using two distinct ascertained human SNP panels studied under two evolutionary models. The two SNP panels lead to globally very similar estimates and confidence intervals, and suggest an ancient divergence (>110 Ky) between Yoruba and San populations. Our methodology appears well suited to the study of complex scenarios from large genomic data sets.
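
      To make the site frequency spectrum concrete, here is a minimal Python sketch of how an unfolded SFS for a single population could be tallied from per-site derived-allele counts. The function name and the toy counts are hypothetical illustrations, not taken from the article or its software.

```python
def site_frequency_spectrum(derived_counts, n_chromosomes):
    """Return the unfolded SFS as a list: entry i is the number of sites
    where the derived allele is carried by exactly i sampled chromosomes."""
    sfs = [0] * (n_chromosomes + 1)
    for count in derived_counts:
        if not 0 <= count <= n_chromosomes:
            raise ValueError(f"derived count {count} outside 0..{n_chromosomes}")
        sfs[count] += 1
    return sfs


# Hypothetical toy data: derived-allele counts at 8 SNPs in 4 chromosomes.
derived_counts = [1, 1, 2, 1, 3, 2, 1, 4]
sfs = site_frequency_spectrum(derived_counts, n_chromosomes=4)
print(sfs)  # [0, 4, 2, 1, 1]
```

      With several populations, the same tally would be kept in a multidimensional array indexed by the derived-allele count in each population.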

      Author Summary

      We present a new likelihood-based method to infer the past demography of a set of populations from large genomic datasets. Our method can be applied to arbitrarily complex models as the likelihood is estimated by coalescent simulations. Under simple scenarios, our method behaves similarly to a widely used diffusion-based method while showing better convergence properties. In addition, our approach can be applied to very complex models including as many as a dozen populations, and still retrieve parameters very accurately in a reasonable time. We apply our approach to estimate the past demography of four human populations for which non-coding whole genome diversity is available, estimating the degree of European admixture of an African American population from the southwestern United States and the degree of admixture of a Kenyan population with an unsampled East African population. We also show the versatility of our framework by inferring the demographic history of African populations from SNP chip data with known ascertainment bias, and find a very old divergence time (>110 Ky) between the Yoruba from Western Africa and the San from Southern Africa.
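
      The idea of scoring a demographic model by comparing an observed SFS with one expected under simulation can be sketched as follows. This is a simplified illustration that assumes a multinomial composite likelihood over polymorphic sites only and drops the constant term; it is not the authors' implementation. The function name, the `floor` parameter, and all numbers are hypothetical, and in practice the expected SFS would come from coalescent simulations under the candidate model.

```python
import math


def composite_log_likelihood(observed_sfs, expected_sfs, floor=1e-12):
    """Multinomial composite log-likelihood (up to a constant):
    sum over SFS entries of observed_count * log(expected_proportion)."""
    if len(observed_sfs) != len(expected_sfs):
        raise ValueError("observed and expected SFS must have the same length")
    total = sum(expected_sfs)
    if total <= 0:
        raise ValueError("expected SFS must have a positive total")
    loglik = 0.0
    for obs, exp in zip(observed_sfs, expected_sfs):
        if obs == 0:
            continue  # empty entries contribute nothing
        # Floor the probability so entries never hit by simulation do not give log(0).
        p = max(exp / total, floor)
        loglik += obs * math.log(p)
    return loglik


# Hypothetical observed SFS (polymorphic entries only) and two candidate models.
observed = [4, 2, 1, 1]
expected_model_a = [50, 25, 15, 10]  # stand-in for simulated counts under model A
expected_model_b = [25, 25, 25, 25]  # stand-in for simulated counts under model B

ll_a = composite_log_likelihood(observed, expected_model_a)
ll_b = composite_log_likelihood(observed, expected_model_b)
print("model A fits better" if ll_a > ll_b else "model B fits better")
```

      Parameter inference would then amount to searching for the parameter values whose simulated expected SFS maximizes this quantity.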


            Author and article information

            [1] CMPG, Institute of Ecology and Evolution, Berne, Switzerland
            [2] Swiss Institute of Bioinformatics, Lausanne, Switzerland
            [3] Center for Theoretical Evolutionary Genomics, Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
            [4] School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
            University of Washington, United States of America
            Author notes

            The authors have declared that no competing interests exist.

            Conceived and designed the experiments: LE MF. Performed the experiments: LE ID EHS VCS. Analyzed the data: LE ID EHS MF VCS. Contributed reagents/materials/analysis tools: LE ID EHS MF VCS. Wrote the paper: LE ID EHS MF VCS.

            Role: Editor
            Journal: PLoS Genetics (PLoS Genet)
            Publisher: Public Library of Science (San Francisco, USA)
            Published: 24 October 2013
            Volume: 9
            Issue: 10
            Identifiers: 3812088, PGENETICS-D-13-00576
            DOI: 10.1371/journal.pgen.1003905

            This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

            Pages: 17
            This work was supported by Swiss NSF grants No 3100-126074, 31003A-143393, and CRSII3_141940 to LE. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
            Research Article


