      Reconstructing Native American Migrations from Whole-Genome and Whole-Exome Data


          There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity for improving our understanding of population genetic history by providing over a hundred sequenced low-coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and especially Native American ancestry to these populations. Estimated Native American ancestry is in MXL, in CLM, and in PUR. Native American ancestry in PUR is most closely related to populations surrounding the Orinoco River basin, confirming the South American ancestry of the Taíno people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a demographic model for three ancestral Native American populations. These ancestral populations likely split in close succession: the most likely scenario, based on a peopling of the Americas thousand years ago (kya), supports that the MXL ancestors split kya, with a subsequent split of the ancestors to CLM and PUR kya. The model also features effective populations of in Mexico, in Colombia, and in Puerto Rico. Modeling identity-by-descent (IBD) and ancestry tract length, we show that post-contact populations also differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest effective size and the earlier migration from Europe. Finally, we compare IBD and ancestry assignments to find evidence for relatedness among European founders to the three populations.

          Populations of the Americas have a rich and heterogeneous genetic and cultural heritage that draws from a diversity of pre-Columbian Native American, European, and African populations. Characterizing this diversity facilitates the development of medical genetics research in diverse populations and the transfer of medical knowledge across populations. It also represents an opportunity to better understand the peopling of the Americas, from the crossing of Beringia to the post-Columbian era. Here, we take advantage of the sequencing of individuals of Colombian (CLM), Mexican (MXL), and Puerto Rican (PUR) origin by the 1000 Genomes Project to improve our demographic models for the peopling of the Americas. The divergence among African, European, and Native American ancestors to these populations enables us to infer the continent of origin at each locus in the sampled genomes. The resulting patterns of ancestry suggest complex post-Columbian migration histories, starting later in CLM than in MXL and PUR. Whereas European ancestral segments show evidence of relatedness, a demographic model of synonymous variation suggests that the Native American ancestors to the MXL, PUR, and CLM panels split within a few hundred years of each other, over 12 thousand years ago. Together with early archeological sites in South America, these results support rapid divergence during the initial peopling of the Americas.

          Fast model-based estimation of ancestry in unrelated individuals.

          Population stratification has long been recognized as a confounding factor in genetic association studies. Estimated ancestries, derived from multi-locus genotype data, can be used to perform a statistical correction for population stratification. One popular technique for estimation of ancestry is the model-based approach embodied by the widely applied program structure. Another approach, implemented in the program EIGENSTRAT, relies on Principal Component Analysis rather than model-based estimation and does not directly deliver admixture fractions. EIGENSTRAT has gained in popularity in part owing to its remarkable speed in comparison to structure. We present a new algorithm and a program, ADMIXTURE, for model-based estimation of ancestry in unrelated individuals. ADMIXTURE adopts the likelihood model embedded in structure. However, ADMIXTURE runs considerably faster, solving problems in minutes that take structure hours. In many of our experiments, we have found that ADMIXTURE is almost as fast as EIGENSTRAT. The runtime improvements of ADMIXTURE rely on a fast block relaxation scheme using sequential quadratic programming for block updates, coupled with a novel quasi-Newton acceleration of convergence. Our algorithm also runs faster and with greater accuracy than the implementation of an Expectation-Maximization (EM) algorithm incorporated in the program FRAPPE. Our simulations show that ADMIXTURE's maximum likelihood estimates of the underlying admixture coefficients and ancestral allele frequencies are as accurate as structure's Bayesian estimates. On real-world data sets, ADMIXTURE's estimates are directly comparable to those from structure and EIGENSTRAT. Taken together, our results show that ADMIXTURE's computational speed opens up the possibility of using a much larger set of markers in model-based ancestry estimation and that its estimates are suitable for use in correcting for population stratification in association studies.
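
          The likelihood model that ADMIXTURE shares with structure can be illustrated with a small sketch. Under that model, the genotype of individual i at SNP j is binomial with two draws and success probability equal to the sum over ancestral populations k of q_ik f_kj, where q_ik is the admixture fraction and f_kj the ancestral allele frequency. The Python code below is not ADMIXTURE's algorithm (no block relaxation, sequential quadratic programming, or quasi-Newton acceleration); it uses the simpler EM updates of the kind attributed above to FRAPPE, because they fit in a few lines. All function names, the simulated data, and the clamping constant EPS are assumptions made for illustration only.

```python
import math
import random

EPS = 1e-6  # assumed clamp: keeps frequencies away from 0 and 1 so logs stay finite


def log_likelihood(G, Q, F):
    """Log-likelihood (up to a constant) of genotypes G given Q and F.

    G[i][j]: copies (0, 1, or 2) of allele 1 in individual i at SNP j.
    Q[i][k]: fraction of individual i's genome from ancestral population k.
    F[k][j]: frequency of allele 1 at SNP j in ancestral population k.
    """
    total = 0.0
    for i, row in enumerate(G):
        for j, g in enumerate(row):
            p = sum(q * f[j] for q, f in zip(Q[i], F))
            total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total


def em_step(G, Q, F):
    """One EM update of Q and F (simple EM, not ADMIXTURE's block relaxation)."""
    n, m, K = len(G), len(G[0]), len(F)
    q_new = [[0.0] * K for _ in range(n)]
    num = [[0.0] * m for _ in range(K)]  # expected allele-1 copies drawn from k
    den = [[0.0] * m for _ in range(K)]  # expected total copies drawn from k
    for i in range(n):
        for j in range(m):
            g = G[i][j]
            p1 = sum(Q[i][k] * F[k][j] for k in range(K))
            p0 = 1.0 - p1
            for k in range(K):
                a = g * Q[i][k] * F[k][j] / p1                # allele-1 copies from k
                b = (2 - g) * Q[i][k] * (1.0 - F[k][j]) / p0  # allele-0 copies from k
                q_new[i][k] += (a + b) / (2.0 * m)
                num[k][j] += a
                den[k][j] += a + b
    f_new = [[F[k][j] if den[k][j] <= 0.0
              else min(1.0 - EPS, max(EPS, num[k][j] / den[k][j]))
              for j in range(m)] for k in range(K)]
    return q_new, f_new


def simulate(n, m, seed=0):
    """Simulate genotypes for n individuals admixed between two populations."""
    rng = random.Random(seed)
    f_true = [[rng.uniform(0.05, 0.95) for _ in range(m)] for _ in range(2)]
    G = []
    for _ in range(n):
        q = rng.random()  # true ancestry fraction from population 0
        row = []
        for j in range(m):
            p = q * f_true[0][j] + (1.0 - q) * f_true[1][j]
            row.append(sum(rng.random() < p for _ in range(2)))
        G.append(row)
    return G


def fit(G, K, iters=50, seed=1):
    """Run EM from a random start; return Q, F, and the log-likelihood trace."""
    rng = random.Random(seed)
    n, m = len(G), len(G[0])
    Q = []
    for _ in range(n):
        w = [rng.uniform(0.5, 1.5) for _ in range(K)]
        Q.append([x / sum(w) for x in w])
    F = [[rng.uniform(0.2, 0.8) for _ in range(m)] for _ in range(K)]
    history = [log_likelihood(G, Q, F)]
    for _ in range(iters):
        Q, F = em_step(G, Q, F)
        history.append(log_likelihood(G, Q, F))
    return Q, F, history
```

          Calling fit(simulate(12, 30), K=2) returns admixture fractions whose rows sum to one, together with a log-likelihood trace that can be inspected for convergence. EM of this kind is known to converge slowly, which is the motivation the abstract gives for ADMIXTURE's faster optimization scheme.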
            An integrated map of genetic variation from 1,092 human genomes

            Summary: Through characterising the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help understand the genetic contribution to disease. We describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methodologies to integrate information across multiple algorithms and diverse data sources we provide a validated haplotype map of 38 million SNPs, 1.4 million indels and over 14 thousand larger deletions. We show that individuals from different populations carry different profiles of rare and common variants and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways and that each individual harbours hundreds of rare non-coding variants at conserved sites, such as transcription-factor-motif disrupting changes. This resource, which captures up to 98% of accessible SNPs at a frequency of 1% in populations of medical genetics focus, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.
              The late Pleistocene dispersal of modern humans in the Americas.

              When did humans colonize the Americas? From where did they come and what routes did they take? These questions have gripped scientists for decades, but until recently answers have proven difficult to find. Current genetic evidence implies dispersal from a single Siberian population toward the Bering Land Bridge no earlier than about 30,000 years ago (and possibly after 22,000 years ago), then migration from Beringia to the Americas sometime after 16,500 years ago. The archaeological records of Siberia and Beringia generally support these findings, as do archaeological sites in North and South America dating to as early as 15,000 years ago. If this is the time of colonization, geological data from western Canada suggest that humans dispersed along the recently deglaciated Pacific coastline.

                PLoS Genetics
                Public Library of Science (San Francisco, USA)
                26 December 2013
                Volume: 9
                Issue: 12
                [1] Department of Human Genetics, McGill University, Montréal, Québec, Canada
                [2] McGill University and Génome Québec Innovation Centre, Montréal, Québec, Canada
                [3] Department of Genetics, Stanford University, Stanford, California, United States of America
                [4] Ancestry.com DNA LLC, San Francisco, California, United States of America
                [5] Laboratorio de Genética Molecular Poblacional, Instituto Multidisciplinario de Biología Celular (IMBICE), CCT-CONICET-La Plata, Argentina and Facultad de Ciencias Naturales y Museo, Universidad Nacional de La Plata, La Plata, Argentina
                [6] Weill Cornell Medical College, New York, New York, United States of America
                [7] Department of Genetics and Genomic Sciences, The Charles Bronfman Institute for Personalized Medicine, Center for Statistical Genetics, and Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
                [8] Department of Bioengineering and Therapeutic Sciences and Medicine, University of California San Francisco, San Francisco, California, United States of America
                [9] Department of Biology, University of Puerto Rico at Mayaguez, Mayaguez, Puerto Rico
                [10] Department of Biochemistry, Ponce School of Medicine and Health Sciences, Ponce, Puerto Rico
                [11] Department of Psychiatry and Clinical Psychobiology, University of Barcelona, Barcelona, Spain
                [12] Universidad de Antioquia, Medellín, Colombia
                [13] Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
                Dartmouth College, United States of America
                ¶ Membership of The 1000 Genomes Project can be found in the acknowledgments or at http://www.1000genomes.org/participants.

                The authors have declared that no competing interests exist.

                Conceived and designed the experiments: SG MV KS TKO ARL EGB JCMC CDB. Analyzed the data: SG FZ JKB MM AME JLRF EEK CRG WG. Contributed reagents/materials/analysis tools: SG FZ JKB AME CRG BKM JD GB TKO ARL EGB. Wrote the paper: SG MM AME CDB.


                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                Pages: 13
                This study was supported by NSF grant 7188155, NHGRI grant HG005715, and NIH R01 GM090087 (to CDB), UCSF Chancellors Research Fellowship, Dissertation Year Fellowship, and in part by NIH Training Grant T32 GM007175 (to CRG); 1P60 MD006902, R01 HL088133, R01 ES015794, RWJF Amos Medical Faculty Development Award, the Sandler Foundation; the American Asthma Foundation (to EGB), and the Pew Latin American Fellows Program (MM). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
