      Is Open Access

      Evidence of Early-Stage Selection on EPAS1 and GPR126 Genes in Andean High Altitude Populations

      research-article


          Abstract

          The aim of this study is to identify genetic variants that harbour signatures of recent positive selection and may facilitate physiological adaptations to hypobaric hypoxia. To achieve this, we conducted whole genome sequencing and lung function tests in 19 Argentinean highlanders (>3500 m) and compared them with 16 Native American lowlanders. We developed a new statistical procedure combining population branch statistics (PBS) and the number of segregating sites by length (nSL) to detect beneficial alleles that arose since the settlement of the Andes and are currently present in 15–50% of the population. We identified two missense variants as significant targets of selection. One of these variants, located within the GPR126 gene, has been previously associated with the forced expiratory volume/forced vital capacity ratio. The other, a novel missense variant, mapped to the EPAS1 gene, which encodes hypoxia-inducible factor 2α. EPAS1 is known to be the major selection candidate gene in Tibetans. The derived allele of GPR126 is associated with lung function in our sample of highlanders (p < 0.05). These variants may contribute to the physiological adaptations to hypobaric hypoxia, possibly by altering lung function. The new statistical approach might be a useful tool to detect selected variants in population studies.
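The abstract does not spell out how PBS and nSL are combined, so the following is only a background sketch of the PBS half. It assumes the standard PBS definition, in which pairwise FST values between a focal population and two reference populations are transformed as T = -ln(1 - FST) and combined into the length of the focal branch. The function names, the use of the 15–50% window as a plain frequency filter, and the example FST values are illustrative assumptions; the abstract does not name the second reference population.

```python
import math


def branch_length(fst: float) -> float:
    """Transform a pairwise FST into an additive divergence proxy, T = -ln(1 - FST)."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("FST must lie in [0, 1)")
    return -math.log(1.0 - fst)


def pbs(fst_focal_ref1: float, fst_focal_ref2: float, fst_ref1_ref2: float) -> float:
    """Population branch statistic: length of the focal population's branch
    in the three-population tree, (T_focal,ref1 + T_focal,ref2 - T_ref1,ref2) / 2."""
    return (
        branch_length(fst_focal_ref1)
        + branch_length(fst_focal_ref2)
        - branch_length(fst_ref1_ref2)
    ) / 2.0


def in_frequency_window(derived_freq: float, low: float = 0.15, high: float = 0.50) -> bool:
    """Keep variants whose derived allele lies in the 15-50% window named in the abstract."""
    return low <= derived_freq <= high


# Hypothetical per-SNP FST values (focal = highlanders); not data from the study.
score = pbs(0.30, 0.25, 0.10)
print(f"PBS = {score:.4f}, in window: {in_frequency_window(0.32)}")
```

A high PBS at a variant whose derived allele is still at intermediate frequency is the kind of pattern an incomplete, early-stage sweep would leave; how the authors weight this against nSL is not described here.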


          Most cited references (24)


          Inferring human population size and separation history from multiple genome sequences

          The availability of complete human genome sequences from populations across the world has given rise to new population genetic inference methods that explicitly model their ancestral relationship under recombination and mutation. So far, application of these methods to evolutionary history more recent than 20-30 thousand years ago and to population separations has been limited. Here we present a new method that overcomes these shortcomings. The Multiple Sequentially Markovian Coalescent (MSMC) analyses the observed pattern of mutations in multiple individuals, focusing on the first coalescence between any two individuals. Results from applying MSMC to genome sequences from nine populations across the world suggest that the genetic separation of non-African ancestors from African Yoruban ancestors started long before 50,000 years ago, and give information about human population history as recently as 2,000 years ago, including the bottleneck in the peopling of the Americas, and separations within Africa, East Asia and Europe.
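Population-separation results from MSMC are usually read off the relative cross-coalescence rate (rCCR). The sketch below assumes output in the style of MSMC, with per-segment coalescence rates lambda_00, lambda_01 and lambda_11 and times scaled by the per-generation mutation rate. The mutation rate (1.25e-8), the generation time (30 years), the 0.5 midpoint convention and all function names are assumptions for illustration, not values given on this page.

```python
from typing import List, Optional


def relative_ccr(lambda_00: List[float], lambda_01: List[float], lambda_11: List[float]) -> List[float]:
    """Relative cross-coalescence rate per time segment: 2 * l01 / (l00 + l11).

    Values near 1 mean the two samples still behaved as one population;
    values near 0 mean the populations had already separated.
    """
    return [2.0 * l01 / (l00 + l11) for l00, l01, l11 in zip(lambda_00, lambda_01, lambda_11)]


def scaled_time_to_years(t_scaled: float, mu: float = 1.25e-8, generation_years: float = 30.0) -> float:
    """Convert mutation-rate-scaled time to years (mu and generation time are assumptions)."""
    return t_scaled / mu * generation_years


def midpoint_time(times: List[float], rccr: List[float], threshold: float = 0.5) -> Optional[float]:
    """First time, looking back from the present, at which rCCR reaches `threshold`.

    `times` are segment boundaries in increasing (past-ward) order. The crossing
    is located by linear interpolation; returns None if it is never reached.
    """
    if not rccr:
        return None
    if rccr[0] >= threshold:
        return times[0]
    for i in range(1, len(rccr)):
        r0, r1 = rccr[i - 1], rccr[i]
        if r0 < threshold <= r1:
            t0, t1 = times[i - 1], times[i]
            return t0 + (threshold - r0) * (t1 - t0) / (r1 - r0)
    return None


# Hypothetical rates for four time segments, not output from a real run.
times = [0.0, 1e-4, 2e-4, 3e-4]
rccr = relative_ccr([1.0] * 4, [0.1, 0.3, 0.7, 0.95], [1.0] * 4)
t_mid = midpoint_time(times, rccr)
print(t_mid, scaled_time_to_years(t_mid))
```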

            On Detecting Incomplete Soft or Hard Selective Sweeps Using Haplotype Structure

            We present a new haplotype-based statistic (nSL) for detecting both soft and hard sweeps in population genomic data from a single population. We compare our new method with classic single-population haplotype and site frequency spectrum (SFS)-based methods and show that it is more robust, particularly to recombination rate variation. However, all statistics show some sensitivity to the assumptions of the demographic model. Additionally, we show that nSL has at least as much power as other methods under a number of different selection scenarios, most notably in the cases of sweeps from standing variation and incomplete sweeps. This conclusion holds up under a variety of demographic models. In many aspects, our new method is similar to the iHS statistic; however, it is generally more robust and does not require a genetic map. To illustrate the utility of our new method, we apply it to HapMap3 data and show that in the Yoruban population, there is strong evidence of selection on genes relating to lipid metabolism. This observation could be related to the known differences in cholesterol levels, and lipid metabolism more generally, between African Americans and other populations. We propose that the underlying causes for the selection on these genes are pleiotropic effects relating to blood parasites rather than their role in lipid metabolism.
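To make the haplotype-based idea concrete, here is a minimal sketch of an unstandardised nSL value at one focal site. It assumes phased haplotypes coded 0 (ancestral) and 1 (derived) at segregating sites only, measures the length of pairwise haplotype identity in number of segregating sites rather than genetic distance, and uses the ancestral-over-derived ratio familiar from iHS. Sign conventions and window-edge handling differ between implementations, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence

Haplotype = Sequence[int]  # 0 = ancestral, 1 = derived, one entry per segregating site


def shared_length(h1: Haplotype, h2: Haplotype, k: int) -> int:
    """Number of consecutive segregating sites around focal site k on which h1 and h2 agree."""
    left = k
    while left > 0 and h1[left - 1] == h2[left - 1]:
        left -= 1
    right = k
    while right < len(h1) - 1 and h1[right + 1] == h2[right + 1]:
        right += 1
    return right - left + 1


def mean_shared_length(haps: List[Haplotype], k: int, allele: int) -> float:
    """Average shared length over all pairs of haplotypes carrying `allele` at site k."""
    carriers = [h for h in haps if h[k] == allele]
    pairs = list(combinations(carriers, 2))
    if not pairs:
        raise ValueError("need at least two carriers of each allele")
    return sum(shared_length(a, b, k) for a, b in pairs) / len(pairs)


def unstandardized_nsl(haps: List[Haplotype], k: int) -> float:
    """ln(mean shared length of ancestral carriers / mean shared length of derived carriers).

    Strongly negative values flag unusually long derived haplotypes. A genome
    scan would then standardise these values within derived-allele frequency bins.
    """
    return math.log(mean_shared_length(haps, k, 0) / mean_shared_length(haps, k, 1))


# Toy sample of four haplotypes over five segregating sites; focal site k = 2.
haps = [
    [1, 1, 1, 1, 1],  # derived carriers share one long haplotype
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],  # ancestral carriers differ right next to the focal site
    [1, 0, 0, 0, 1],
]
print(unstandardized_nsl(haps, 2))
```

Counting segregating sites instead of centimorgans is what lets this family of statistics avoid a genetic map, which is the robustness point made in the abstract.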

              Genomic analyses inform on migration events during the peopling of Eurasia

              High-coverage whole-genome sequence studies have so far focused on a limited number1 of geographically restricted populations 2–5, or targeted at specific diseases, e.g. cancer6. Nevertheless, the availability of high-resolution genomic data has led to the development of new methodologies for inferring population history7–9 and refuelled the debate on the mutation rate in humans10. Here we present the Estonian Biocentre human Genome Diversity Panel (EGDP), a dataset of 483 high-coverage human genomes from 148 populations worldwide, including 379 new genomes from 125 populations, which we group into Diversity and Selection Sets (ED1-2; SI1:1.1-7). We analyse this dataset to refine estimates of continent-wide patterns of heterozygosity, long- and short-distance gene flow, archaic admixture, and changes in effective population size through time. We find a genetic signature in present-day Papuans suggesting that at least 2% of their genome originates from an early and largely extinct expansion of anatomically modern humans (AMH) Out-of-Africa (xOoA). Together with evidence from the Western Asian fossil record11, and admixture between AMHs and Neanderthals predating the main Eurasian expansion12, our results contribute to the mounting evidence for the presence of AMH out of Africa earlier than 75kya. We also screen for signals of positive or balancing selection. The paths taken by AMHs out of Africa (OoA) have been the subject of considerable debate over the past two decades. Fossil and archaeological evidence13,14, and craniometric studies15 of African and Asian populations, demonstrate that Homo sapiens was present outside of Africa ca. 120-70 kya11. However, this colonization has been viewed as a failed expansion OoA16 since genetic analyses of living populations have been consistent with a single OoA followed by serial founder events17. 
Ancient DNA (aDNA) sequencing studies have found support for admixture between early Eurasians and at least two archaic human lineages18,19, and suggest that modern humans reached Eurasia around 100 kya12. In addition, aDNA from modern humans suggests population structuring and turnover, but little additional archaic admixture, in Eurasia over the last 35–45 thousand years20–22. Overall, these findings indicate that most human genetic diversity outside Africa derives from a single dispersal event that was followed by admixture with archaic humans18,23. We used ADMIXTURE to analyse the genetic structure in our Diversity Set (ED1). We further compared the individual-level haplotype similarity of our samples using fineSTRUCTURE (ED3). Despite small sample sizes, we inferred 106 genetically distinct populations forming 12 major regional clusters, corresponding well to the 148 self-identified population labels. This clustering forms the basis for the groupings used in the scans of natural selection. Similar genetic affinities are highlighted by plotting the outgroup $f_3$ statistic9 in the form $f_3(X, Y; \mathrm{Yoruba})$, which here measures the drift shared by a non-African population $X$ and any modern or ancient population $Y$, relative to Yoruba as an African outgroup (SI1:2.2.6, ED4). Our sampling allowed us to consider geographic features correlated with gene flow by spatially interpolating genetic similarity measures between pairs of populations (SI1:2.2.2). We considered several measures and report gradients of allele frequencies in Figure 1, which we compared with gene flow patterns from EEMS24 as a validation (ED5). Controlling for pairwise geographic distance, we find a correlation between these genetic gradients and geographic and climatic features such as precipitation and elevation (inset of Figure 1, SI1:2.2.2). We screened for evidence of selection by first focusing on loci that showed the highest allelic differentiation among groups (SI1:3).
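The outgroup $f_3$ statistic mentioned above can be estimated from allele frequencies. This is a minimal sketch assuming population-level allele frequencies at shared SNPs and no finite-sample bias correction; the frequencies are made up, and `outgroup_f3` is a hypothetical helper, not the interface of any particular package.

```python
from typing import Sequence


def outgroup_f3(p_x: Sequence[float], p_y: Sequence[float], p_out: Sequence[float]) -> float:
    """Outgroup f3(X, Y; O): mean over SNPs of (pX - pO) * (pY - pO).

    Larger values indicate more genetic drift shared by X and Y after their
    common ancestor split from the outgroup O. No finite-sample correction is applied.
    """
    if not (len(p_x) == len(p_y) == len(p_out)) or len(p_x) == 0:
        raise ValueError("frequency vectors must be non-empty and of equal length")
    total = sum((x - o) * (y - o) for x, y, o in zip(p_x, p_y, p_out))
    return total / len(p_x)


# Hypothetical allele frequencies at three SNPs (O plays the role of Yoruba).
p_o = [0.20, 0.50, 0.70]
p_x = [0.50, 0.80, 0.40]
p_y_close = [0.55, 0.75, 0.45]  # drifted in the same direction as X
p_y_far = [0.25, 0.45, 0.75]    # stayed close to the outgroup
print(outgroup_f3(p_x, p_y_close, p_o), outgroup_f3(p_x, p_y_far, p_o))
```

Because the outgroup frequency is subtracted from both populations, drift on the outgroup's own branch cancels in expectation, which is why a single African reference can anchor comparisons among many non-African populations.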
We then performed positive and purifying selection scans (Methods) and found some candidate loci that replicate previously known and functionally supported findings (SI2:3.3.4-I, SI1:3.1, ED6; SI2:3.1-IV,VI). Additionally, we infer more purifying selection in Africans in genes involved in pigmentation (bootstrapping p value - bpv for RX/Y-scores […] 500Kb) run of homozygosity. We ran ChromoPainter for each individual on only these regions, meaning each individual was only painted where it had been perfectly phased. This did not change the qualitative features (SI1:2.2.1).

Removal of similar samples. Papuans are genetically distinct from other populations due to tens of thousands of years of isolation. We wanted to check whether the lengths of haplotypes assigned as African were biased by the inclusion of a large number of relatively homogeneous Eurasians with few Papuans. To do this, we repeated the N=447 painting allowing only donors from dissimilar populations, including only individuals who donated […] 40 kya, their mtDNA and Y lineages could have been lost by genetic drift even assuming an initial xOoA mixing component of up to 35%. Similar findings have been reported recently13.

Extended Data

ED1 Sample Diversity and Archaic signals. A: Map of location of samples highlighting the Diversity/Selection Sets; B: ADMIXTURE plot (K=8 and 14) which relates general visual inspection of genetic structure to studied populations and their region of origin; C: Sample-level heterozygosity plotted against distance from Addis Ababa. The trend line represents only non-African samples. The inset shows the waypoints used to arrive at the distance in kilometres for each sample. D: Boxplots were used to visualize the Denisova (red), Altai (green) and Croatian Neanderthal (blue) D distribution for each regional group of samples.
Oceanian Altai D values show a remarkable similarity with the Denisova D values for the same region, in contrast with the other groups of samples, where the Altai boxplots tend to be more similar to the Croatian Neanderthal ones.

ED2 Data quality checks and heterozygosity patterns. Concordance of DNA sequencing (Complete Genomics Inc.) and DNA genotyping (Illumina genotyping arrays) data (ref-ref; het-ref-alt and hom-alt-alt, see SI 1.6) from chip (A) and sequence data (B). Coverage (depth) distribution of variable positions, divided by DNA source (Blood or Saliva) and Complete Genomics calling pipeline (release version) (C). Genome-wide distribution of Transition/Transversion ratio subdivided by DNA source (Saliva or Blood) and by Complete Genomics calling pipeline (D). Genome-wide distribution of Transition/Transversion ratio subdivided by chromosomes (E). Inter-chromosome differences in observed heterozygosity in 447 samples from the Diversity Set (F). Inter-chromosome differences in observed heterozygosity in a set of 50 unpublished genomes from the Estonian Genome Center, sequenced on an Illumina platform at an average coverage exceeding 30x (G). Inter-chromosome differences in observed heterozygosity in phase 3 of the 1000 Genomes Project (H). The total number of observed heterozygous sites was divided by the number of accessible base pairs reported by the 1000 Genomes Project.

ED3 FineSTRUCTURE shared ancestry analysis. ChromoPainter and FineSTRUCTURE results, showing both inferred populations and the underlying (averaged) number of haplotypes that an individual in a population receives (rows) from donor individuals in other populations (columns). 108 populations are inferred by FineSTRUCTURE. The dendrogram shows the inferred relationship between populations. The numbers on the dendrogram give the proportion of MCMC iterations for which each population split is observed (where this is less than 1).
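The ED2 quality checks rest on two simple summaries: the transition/transversion (Ti/Tv) ratio and observed heterozygosity per accessible base pair. A minimal sketch, assuming variants are supplied as (ref, alt) single-nucleotide pairs; the helper names and all numbers are hypothetical.

```python
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """A transition swaps a purine for a purine or a pyrimidine for a pyrimidine."""
    pair = {ref.upper(), alt.upper()}
    if len(pair) != 2 or not pair <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Ratio of transitions to transversions among (ref, alt) single-nucleotide variants."""
    ti = tv = 0
    for ref, alt in snvs:
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed")
    return ti / tv


def heterozygosity_per_bp(n_het_sites: int, accessible_bp: int) -> float:
    """Observed heterozygous sites divided by the number of accessible base pairs."""
    return n_het_sites / accessible_bp


# Hypothetical variant calls and counts, not values from the study.
calls = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("T", "C"), ("A", "G")]
print(ti_tv_ratio(calls), heterozygosity_per_bp(2_000_000, 2_500_000_000))
```

Dividing by accessible base pairs rather than chromosome length keeps heterozygosity comparable across chromosomes with different amounts of unmappable sequence, which is what the inter-chromosome panels compare.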
Each “geographical region” has a unique colour, which is used to label its individuals. The number of individuals in each population is given in the label; e.g. “4Italians; 3Albanians” is a population of size 7 containing 4 individuals from Italy and 3 from Albania.

ED4 MSMC genetic split times and outgroup $f_3$ results. The MSMC split times estimated between each sample and a reference panel of 9 genomes were linearly interpolated to infer the broader square matrix (A). Summary of outgroup $f_3$ statistics for each pair of non-African populations (B) or to an ancient sample (C) using Yoruba as an outgroup. Populations are grouped by geographic region and are ordered with increasing distance from Africa (left to right for columns and bottom to top for rows). Colour bars at the left and top of the heat map indicate the colour coding used for the geographical region. Individual population labels are indicated at the right and bottom of the heat map. The $f_3$ statistics are scaled to lie between 0 and 1, with black indicating values close to 0 and red indicating values close to 1. Let $m$ and $M$ be the minimum and maximum $f_3$ values within a given row (i.e., focal population). That is, for focal population $X$ (on rows), $m = \min_{Y \neq X} f_3(X, Y; \mathrm{Yoruba})$ and $M = \max_{Y \neq X} f_3(X, Y; \mathrm{Yoruba})$. The scaled $f_3$ statistic for a given cell in that row is $f_3^{\mathrm{scaled}} = (f_3 - m)/(M - m)$, so that the smallest $f_3$ in the row has $f_3^{\mathrm{scaled}} = 0$ (black) and the largest has $f_3^{\mathrm{scaled}} = 1$ (red). By default, the diagonal is set to $f_3^{\mathrm{scaled}} = 1$ (red). The heat map is therefore asymmetric, with the population closest to the focal population at a given row having $f_3^{\mathrm{scaled}} = 1$ (red) and the population farthest from the focal population at a given row having $f_3^{\mathrm{scaled}} = 0$ (black). Therefore, at a given row, scanning the columns of the heat map reveals the populations with the most shared ancestry with the focal population of that row.
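The row-wise scaling in the ED4 legend can be written out directly. This sketch assumes a square matrix of raw $f_3(X, Y; \mathrm{Yoruba})$ values with focal populations on rows; the function name and the example values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def scale_f3_rows(f3: Matrix) -> Matrix:
    """Row-wise min-max scaling of an outgroup-f3 matrix, following the ED4 legend.

    f3[i][j] holds f3(X_i, X_j; Yoruba). For each focal row i, m and M are the
    minimum and maximum over j != i; off-diagonal cells become (f3 - m) / (M - m)
    and the diagonal is set to 1. The result is generally asymmetric.
    """
    n = len(f3)
    scaled: Matrix = []
    for i, row in enumerate(f3):
        if len(row) != n:
            raise ValueError("f3 matrix must be square")
        off_diag = [v for j, v in enumerate(row) if j != i]
        m, big_m = min(off_diag), max(off_diag)
        if big_m == m:
            raise ValueError(f"row {i} has no spread; scaling is undefined")
        scaled.append([1.0 if j == i else (v - m) / (big_m - m) for j, v in enumerate(row)])
    return scaled


# Hypothetical f3 values for four populations (diagonal entries are ignored).
f3 = [
    [0.00, 0.10, 0.20, 0.15],
    [0.10, 0.00, 0.30, 0.12],
    [0.20, 0.30, 0.00, 0.25],
    [0.15, 0.12, 0.25, 0.00],
]
for row in scale_f3_rows(f3):
    print([round(v, 2) for v in row])
```

In this toy matrix, population 2 is the closest match for population 0 (scaled value 1 in row 0), while population 0 is the farthest match for population 2 (scaled value 0 in row 2), which illustrates why the legend calls the heat map asymmetric.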
ED5 Geographical patterns of genetic diversity. Isolation by distance pattern across areas of high genetic gradient, using Europe as a baseline. The samples used in each analysis are indicated by coloured lines on the maps to the right of each plot. The panels show $F_{ST}$ as a function of distance across the Himalayas (A), the Ural mountains (B), and the Caucasus (C) as reported on the colour-coded map (D). Effect of creating gaps in the samples in Europe (E): we tested the effect of removing samples from stripes, either north to south (F) or west to east (G), to create gaps comparable in size to the gaps in samples in the dataset. Effective migration surfaces inferred by EEMS (H).

ED6 Summary of positive selection results. Barplot comparing frequency distributions of functional variants in Africans and non-Africans (A). The distribution of exonic SNPs according to their functional impact (synonymous, missense and nonsense) as a function of allele frequency. Note that the data from both groups were normalised for a sample size of n=21 and that the Africans show significantly (Chisq p-value […] 500kb) Run of Homozygosity using the PLINK command “--homozyg-window-kb 500000 --homozyg-window-het 0 --homozyg-density 10”. Because there are so few such regions, we report only the population average for populations with two or more individuals, as well as the standard error in that estimate. Populations whose 95% CI included 0 were also excluded. Note the logarithmic axis. D: Ancient DNA panel results. We used a different panel of 109 individuals which included 3 ancient genomes. We painted Chromosomes 11, 21 & 22 and report as crosses the population averages for populations with 2 or more individuals. The solid thin lines represent the position of each population when modern samples only are analysed. The dashed lines lead off the figure to the position of the ancient hominins and the African samples.

ED8 MSMC. Linear behavior of MSMC split estimates in the presence of admixture.
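The ED5 panels plot $F_{ST}$ against geographic distance, but the legend does not name an estimator. As an illustrative assumption only, this sketch uses Hudson's estimator, combining SNPs as a ratio of averages; the input frequencies and sample sizes are invented.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float], n1: int, n2: int) -> float:
    """Hudson's FST estimator combined across SNPs as a ratio of averages.

    p1, p2: sample allele frequencies per SNP; n1, n2: number of sampled
    chromosomes (haplotypes) in each population.
    """
    if len(p1) != len(p2) or not p1:
        raise ValueError("need equal-length, non-empty frequency vectors")
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two chromosomes")
    num = den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for within-sample variance.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Between-population heterozygosity.
        den += a * (1 - b) + b * (1 - a)
    if den == 0:
        raise ValueError("no informative SNPs")
    return num / den


# Hypothetical frequencies at two SNPs for two populations of 20 chromosomes each.
print(round(hudson_fst([0.9, 0.2], [0.1, 0.3], 20, 20), 4))
```

Summing numerators and denominators before dividing, rather than averaging per-SNP ratios, keeps low-diversity SNPs from dominating the estimate; with identical frequencies the sample-size correction can push the value slightly below zero.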
The examined Central Asian (A), East African (B), and African-American (C) genomes yielded a signature of MSMC split time (Truth, left-most column) that could be recapitulated (Reconstruction, second left-most column) as a linear mixture of other MSMC split times. The admixture proportions inferred by our method (top of each admixture component column) were remarkably similar to those previously reported in the literature. MSMC split times (D) calculated after re-phasing an Estonian and a Papuan (Koinanbe) genome together with all the available West African and Pygmy genomes from our dataset to minimize putative phasing artefacts. The cross-coalescence rate curves reported here are quantitatively comparable with those of Figure 2A, showing that phasing artefacts are unlikely to explain the observed past-ward shift of the Papuan-African split time. Boxplot (E) showing the distribution of differences between African-Papuan and African-Eurasian split times obtained from coalescent simulations assembled through random replacement to make 2000 sets of 6 individuals (to match the 6 Papuans available from our empirical dataset), each made of 1.5 Gb of sequence. The simulation command line used to generate each 5 Mb chromosome was as follows, with *DIV*=0.064, 0.4 or 0.8 for the xOoA, Denisova (Den) and Divergent Denisova (DeepDen) cases, respectively:

ms0ancient2 10 1 .065 .05 -t 5000. -r 3000. 5000000 -I 7 1 1 1 1 2 2 2 -en 0. 1 .2 -en 0. 2 .2 -en 0. 3 .2 -en 0. 4 .2 -es .025 7 .96 -en .025 8 .2 -ej .03 7 6 -ej .04 6 5 -ej .060 8 3 -ej .061 4 3 -ej .062 2 1 -ej .063 3 1 -ej *DIV* 1 5

ED9 Modelling the xOoA components with FineSTRUCTURE. A: Joint distribution of haplotype lengths and derived allele count, showing the median position of each cluster and all haplotypes assigned to it in the Maximum A Posteriori (MAP) estimate.
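The ED8 legend describes recapitulating one genome's MSMC split-time profile as a linear mixture of other profiles, without giving the fitting procedure. A minimal sketch under the assumptions of two sources, a single mixing weight constrained to [0, 1], and ordinary least squares; the profiles and the function name are hypothetical.

```python
from typing import Sequence


def two_source_mixture(target: Sequence[float], source_a: Sequence[float], source_b: Sequence[float]) -> float:
    """Least-squares weight alpha in [0, 1] such that
    target ~ alpha * source_a + (1 - alpha) * source_b, element-wise.

    Minimising ||(target - b) - alpha * (a - b)||^2 gives
    alpha = <target - b, a - b> / ||a - b||^2; clipping to [0, 1] is the
    constrained optimum because the objective is a convex parabola in alpha.
    """
    if not (len(target) == len(source_a) == len(source_b)):
        raise ValueError("profiles must have equal length")
    diff = [a - b for a, b in zip(source_a, source_b)]
    resid = [t - b for t, b in zip(target, source_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("source profiles are identical; alpha is not identifiable")
    alpha = sum(r * d for r, d in zip(resid, diff)) / denom
    return min(1.0, max(0.0, alpha))


# Hypothetical split-time profiles (e.g. kya against three reference genomes).
source_a = [10.0, 20.0, 30.0]
source_b = [40.0, 50.0, 60.0]
target = [0.3 * a + 0.7 * b for a, b in zip(source_a, source_b)]
print(two_source_mixture(target, source_a, source_b))
```

With more than two sources the same idea becomes a non-negative least-squares problem with a sum-to-one constraint; the two-source case is shown only because it has a closed form.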
Note that although a different proportion of points is assigned to each cluster in the MAP, the total posterior is very close to 1/K for all. The dashed lines show a constant mutation rate. Haplotypes are ordered by mutation rate from low to high. B: Residual distribution comparison between the two-component mixture using EUR.AFR and EUR.PNG (left), and the three-component mixture including xOoA (using the same colour scale) (right). The residuals without xOoA are larger (RMSE 0.0055 compared to RMSE 0.0018) but, more importantly, they are also structured. C: Assuming a mutational clock and a correct assignment of haplotypes, we can estimate the relative age of the splits from the number of derived alleles observed on the haplotypes. This leads to an estimate that the xOoA split is 1.5 times older than the Eurasian-African split.

ED10 Proposed xOoA model. A subway map figure illustrating, as suggested by the novel results presented here, a model of an early, extinct Out-of-Africa (xOoA) signature in the genomes of Sahul populations at their arrival in the region. Given the overall small genomic contribution of this event to the genomes of modern Sahul individuals, we could not determine whether the documented Denisova admixture (question marks) and putative multiple Neanderthal admixtures took place along this extinct OoA. We also speculate (question mark) that people who migrated along the xOoA route may have left a trace in the genomes of the Altai Neanderthal, as reported by Kuhlwilm and colleagues12.

Supplementary Information: Additional results are reported in two Supplementary Information files online: SI1, including descriptions of additional analyses, and SI2, including results in table format.
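ED9 C converts derived-allele counts into a relative split age under a mutational clock. This sketch assumes, as the legend does, a constant mutation rate and correct assignment of haplotypes, and simply compares derived-allele densities between two sets of haplotypes. The counts and lengths are hypothetical, chosen only to illustrate the arithmetic, and the function names are not from the source.

```python
from typing import Sequence


def derived_density(derived_counts: Sequence[int], lengths_bp: Sequence[int]) -> float:
    """Derived alleles per base pair, pooled over a set of haplotypes."""
    total_length = sum(lengths_bp)
    if total_length <= 0:
        raise ValueError("total haplotype length must be positive")
    return sum(derived_counts) / total_length


def relative_split_age(
    derived_x: Sequence[int],
    lengths_x: Sequence[int],
    derived_ref: Sequence[int],
    lengths_ref: Sequence[int],
) -> float:
    """Age of split X relative to a reference split, assuming a constant mutation
    rate and correctly assigned haplotypes: the ratio of derived-allele densities."""
    return derived_density(derived_x, lengths_x) / derived_density(derived_ref, lengths_ref)


# Hypothetical counts, not the study's data: xOoA-assigned vs. Eurasian-African haplotypes.
ratio = relative_split_age([30, 45], [100_000, 150_000], [20, 30], [100_000, 150_000])
print(round(ratio, 2))
```

Normalising by haplotype length matters because longer haplotypes accumulate more derived alleles regardless of age; only the per-base-pair density is proportional to time under the clock assumption.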

                Author and article information

                Contributors
                Christina.eichstaedt@med.uni-heidelberg.de
                guysjacobs@ntu.edu.sg
                Journal
                Sci Rep
                Scientific Reports
                Nature Publishing Group UK (London)
                ISSN: 2045-2322
                Published: 12 October 2017
                Volume: 7
                Article: 13042
                Affiliations
                [1] ISNI 0000 0001 0328 4908, GRID grid.5253.1, Thoraxclinic at the University Hospital Heidelberg; Heidelberg, Baden-Württemberg, Germany
                [2] ISNI 0000 0001 2190 4373, GRID grid.7700.0, Institute of Human Genetics, Heidelberg University; Heidelberg, Baden-Württemberg, Germany
                [3] ISNI 0000000121885934, GRID grid.5335.0, Department of Archaeology and Anthropology, University of Cambridge; Cambridge, Cambridgeshire, UK
                [4] ISNI 0000000404106064, GRID grid.82937.37, Estonian Biocentre; Tartu, Tartumaa, Estonia
                [5] ISNI 0000 0001 2192 5772, GRID grid.253613.0, Division of Biological Sciences, University of Montana; Missoula, Missoula County, Montana, USA
                [6] ISNI 0000000121885934, GRID grid.5335.0, MRC Epidemiology Unit, University of Cambridge; Cambridge, Cambridgeshire, UK
                [7] ISNI 0000 0001 2097 0141, GRID grid.121334.6, Institute for Computational Biology, University of Montpellier; Montferrier-sur-Lez, Hérault, France
                [8] ISNI 0000 0004 1936 9297, GRID grid.5491.9, Mathematical Sciences, University of Southampton; Southampton, Hampshire, UK
                [9] ISNI 0000 0001 0943 7661, GRID grid.10939.32, Department of Evolutionary Biology, Institute of Molecular and Cell Biology, University of Tartu; Tartu, Tartumaa, Estonia
                [10] ISNI 0000 0001 0943 7661, GRID grid.10939.32, Estonian Genome Centre, University of Tartu; Tartu, Tartumaa, Estonia
                [11] ISNI 0000 0001 0943 7661, GRID grid.10939.32, Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu; Tartu, Tartumaa, Estonia
                [12] GRID grid.148374.d, Statistics and Bioinformatics Group, Institute of Fundamental Sciences, Massey University; Palmerston North, Kairanga, New Zealand
                [13] ISNI 0000 0000 9422 2878, GRID grid.267454.6, Department of Applied Sciences, Faculty of Humanities and Social Sciences, University of Winchester; Winchester, Hampshire, UK
                [14] ISNI 0000 0001 2224 0361, GRID grid.59025.3b, Complexity Institute, Nanyang Technological University; Singapore, Singapore
                Author information
                http://orcid.org/0000-0001-7288-8297
                http://orcid.org/0000-0002-9163-0061
                Article
                13382
                DOI: 10.1038/s41598-017-13382-4
                PMCID: PMC5638799
                PMID: 29026132
                3ffa5a77-ed58-4f48-945b-2351fc14debc
                © The Author(s) 2017

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 15 March 2017
                Accepted: 22 September 2017
                Categories
                Article
