      Is Open Access

      Paleogenomics: reconstruction of plant evolutionary trajectories from modern and ancient DNA

      brief-report


          Abstract

          How contemporary plant genomes originated and evolved is a fascinating question. One approach uses reference genomes from extant species to reconstruct the sequence and structure of their common ancestors over deep timescales. A second approach directly identifies genomic changes over shorter timescales by sequencing ancient DNA preserved in subfossil remains. Merged within the nascent field of paleogenomics, these complementary approaches provide insights into the evolutionary forces that shaped the organization and regulation of modern genomes. They also open new perspectives for increasing genetic gain in breeding programs and for developing tools to predict future population changes in response to anthropogenic pressure and global warming.


          Most cited references (129)


          A Scan for Positively Selected Genes in the Genomes of Humans and Chimpanzees

          Introduction

Genes, or regions of the genome, that have been affected by natural selection may show an excess of functionally important molecular changes beyond what would be expected in the absence of selection. Genomic regions with such an excess of changes are said to have experienced positive selection, i.e., selection in favor of new genetic variants. The most common statistical technique for detecting positive selection relies on the fact that mutations in coding regions fall into two classes. Nonsynonymous mutations change the amino acid sequence of the encoded protein, and synonymous mutations do not. An excess of nonsynonymous over synonymous mutations provides strong evidence for the past action of positive selection at the protein level, provided the excess goes beyond what would be expected if the two types occurred at the same rate. Using this logic, numerous recent studies have documented positive selection in a variety of genes and organisms, including immune-response-related genes [1–3], viral genes [4–6], fertilization genes [7,8], and genes involved in sensory perception and olfaction in humans [9]. Clark et al. [10] compared 7,645 genes from humans to their orthologs from the chimpanzee and the mouse. For each gene, they tested whether there was an excess of nonsynonymous substitutions on the evolutionary lineage leading to humans. They showed an excess of putatively positively selected genes in several functional classes, including genes involved in sensory perception, olfaction, and amino acid catabolism. They also showed that human genes targeted by positive selection are significantly more likely to harbor variation associated with known genetic diseases.
We here report the results of an analysis of 20,361 human and chimpanzee genes, of which 6,630 were later eliminated by a very conservative quality control. This set includes the 7,645 genes analyzed by Clark et al. [10]. Clark et al. [10] aimed to find genes that experienced accelerated evolution on the human lineage, using the mouse as an outgroup. The aim of the current study is to find genes targeted by positive selection at any point during the evolution of humans and chimpanzees, based on a larger set of genes. We use a likelihood ratio test to identify positive selection and perform extensive simulations to find the appropriate critical values of the test. Positive selection is inferred when dN/dS is significantly greater than one in a test of the neutral null hypothesis dN/dS = 1 [11,12]. Here dN/dS is the ratio of nonsynonymous substitutions per nonsynonymous site to synonymous substitutions per synonymous site. The method takes transition/transversion rate biases and unequal codon and amino acid frequencies into account. This traditional test of dN/dS greater than one uses information from both the human and the chimpanzee lineages. It therefore has more power than the test used by Clark et al. [10] if selection affects both lineages.

Results

Chimpanzee sequence was obtained by PCR using primers designed to flank exon sequences annotated in the human genome [10]. Our analysis begins with data from 20,361 coding regions, including 103,606 nucleotide differences and 403 indels among 17,687,331 aligned nucleotides. These numbers are significantly lower than the genome-wide averages [13,14], presumably because of selective constraint in coding regions. The distributions of nonsynonymous and synonymous nucleotide differences among genes are shown in Figure 1.
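The quantity being tested can be illustrated with a simple counting estimate. The sketch below is not the likelihood method used in the study, which models transition/transversion bias and codon and amino acid frequencies. It is a Nei–Gojobori-style approximation and assumes the numbers of nonsynonymous and synonymous sites and differences for a gene have already been counted. It applies a Jukes–Cantor correction for multiple hits. The function names and the example counts are hypothetical.

```python
import math


def jukes_cantor(p):
    """Correct an observed proportion of differences p for multiple hits (Jukes-Cantor)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(n_diffs, n_sites, s_diffs, s_sites):
    """Counting estimate of dN/dS from per-gene difference and site counts.

    n_diffs, n_sites: nonsynonymous differences and nonsynonymous sites
    s_diffs, s_sites: synonymous differences and synonymous sites
    """
    dn = jukes_cantor(n_diffs / n_sites)
    ds = jukes_cantor(s_diffs / s_sites)
    if ds == 0.0:
        # No synonymous change: the ratio is infinite if dN > 0, otherwise undefined.
        return math.inf if dn > 0.0 else math.nan
    return dn / ds


# Hypothetical gene: 6 nonsynonymous differences over 300 nonsynonymous sites,
# 1 synonymous difference over 100 synonymous sites.
print(dn_ds(6, 300, 1, 100))
```

A gene with equal per-site proportions of nonsynonymous and synonymous differences gives a ratio of exactly one. That is the neutral expectation the study's test is built around.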
The average numbers of nonsynonymous and synonymous mutations per nucleotide site are 0.002578 and 0.003281, respectively. After eliminating reads without a hit to known genes in public databases (see Materials and Methods), 71,896 nucleotide differences remain in 13,731 genes, and the remaining analysis is restricted to this set of genes. Of these, 5,574 genes were excluded from the positive selection analysis because they had fewer than three mutations. Another 797 were excluded because the sequence was shorter than 50 bp. Additionally, 45 genes were excluded because they contained internal stop codons, presumably due to erroneous annotations or sequencing errors. Of the remaining 8,079 genes, 3,913 were also analyzed by Clark et al. [10]. The average level of sequence divergence was 0.60%, corresponding to a divergence level of 1.57% at silent sites. This agrees well with the divergence observed by Ebersberger et al. [14] for Chromosome 22: 1.44% overall and 2.26% in CpG islands. Seven hundred thirty-three of the 8,079 genes evolved with dN/dS greater than one. However, only 35 had p-values less than 0.05 in a likelihood ratio test of the null hypothesis dN/dS = 1 against the alternative dN/dS > 1. The proportion of genes significant at the 5% level in this one-sided test (35 of 8,079, or about 0.4%) is lower than the nominal 5%. This is because the vast majority of genes are conserved and evolve with dN/dS less than one. Nonetheless, using Simes's improved Bonferroni procedure [15], we can reject at the 5% significance level the hypothesis that none of the genes is evolving with dN/dS greater than one. This also implies that the set of genes declared significant at a 5% false discovery rate is nonempty. Even though the divergence between humans and chimpanzees is very low, there is statistically significant evidence for positive selection in the DNA sequences of these two species. Results for all genes are available in Dataset S1.
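Simes's procedure and the Benjamini–Hochberg (BH) false discovery rate procedure are closely linked. The Simes global test rejects at level α exactly when the BH rejection set at level α is nonempty. This is why a Simes rejection implies a nonempty 5% FDR set. The sketch below is a minimal illustration, assuming a plain list of per-gene p-values. The function names and example p-values are hypothetical.

```python
def simes_global_p(pvals):
    """Simes combined p-value for the global null that all m hypotheses are true:
    min over i of m * p_(i) / i, with p_(1) <= ... <= p_(m)."""
    m = len(pvals)
    ordered = sorted(pvals)
    return min(m * p / rank for rank, p in enumerate(ordered, start=1))


def benjamini_hochberg(pvals, q=0.05):
    """Indices of hypotheses rejected by the BH step-up procedure at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * q / m:
            k_max = rank  # step-up: keep the largest rank that passes its threshold
    return sorted(order[:k_max])


# Hypothetical one-sided likelihood-ratio p-values for five genes.
gene_p = [0.001, 0.2, 0.5, 0.9, 0.03]
print(simes_global_p(gene_p), benjamini_hochberg(gene_p, q=0.05))
```

Both functions rely on the same comparison of each ordered p-value p_(i) with iα/m. This shared threshold is what ties the two conclusions together.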
Biological Processes Affected by Positive Selection

To identify functional groups of genes with an overrepresentation of putatively positively selected genes, we used the PANTHER [16,17] classification of biological processes. We combined it with a Mann-Whitney U test (MWU) based on the p-values from the likelihood ratio test (Table 1). The MWU identifies categories of genes with small p-values from the likelihood ratio test. Note, however, that genes evolving approximately neutrally will tend to have smaller p-values than genes evolving under strong functional constraint. The MWU-based classification therefore does not provide unambiguous evidence for positive selection, but it indicates which groups harbor the most candidates for positive selection. Immune-defense-related genes appear at the top of the list. It is not surprising that several of the genes showing the most evidence for positive selection are involved in immune responses to viruses. Many pathogens, such as viruses, evolve very quickly (e.g., [5]). A coevolutionary molecular arms race between pathogens and host cells might therefore explain strong selection favoring new mutations in these genes. Other forces, including overdominant selection to diversify the spectrum of immune responses, may also cause positive selection in immune- and defense-related genes. Such explanations have previously been used to explain positive selection in the human major histocompatibility complex [18]. As in [10], we also identify genes involved in various forms of sensory perception, including olfaction, as well as genes classified as “unknown biological function.” Many of the genes with unknown biological function show sequence similarity to known transcription factors (data not shown). Much of the selection on sensory genes is driven by selection on olfactory receptors, previously reported by Gilad et al. [9]. In contrast to Clark et al.
[10], we also find that genes involved in spermatogenesis appear to have an excess of positively selected genes. The spermatogenesis genes showing the strongest evidence for positive selection include several KRAB-containing zinc finger proteins. These proteins act as transcriptional repressors and are believed to be involved in determining the differentiation of pluripotent stem cells [19].

Expression Patterns and Positive Selection

We also categorized 3,464 of the 8,079 genes according to tissue of expression in the Novartis Gene Expression Atlas [20]. Our dataset contains relatively few tissue-selective genes (204), and the number of tissues analyzed is large (28). As a result, many tissues had fewer than 20 tissue-selective genes, providing little statistical power for further subdivision. We therefore examined instead whether the tissue of maximal expression for a gene was correlated with positive selection, since high expression levels and importance in tissue function are often, but not always, correlated. The set of genes with maximal expression in the testes is the only one showing an excess of positive selection after a Bonferroni correction for multiple tests (Table 2). Genes with maximal expression in the brain show no excess tendency toward positive selection. In fact, genes expressed in the brain seem to be among the most conserved genes, with the least evidence for positive selection. MWUs comparing genes with maximal expression in the brain (83 genes) to all other genes show that these genes tend to have significantly higher p-values in the likelihood ratio test for positive selection (p = 0.035). This indicates high levels of selective constraint. Genes expressed in the brain at twice the expression level found in blood show an even stronger tendency to avoid positive selection (p = 0.0002).
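The category tests compare the likelihood-ratio p-values of genes in one category (a PANTHER process, or a tissue of maximal expression) with those of all remaining genes. The sketch below is not the study's code and makes several assumptions. Per-gene p-values and category labels are assumed to be available already. The test uses the large-sample normal approximation to the U statistic without a tie correction to its variance. A Bonferroni adjustment is applied across categories, for example the 28 tissues. All function names and inputs are hypothetical.

```python
import math


def midranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def mwu_lower_tail(group, rest):
    """One-sided Mann-Whitney U test (normal approximation).

    A small p-value means values in `group` (e.g. p-values of genes in one category)
    tend to be smaller than values in `rest` (all other genes).
    """
    n1, n2 = len(group), len(rest)
    ranks = midranks(list(group) + list(rest))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0  # pairs with group > rest (ties count 1/2)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean) / sd
    return u1, 0.5 * math.erfc(-z / math.sqrt(2.0))  # lower-tail normal probability


def bonferroni(pvals):
    """Bonferroni-adjusted p-values for a dict of tests: min(1, m * p)."""
    m = len(pvals)
    return {name: min(1.0, m * p) for name, p in pvals.items()}
```

For the brain comparison, the question is whether a group has *larger* p-values. The same function can be called with the arguments swapped, as `mwu_lower_tail(rest, group)`.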
Studies of gene expression in brain tissue are complicated by low-abundance transcripts and heterogeneous specialized brain regions [21]. Even so, the overall evidence points toward a deficiency of positively selected, or fast-evolving, genes among those expressed in the brain. The causes of the cognitive differences between humans and chimpanzees may instead be sought elsewhere. Candidates include adaptive changes in just a few genes, changes in gene expression [22], or changes in the copy number and/or organization of genes relating to cognitive function [23]. Dorus et al. [24] found that genes expressed in the nervous system showed a relative increase in evolutionary rate in primates relative to rodents, compared with housekeeping genes. However, they provided no direct evidence for positive selection on these genes. Nervous-system-specific genes appear to be so conserved that direct evidence for positive selection is unlikely to be found in this group.

Positive Selection in the X Chromosome

We also tested whether any chromosome shows an excess of genes with evidence for positive selection. The only chromosome enriched in genes with small p-values from the likelihood ratio test for positive selection is the X chromosome (p = 0.0049; MWU). Several factors influence the contrast between the X and the autosomes in tests of selection. One is hemizygosity of the X in males, which makes selection more effective against deleterious recessive mutations and in favor of advantageous recessive mutations [25]. Male hemizygosity also means that mutations with male-specific effects are more readily fixed by selection on the X [26]. This increased efficiency of selection for male-specific genes on the X may explain the excess of X-linked genes expressed in spermatogonia [27]. Reproductive proteins generally evolve at a greater rate, and male-specific genes are overrepresented on the X. Together, these two observations could produce the excess positive selection seen on the X.
However, an excess of putatively positively selected genes on the X chromosome remains after eliminating all genes with the highest expression levels in the testis or annotated as functioning in spermatogenesis (p = 0.0131; MWU). Thus, the elevated positive selection on the X appears to be due to the general tendency of mutations to be recessive, regardless of whether their expression is male-limited. Other factors may also cause differences in variability and substitution rate between autosomes and the X chromosome. These include an elevated male mutation rate [28], differences in the efficacy of genetic hitchhiking between autosomes and the X chromosome [29], and correlations between recombination rate and divergence [30]. However, none of these factors alone can explain the excess of positively selected genes on the X chromosome.

Analysis of the 50 Genes Showing Strongest Evidence for Selection

We studied the 50 genes with the highest likelihood ratios in greater detail to further characterize the causes of positive selection and to examine error rates (Table 3). To investigate the degree to which our results might be influenced by sequencing errors, we compared our data for these genes with the public data available for the same genes. In the regions of overlap between the public data and our data, there were a total of 327 mutations in the public data and 306 in our data. This demonstrates that there is no excess of (potentially artifactual) mutations in our data for the genes that show evidence for positive selection. Most of the 50 genes also show strong evidence for positive selection in the public data, but six do not: HC19953, HC2758, HC6579, HC7761, HC8067, and HC9844 have dN/dS ratios no larger than one in the public data. In most cases, the difference arises because our database and the public database contain different regions of the genes.
Not all regions of a gene are expected to be targeted by positive selection, but this does not challenge the evidence for positive selection in the regions included in this analysis. In any case, using the public data would not change the qualitative conclusions of the analysis presented here.

Immunity and Defense Genes Targeted by Positive Selection

The top 50 genes include many that we might a priori expect to be targets of positive selection. Among them are four genes involved in olfaction (OR2W1, OR5I1, OR2B2, and C20orf185). There are also several genes involved in host–pathogen interactions, such as CMRF35H, CD72 antigen, pre-T-cell antigen receptor α (PTCRA), APOBEC3F, and granzyme H (GZMH). Only one of these genes was among the 50 most significant entries in the Clark et al. [10] model 2 analysis. APOBEC3F encodes an antiviral factor previously shown to be under positive selection by Sawyer et al. [3], who note that this gene has been associated with anti-HIV activity. Presumably, most of these genes have been targeted by positive selection throughout the primate and mammalian phylogeny. The widespread evidence for positive selection in immune-related genes supports the hypothesis that much positive selection in the human and mammalian genomes may be driven by a coevolutionary arms race between the host immune system and pathogens.

Spermatogenesis- and Apoptosis-Related Genes

The list also contains many testis- or sperm-specific genes, including Protamine-1 (PRM1). PRM1 has previously been shown to be under positive selection [31], possibly due to sperm competition (but see [32] for an alternative explanation). Other sperm-specific genes on the list include USP26, C15orf2, PEPP-2, TCP11, HYAL3, and TSARG1.
These genes are among those showing the strongest evidence for positive selection. This is consistent with the excess positive selection in sperm/testis-specific genes found with the PANTHER annotation and the Novartis expression data. Possible causes include sperm competition (e.g., [31]), sexual conflict (e.g., [7,8]), selection for reproductive isolation, pathogen-driven selection in the reproductive organs, and selection related to the occurrence of mutations causing segregation distortion. We note that at least one of these genes (TSARG1) is involved in apoptosis during spermatogenesis. Apoptosis of germ cells is conspicuous during normal spermatogenesis, eliminating up to 75% of the potential spermatozoa [33–35] and affecting cells both before and after the meiotic division [36]. It has been hypothesized that the main reason for the high rate of apoptosis during spermatogenesis is to maintain a proper cell-number ratio between maturing germ cells and Sertoli cells [35]. The natural elimination of germ cells by apoptosis creates a genomic conflict. Each individual germ cell benefits from avoiding apoptosis, yet apoptosis of a certain fraction of germ cells may benefit the mature organism. New mutations arising during spermatogenesis that reduce the probability of apoptosis will therefore be positively selected. This effect will be particularly strong for mutations in genes expressed after the meiotic division, potentially resulting in segregation distortion. Even a mutant with a very small increase in the probability of escaping postmeiotic apoptosis will have a strong selective advantage. Compensatory mutations, reducing or eliminating the effect of the apoptosis-avoidance mutation, may then occur later. These dynamics may lead to recurrent events of positive selection in genes affecting spermatogenesis apoptosis.
The 40 genes in this study involved in inhibition of apoptosis show an excess of evidence for positive selection compared with other categories (p = 0.0047; see Table 2). Many of the genes showing the most evidence for positive selection are known to be involved in spermatogenesis, apoptosis, or both. For example, the apoptosis-related gene with the strongest evidence for positive selection is DFFA. DFFA is an inhibitor of Fas-mediated apoptosis, which has been shown to be involved in apoptosis during spermatogenesis [36]. This suggests that genomic conflict arising from spermatogenesis apoptosis may be driving positive selection in many of these genes.

Cancer-Related Genes

We expected to find genes involved in olfaction, spermatogenesis, and immune defense among the 50 annotated genes showing the strongest evidence for positive selection. We were surprised, however, to find a very large proportion of cancer-related genes, especially genes involved in tumor suppression, apoptosis, and cell cycle control. These include four putative tumor suppressors: HYAL3, DFFA, PEPP-2, and C16orf3. Both HYAL3 and PEPP-2 also appear to be involved in spermatogenesis. The list further contains another gene associated with tumor progression (MMP26) and a gene of unknown function but with high similarity to melanoma-associated antigens (FLJ32965). In addition, there are several genes involved in apoptosis (PPP1R15A, HSJ001348, TSARG1, and GZMH). Many of the genes have very little functional information, so it is surprising to find such a large proportion that may be related to tumor development and control. The factors causing positive selection on these genes are unknown. Genes important in tumor development and suppression may be positively selected because of other functional effects, particularly in immunity and defense or in spermatogenesis.
Several of the genes involved in tumor suppression or progression show testis-specific expression, and models of genomic conflict may explain positive selection in these genes. It should be noted that there is no pattern of human-specific selection in these genes. Their high number of nonsynonymous mutations is approximately evenly distributed between the human and the chimpanzee lineages (results not shown).

PAML Analysis

For each of the 50 genes, we searched public databases for orthologous genes in other mammals. For 25 of the genes we were able to identify orthologs from mouse and rat, and for these 25 genes we estimated the dN/dS ratio of each lineage of the underlying phylogeny using PAML [37]. The dN/dS ratio was elevated (p if i is less than j. The polarity of the mutation was determined using the chimpanzee sequence as outgroup.

Analysis of ascertainment bias. New datasets were simulated to assess the impact of the ascertainment scheme on the tests that contrast human polymorphism data with human–chimpanzee divergence. These datasets were generated using standard neutral coalescent simulations (e.g., [38]). Each simulated dataset contained one chimpanzee sequence and 78 human sequences for each of the 13,731 genes. For each simulated gene, one human sequence was randomly chosen and compared with the chimpanzee sequence using a chi-square statistic for the goodness-of-fit test of dN/dS = 1. The 50 genes with the largest chi-square statistic among genes with dN/dS greater than one were selected for population genetic analysis. This scheme was repeated 1,000 times to investigate the effect of the ascertainment protocol on the 50 genes. The simulation parameters were estimated from the data. These comprised the observed distribution of sequence lengths, together with the synonymous-site mutation rate and the human–chimpanzee divergence time estimated from the concatenated data.
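The ascertainment step ranks genes by a chi-square goodness-of-fit statistic for dN/dS = 1, among genes with dN/dS greater than one. Under that null, differences fall on nonsynonymous and synonymous sites in proportion to the number of sites of each kind. The sketch below rests on several assumptions. It expects per-gene counts of differences and sites as input, uses uncorrected proportions when checking dN/dS > 1, and omits the coalescent simulation itself. The tuple layout and function names are hypothetical.

```python
def chisq_dnds_equal_one(n_diffs, n_sites, s_diffs, s_sites):
    """Pearson chi-square statistic (1 df) for H0: dN/dS = 1."""
    total = n_diffs + s_diffs
    if total == 0:
        return 0.0  # no differences: nothing to test
    exp_n = total * n_sites / (n_sites + s_sites)  # expected nonsynonymous differences under H0
    exp_s = total - exp_n                         # expected synonymous differences under H0
    return (n_diffs - exp_n) ** 2 / exp_n + (s_diffs - exp_s) ** 2 / exp_s


def top_genes(genes, k=50):
    """Keep genes with dN/dS > 1 (uncorrected proportions) and return the k with the largest statistic.

    genes: dict mapping a gene name to (n_diffs, n_sites, s_diffs, s_sites).
    """
    candidates = []
    for name, (nd, ns, sd, ss) in genes.items():
        if nd / ns > sd / ss:
            candidates.append((chisq_dnds_equal_one(nd, ns, sd, ss), name))
    candidates.sort(reverse=True)
    return [name for _, name in candidates[:k]]
```

In a replicate of the simulation scheme, `top_genes` would be applied to the simulated human–chimpanzee comparisons for all genes. The selected set would then be carried forward to the population genetic tests.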
The distribution of dN/dS ratios among genes was estimated assuming that the dN/dS ratios follow a γ distribution among genes, keeping the synonymous rate constant among them.

Power analysis. To analyze the power of the test for positive selection, we simulated pairs of sequences and performed likelihood ratio tests of H0: dN/dS = 1 versus HA: dN/dS > 1 for each sequence pair. The simulations used the average synonymous sequence divergence observed in the data, while nonsynonymous divergence was varied. For more details regarding such simulations, see, e.g., [50].

PRF analysis. Assume nonlethal mutations enter a population of constant size 2N according to a Poisson process. Each mutation is assigned to one of three categories with probabilities $p_0$, $p_+$, and $p_-$ (where $p_0 + p_+ + p_- = 1$). The categories are neutral ($S = 0$), positively selected with selection coefficient $S_+$, and negatively selected with selection coefficient $S_-$. Furthermore, assume mutations evolve independently. Standard population genetic theory, the law of total probability, and the rules of conditional probability then give the probability of an SNP being found at frequency $i$ out of $n$ chromosomes under this scheme [44]:

$$P(i \mid n) = \frac{p_0 F(i,n,0) + p_+ F(i,n,S_+) + p_- F(i,n,S_-)}{\sum_{j=1}^{n-1} \left[ p_0 F(j,n,0) + p_+ F(j,n,S_+) + p_- F(j,n,S_-) \right]},$$

where $F(i,n,S)$ is given by

$$F(i,n,S) = \int_0^1 \frac{1 - e^{-2S(1-y)}}{\left(1 - e^{-2S}\right) y (1-y)} \binom{n}{i} y^{i} (1-y)^{n-i} \, dy,$$

which reduces to $F(i,n,0) = 1/i$ in the neutral limit. Consider observed counts $x_1, x_2, \ldots, x_S$ out of $n_1, n_2, \ldots, n_S$ chromosomes, where $S$ here denotes the total number of segregating sites. Their likelihood is, thus,

$$L = \prod_{k=1}^{S} P(x_k \mid n_k).$$

The maximum likelihood value and the maximum likelihood parameter estimates can then be obtained by numerically maximizing this function with respect to the parameters. Likelihood ratio tests can be constructed by constraining some of the parameters to particular values. For example, setting $p_0 = 1$ defines a model with no selected mutations. Likewise, setting $p_0 + p_- = 1$ defines a model that allows negative selection but no positive selection. This analysis assumes that mutations are independent.
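The PRF likelihood and the nested-model comparisons can be sketched numerically. This is a minimal illustration and not the study's implementation. It assumes the standard Poisson random field (Sawyer–Hartl) frequency density (1 − e^{−2S(1−y)}) / ((1 − e^{−2S}) y(1−y)) with S on the scale used in that expression. The integral is evaluated by plain Simpson quadrature. Parameter maximization, for example with a numerical optimizer, is left out. The chi-square survival function is hand-rolled for integer degrees of freedom so the block needs only the standard library. All function names are hypothetical.

```python
import math
from functools import lru_cache


def simpson(f, a, b, m=2000):
    """Composite Simpson rule for the integral of f over [a, b] with m (even) subintervals."""
    m += m % 2
    h = (b - a) / m
    total = f(a) + f(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


@lru_cache(maxsize=None)
def prf_F(i, n, S):
    """F(i, n, S): weight of a derived allele observed i times among n chromosomes.

    Assumes the PRF frequency density (1 - e^{-2S(1-y)}) / ((1 - e^{-2S}) y (1-y));
    S = 0 uses its neutral limit 1/y. Intended for moderate |S| (large negative S overflows).
    """
    c = math.comb(n, i)
    if S == 0:
        return simpson(lambda y: c * y ** (i - 1) * (1.0 - y) ** (n - i), 0.0, 1.0)
    denom = math.expm1(-2.0 * S)  # equals -(1 - e^{-2S}); the sign cancels with the numerator

    def integrand(y):
        ratio = math.expm1(-2.0 * S * (1.0 - y)) / denom  # (1 - e^{-2S(1-y)}) / (1 - e^{-2S})
        return c * y ** (i - 1) * (1.0 - y) ** (n - i - 1) * ratio

    return simpson(integrand, 0.0, 1.0)


def site_prob(i, n, p0, p_plus, p_minus, S_plus, S_minus):
    """P(i | n) for a segregating site under the neutral/positive/negative mixture."""
    def weight(j):
        return (p0 * prf_F(j, n, 0.0) + p_plus * prf_F(j, n, S_plus)
                + p_minus * prf_F(j, n, S_minus))
    return weight(i) / sum(weight(j) for j in range(1, n))


def log_likelihood(counts, sizes, p0, p_plus, p_minus, S_plus, S_minus):
    """Log-likelihood of derived-allele counts x_k out of n_k chromosomes, SNPs treated as independent."""
    return sum(math.log(site_prob(x, n, p0, p_plus, p_minus, S_plus, S_minus))
               for x, n in zip(counts, sizes))


def chi2_sf(x, k):
    """P(X > x) for a chi-square variable with integer k degrees of freedom."""
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if k % 2 == 0:
        term = total = 1.0
        for j in range(1, k // 2):
            term *= h / j
            total += term
        return math.exp(-h) * total
    total = math.erfc(math.sqrt(h))
    for j in range(1, (k - 1) // 2 + 1):
        total += math.exp(-h) * h ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def lrt(loglik_full, loglik_reduced, df):
    """Statistic 2 * (lnL_full - lnL_reduced) and its asymptotic chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)
```

Take the maximized log likelihoods reported in the next paragraph: –234.19 for the full model, against –243.82 (neutral only) and –240.88 (a single class of selected mutations). These give statistics of 19.26 and 13.38. The appropriate degrees of freedom depend on how many free parameters each constraint removes, which is why `df` is an explicit argument here.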
Because of linkage and the possibility of epistasis, the independence assumption may not be met by the data. However, a full analysis taking the correlation among SNPs into account is not computationally feasible. Fortunately, the average correlation between SNPs is low because they were sampled from 50 genes distributed throughout the genome, so the effect of this correlation on the analysis should be minimal. The maximum log likelihood value for the full model is –234.19. By comparison, the maximum log likelihood values for models assuming only neutral mutations, or a single class of selected mutations, are –243.82 and –240.88, respectively. Under the independence assumption, a likelihood ratio test rejects both of these simpler models in favor of the model with three classes of mutations (p = 0.0006 and p = 0.004, respectively).

Supporting Information

Dataset S1. Results File (3.1 MB XLS).

Dataset S2. Alignment File (9.8 MB ZIP).

Accession Numbers

The sequences analyzed in this study have been submitted to GenBank (http://www.ncbi.nlm.nih.gov/Genbank/).
            Is Open Access

            Genome flux and stasis in a five millennium transect of European prehistory

            Thanks to the development of high-throughput sequencing techniques within the last decade, ancient human genomes have become accessible. They now form an exciting resource that allows archaeological hypotheses to be tested in situ. However, sample preservation still represents a substantial challenge, particularly the typically low fraction of endogenous DNA within the overall recovered sequence data [1–3]. The potential of ancient genomes to shed new light on European human prehistory is illustrated by those individuals whose complete or partial autosomal genomes have been determined to date [1,2,4–8]. Nineteen of these samples are hunter-gatherers, while only two complete and four partial ancient farmers’ genomes (from Tyrol, Germany and Sweden) have been sequenced to date [2,5,6,8]. No diachronic series has yet investigated temporal genome-wide dynamics within a defined European region. In prior analyses, however, hunter-gatherer genomes have fallen outside the range of modern European variation, while farmer samples showed an affinity to Southern Europeans, particularly present-day Sardinians. The Great Hungarian Plain lies between Mediterranean and temperate Europe. Throughout prehistory it was a place of cultural and technological transformation as well as a major meeting point of Eastern and Western European cultures [9]. Farming began in this region with the Early Neolithic Körös culture, 6,000–5,500 cal BC, which is part of the Early Neolithic of Southeast Europe [10–12]. It was followed ~5,500 cal BC by the Middle Neolithic Linearbandkeramik (LBK) culture, which consisted of two synchronous regional groups: the Alföld Linear Pottery (ALP, also Bükk) culture [13,14] and the Transdanubian LBK variant in West Hungary [15]. The LBK later dispersed agriculture into Central Europe and became the dominant farming culture of Europe. Locally, it developed into the Late Neolithic (ca. 5,000–4,500 cal BC) Lengyel culture.
In the Great Hungarian Plain, there is continuity in material culture and settlements between the Late Neolithic and the Copper Age Baden culture. However, during the Early Bronze Age (2,800–1,800 cal BC), growing demand for metal ores throughout Europe gave rise to new pan-European and intercontinental trading networks [16]. The Early Bronze Age cultures of the Great Hungarian Plain incorporated technology, settlement types and material cultural elements from the contemporaneous Bronze Age cultures of the Near East, the Steppe and Central Europe. Finally, during the early phase of the Iron Age (first millennium BC), a variant of the Central European Hallstatt culture inhabited Transdanubia. Further east on the Great Hungarian Plain, pre-Scythian (‘Mezőcsát communities’ of unknown origin) and later Scythian cultures prevailed. A compelling question is whether these major prehistoric transitions involved exogenous population influxes. In particular, during the transition to agriculture in this gateway of the European Neolithic, what level of interaction and intermarriage may have occurred between local hunter-gatherers and non-local farmers? Archaeological evidence for the presence of Mesolithic hunter-gatherers in Southeast Europe is limited to a few small regions [9]. A greater Mesolithic presence, by contrast, can be documented for parts of Northern Hungary and further north. Here we assess the imprint of this series of major cultural and technological shifts on the genomes of Central European prehistory. To do so, we analyze a 5,000-year temporal transect of complete and partial genomes of individuals from archaeological sites in the Great Hungarian Plain.

Results

The petrous bone and differential DNA yields

Although the advantages of genome-wide analysis are numerous, such data have not been routinely accessible because of the typically low endogenous DNA content of human bones in most archaeological contexts [1–3].
We compared endogenous DNA content from the petrous portion of the temporal bone, the densest bone in the mammalian body [17], with that from paired alternative skeletal elements. The samples came from six Hungarian skeletons spanning diverse time depths (Fig. 1 and Supplementary Table 2). The endogenous DNA yields from the petrous samples exceeded those from the teeth by 4- to 16-fold and those from other bones by up to 183-fold. Thus, other skeletal elements yielded human, non-clonal DNA contents ranging from 0.3 to 20.7%, whereas the levels for petrous bones ranged from 37.4 to 85.4% (Fig. 1). We extended this sampling to a further seven petrous bones from Hungary, and yields of endogenous DNA remained exceptionally and consistently high (Supplementary Table 3).

Overall sequencing results and contamination controls

To investigate temporal genome-wide dynamics, we sequenced these 13 ancient individuals to one of three levels of genome coverage. A Neolithic (5,070–5,310 cal BC) and a Bronze Age (1,110–1,270 cal BC) library were sequenced to high coverage (22.1× and 21.3× mean coverage, respectively). Seven samples were sequenced to ~1× coverage, and a further four to ~0.1×. Together these form a ~5,000-year genomic transect, running from the onset of agriculture in this region during the Early Neolithic Körös period to the pre-Scythian Iron Age (Table 1). We rigorously assessed the authenticity of the data. The sequence reads show damage patterns consistent with ancient DNA (Supplementary Fig. 3). Replicate extractions and library preparations in a separate laboratory gave extremely high genotype concordance (>99%) for the two high-coverage genomes (Supplementary Methods). Further, contamination estimates (mean ± s.d.) were consistently low (Supplementary Methods). These were derived from negative controls (0.14% ± 0.16), mitochondrial DNA (mtDNA; 0.15% ± 0.24) and X-chromosome polymorphisms in males (0.63% ± 0.31).
Temporal dynamics of genomic affinity

All analyzed individuals come from defined archaeological contexts and were directly radiocarbon dated (Table 1). These contexts run from the onset of agriculture in this region during the Early Neolithic Körös period through to the pre-Scythian Iron Age. We wanted to assess genomic continuity versus change in this time series. We also wanted to determine how these ancient genomes relate to each other, to other ancient European genomes and to modern-day human populations. To this end, we generated a two-dimensional summary of autosomal genomic variation using principal components analysis (PCA). The analysis combined our observed genotype data with published ancient sequences and the genotypes of 552 modern individuals from Europe, the Caucasus and the Near East [1,2,6,18–20]. The individual plots were similar in resolution to those from analyses of full modern single nucleotide polymorphism (SNP) data sets. They were subsequently combined into a single plot using a Procrustes transformation, following Skoglund et al. [2]. This analysis shows clear shifts in the genomic affinities of the ancient genotypes, including two within multi-phased archaeological sites. These shifts coincide with cultural shifts and bracket a 2,800-year period of Neolithic stasis. Our sampling is concentrated in the first millennium of this interval, to place particular emphasis on the Neolithisation process in Southeast/Central Europe. Even so, it was designed to include material from the diverse archaeological phases within the Hungarian Neolithic. Based on these analyses, our samples can be divided into four sets located in different regions of the PCA. Our oldest sample, Körös Neolithic (KO1) (5,650–5,780 cal BC), was excavated from a short-lived agricultural settlement, perhaps spanning only two generations. The settlement lay at the northern range limit of the first Neolithic (Körös) cultural complex in Southeast Europe [21].
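Procrustes combination of PCA plots can be sketched in two dimensions. One set of coordinates for the shared modern reference individuals is rotated, scaled and translated to best match a common reference frame. The fitted map is then applied to the ancient sample from that run. This is a minimal illustration under stated assumptions. It assumes each plot shares the same modern reference individuals, and it fits rotation, uniform scaling and translation only, with no reflection. The exact Procrustes variant used by Skoglund et al. is not stated in the text. The function name and example coordinates are hypothetical.

```python
import math


def fit_similarity_procrustes(source, target):
    """Least-squares 2-D Procrustes fit: target ~ scale * R(theta) * source + shift.

    source, target: equal-length lists of (x, y) PCA coordinates for the same
    reference individuals in two runs. Rotation, uniform scaling and translation only.
    Returns a function mapping any point from the source frame into the target frame.
    """
    n = len(source)
    sx = sum(p[0] for p in source) / n
    sy = sum(p[1] for p in source) / n
    tx = sum(p[0] for p in target) / n
    ty = sum(p[1] for p in target) / n
    dot = cross = norm = 0.0
    for (x1, y1), (x2, y2) in zip(source, target):
        u, v = x1 - sx, y1 - sy   # centred source point
        a, b = x2 - tx, y2 - ty   # centred target point
        dot += u * a + v * b
        cross += u * b - v * a
        norm += u * u + v * v
    theta = math.atan2(cross, dot)          # rotation that best aligns the centred sets
    scale = math.hypot(dot, cross) / norm   # least-squares uniform scale
    c, s = math.cos(theta), math.sin(theta)

    def transform(point):
        u, v = point[0] - sx, point[1] - sy
        return (tx + scale * (c * u - s * v), ty + scale * (s * u + c * v))

    return transform


# Hypothetical usage: align one run's reference coordinates to the common frame,
# then project that run's ancient sample with the same transformation.
run_refs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0)]
common_refs = [(1.0, 1.0), (1.0, 3.0), (-1.0, 1.0), (-5.0, 5.0)]
to_common = fit_similarity_procrustes(run_refs, common_refs)
print(to_common((0.5, -0.7)))
```

Fitting on the modern reference individuals alone means the ancient sample does not influence the alignment. This is one reasonable way to place samples from separate runs on a single plot.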
Despite its Early Neolithic farming context, this genome falls near the hunter-gatherers in the PCA plot (Fig. 2). In contrast, sample KO2 clusters with later Neolithic individuals (Fig. 2 and Supplementary Fig. 1). KO2 is contemporaneous with KO1 (5,570–5,710 cal BC) and also comes from a Neolithic Körös Culture site only ~70 km away. This marked genomic dichotomy between KO1 and KO2 suggests direct contact between indigenous hunter-gatherers and Neolithic communities, as proposed previously2 8 22. The outlying Neolithic individual, KO1, was a blue-eyed male (Fig. 3). His Y-chromosome lineage, I2a, matches the only haplogroup reported to date in Mesolithic Central and Northern Europeans5 8 (n=6). Our Neolithic genomes all cluster with Southern Mediterranean individuals, particularly Sardinians, echoing the results of previous direct analyses of European Neolithic and post-Neolithic genomes2 6 8. This affinity persists through nine successive time points in our data, spanning a diversity of Neolithic cultures. In contrast, we observe high mtDNA diversity during this period, as previously observed in Central Europe23. Our observed Y-chromosome lineages (I2 and C6 haplogroups, Table 1) have affinities with a Mesolithic background5 7. Our mtDNA haplogroups have affinities with farming communities, especially the N1a haplogroup (Table 1)24. Together, these tentatively support the incorporation of local male hunter-gatherers into farming communities during the Central European Neolithic (Table 1). This contrasts with the male-dominated diffusion of farmers suggested for the Mediterranean route25. The genomic stasis of the Neolithic is subsequently interrupted during the third millennium BC, coinciding with the onset of the Bronze Age. Our two Bronze Age samples, BR1 (1,980–2,190 cal BC) and BR2 (1,110–1,270 cal BC), fall among modern Central European genotypes.
Within this period, trade in commodities across Europe increased. The importance of the investigated region as a node is indicated by the growth of heavily fortified settlements near the Carpathian valleys and passes linking North and South26. These two Bronze Age genomes represent the oldest genomic data sampled to date with clear Central European affinities. A third genomic shift occurs around the turn of the first millennium BC. The single Iron Age genome was sampled from the pre-Scythian Mezőcsát Culture (Iron Age (IR1), 830–980 cal BC). It shows a distinct shift towards Eastern Eurasian genotypes, specifically in the direction of several Caucasus population samples within the reference data set. This result is supported by the mtDNA and Y-chromosome haplogroups (N and G2a1, respectively, both with Asian affinities), and it suggests genomic influences from the East. It is also consistent with the archaeological record, which indicates increased technological and typological affinities with Steppe cultures at this time. These include the importation of horse riding, carts, chariots and metallurgical techniques26. Modern Hungarians occupy an intermediate position between IR1 and the more Western Bronze Age genomes, most likely reflecting continued admixture in the Central European gene pool since this time.

Imputation of ancient genomes

The information content of low-coverage genome sequences may be leveraged by imputation with a phased reference panel, yielding genome-wide diploid genotypes and enabling richer data analyses27. We tested this approach on palaeogenomic data using the 1,000 Genomes Project phased reference data. From a range of downsampled coverage levels of NE1 and BR2, we imputed 5,309 and 6,159 well-characterized SNP genotypes on chromosome 22, respectively. We then compared these calls with those made directly from the full genome data (Supplementary Methods)27 28.
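This validation reduces to two numbers per coverage level. The first is the fraction of loci retained at a genotype-probability threshold. The second is how often the retained calls agree with the high-coverage calls, overall and at true heterozygotes. The sketch below assumes that imputed posteriors are stored as (P(0), P(1), P(2)) copies of the alternate allele per site. Heterozygote concordance is defined here as the share of true heterozygous sites that were called and called heterozygous. That definition is an assumption, since the text does not spell it out. All names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def hard_call(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Most probable genotype (0, 1 or 2 alternate alleles) if it reaches threshold, else None."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


def imputation_concordance(
    imputed: Dict[str, Sequence[float]],
    truth: Dict[str, int],
    threshold: float = 0.99,
) -> Tuple[float, float, float]:
    """Return (fraction of truth loci called, overall concordance, heterozygote concordance)."""
    called = matched = het_total = het_matched = 0
    for site, true_gt in truth.items():
        probs = imputed.get(site)
        if probs is None:
            continue
        call = hard_call(probs, threshold)
        if call is None:
            continue  # below threshold: treated as missing, not as a mismatch
        called += 1
        matched += call == true_gt
        if true_gt == 1:
            het_total += 1
            het_matched += call == 1
    nan = float("nan")
    return (
        called / len(truth) if truth else nan,
        matched / called if called else nan,
        het_matched / het_total if het_total else nan,
    )
```

Raising the threshold trades retention for accuracy. This is why a strict 0.99 cutoff still leaves roughly four-fifths of loci at 1× in the results reported here.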
For an imputed 1× sample of NE1, imposing a genotype probability threshold of 0.99 retained 78% of these loci, with diminishing returns from increasing the coverage of the subsample (Supplementary Fig. 5). Of the imputed genotypes, 99.20% (99.18% of heterozygotes) matched the observed high-coverage calls, validating this approach for expanding our data (Supplementary Figs 5–9). For the more recent BR2 genome, imputation from 1× coverage allowed 80.0% of loci to be called at the 0.99 threshold, with a 99.33% (99.05% of heterozygotes) match to high-coverage calls. We therefore imputed genome-wide genotypes for each of the low-coverage genomes. After intersection with modern SNP data, we called a total of 151,407 high-quality diploid loci across all samples. These data were used in an ADMIXTURE analysis29, which reaffirmed the clustering and temporal shifts in affinity observed in the PCA (Fig. 4). They also allowed us to estimate the fraction of each genome in runs of homozygosity30 (Fig. 5), which is informative about past demography. Long contiguous homozygous segments within a genome indicate recent endogamy, whereas shorter runs result from older episodes of small ancestral population size31. Unlike a small subset of modern genomes, no ancient genome in our analysis showed a clear excess of long runs of homozygosity (ROH) (>1.6 Mb), suggesting that each individual was outbred. However, a clear temporal trend was evident in the extent of ROH, especially shorter ROH. The Bronze Age and IR1 individuals fall within the bulk of modern values, and the Neolithic specimens tend towards the upper end of this range. The Early Neolithic Körös specimen, KO1, forms a clear outlier.
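The ROH comparison above partitions each genome's homozygous segments by length. Long runs point to recent endogamy, and shorter runs point to older, smaller ancestral populations. The sketch below assumes that segments have already been called and are non-overlapping; a tool such as PLINK's `--homozyg` could produce them. The calling parameters and the genome length used for the fraction are assumptions. The 1.6 Mb cutoff is taken from the text. The function name is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def roh_summary(
    segments: Iterable[Tuple[int, int]],
    genome_length_bp: int,
    long_cutoff_bp: int = 1_600_000,
) -> Dict[str, float]:
    """Summarize non-overlapping ROH segments given as (start, end) base-pair coordinates."""
    short_bp = long_bp = 0
    for start, end in segments:
        length = end - start
        if length <= 0:
            raise ValueError(f"invalid segment {(start, end)}")
        if length > long_cutoff_bp:
            long_bp += length   # long runs: signal of recent endogamy
        else:
            short_bp += length  # shorter runs: older, small ancestral population size
    total_bp = short_bp + long_bp
    return {
        "short_mb": short_bp / 1e6,
        "long_mb": long_bp / 1e6,
        "total_mb": total_bp / 1e6,
        "fraction_of_genome": total_bp / genome_length_bp,
    }
```

In this framing, an outbred genome like those described here has little or nothing in `long_mb`. A restricted ancestral population such as that inferred for KO1 shows up mainly as an elevated `short_mb`.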
This suggests an unusually restricted ancestral population size for KO1, supporting the inference that he represents an exogenous individual in a farming settlement. We note that low heterozygosity was also found in an 8,000-year-old hunter-gatherer from Luxembourg5. A possible criticism of this approach is that analyses built on imputed data may be impeded by bias against heterozygous calls. They may also be impeded by ancient haplotypes that are absent from the reference genomes. Our high-coverage Neolithic genome, NE1, falls outside the PCA clusters of the 1,000 Genomes reference individuals, as do our KO1 and IR1 samples (Supplementary Fig. 8). We therefore further analyzed genome-wide imputations from 0.5×, 1× and 2.5× coverage subsamples of the NE1 and BR2 genomes. In both PCA and ROH plots of the resulting genotypes, the positions of each replicate were highly similar to those generated from high-coverage SNP calls (Supplementary Figs 6 and 7).

Genotypes under selection

Imputation allowed us to follow the temporal dynamics of genetic variants that are believed to have been under selection. Two skin pigmentation loci are known to have swept to fixation during European prehistory32 33. The light pigmentary variant of SLC24A5 is present from the earliest of our samples and is homozygous from the Middle Neolithic onwards. By contrast, the light pigmentary variant of SLC45A2 appears only in the later half of our transect, with the first homozygous genotype in the Copper Age (Fig. 3). Both SLC24A5 and SLC45A2 were in the ancestral homozygous state in Mesolithic specimens from Central5 and Western Europe7, whereas SLC24A5 carried the derived state in a Central European Neolithic individual5. Our temporal transect suggests separate selective sweeps at these two pigmentary loci, acting over a millennium apart. TYRP1 is a third pigmentary locus with a proposed adaptive history in Europe, and its selected variant also tends to be more prevalent in later samples.
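Reading a sweep off a time series like this one comes down to two questions. When does the derived allele first appear, and when is it first seen in homozygous form? The Methods give a genotype probability threshold of 0.85 for these single-locus analyses. The sketch below assumes that each sample is stored as (label, age in years BC, (P(0), P(1), P(2))) copies of the derived allele. The labels, ages and function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[str, float, Sequence[float]]  # (label, age in years BC, (P0, P1, P2))


def hard_call(probs: Sequence[float], threshold: float = 0.85) -> Optional[int]:
    """Number of derived alleles (0, 1, 2) if the best posterior reaches threshold, else None."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


def first_appearance(samples: List[Sample], threshold: float = 0.85) -> Tuple[Optional[str], Optional[str]]:
    """Labels of the oldest confidently called carrier and the oldest derived homozygote."""
    first_carrier = first_homozygote = None
    # Larger BC values are older, so sort descending to walk forward in time.
    for label, _age, probs in sorted(samples, key=lambda s: s[1], reverse=True):
        call = hard_call(probs, threshold)
        if call is None:
            continue  # not callable at this threshold
        if call >= 1 and first_carrier is None:
            first_carrier = label
        if call == 2 and first_homozygote is None:
            first_homozygote = label
    return first_carrier, first_homozygote
```

With only one individual per time point, a "first appearance" is an upper bound on when the allele arrived, not an estimate of its frequency.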
This temporal transition towards lighter pigmentation is also seen in hair. Colours and shades estimated from SNPs used in the forensic Hirisplex system grade from black/dark brown in earlier samples to light brown and dark blonde in later individuals (Fig. 3). One of the strongest signals of selection in human genome variation surrounds the lactase persistence allele in Europeans, a response to a dietary focus on raw milk from domestic cattle. It has been postulated that this allele first underwent selection around 5,500 BC, possibly in association with the Neolithic LBK culture in Central Europe34. In our temporal sequence, however, its appearance is delayed until the more recent of our Bronze Age individuals, who lived only ~1,000 BC.

Discussion

The extension of population genomics into the temporal dimension is an exciting recent development in the field of human evolution, but the low endogenous DNA content of most archaeological bones is a major constraint, even with falling sequencing costs, accessing whole genomes from samples comprising 50% are typical from extractions of the petrous portion of the skull temporal bone. Where tested, this contrasts significantly with yields from other skeletal parts of the same individual, despite similar taphonomic conditions. We suggest that the high density17 of the petrous bone reduces bacterial and chemically mediated post-mortem DNA decay. We also show that, at least for Europeans, imputation from 1× genome coverage can give diploid calls for ~80% of genome-wide SNPs at ~99% accuracy, greatly leveraging the information content of such data. These data can be used to examine SNPs of particular phenotypic interest. They also make whole-genome analyses such as ROH, ADMIXTURE and PCA possible.
It is important to note, though, that other methods may be sensitive to biases in which SNPs are imputable. Samples more divergent from the reference populations may also impute with lower accuracy. Genome-wide imputation offered the opportunity to assess phenotypic change through time from low-coverage genomes. Our samples show a tendency towards lighter pigmentation through and after the Neolithic. In particular, we examined three pigmentation SNPs that display European-specific selective sweeps. These sweeps are presumed to facilitate vitamin D synthesis and have been estimated to have occurred within the last 11,000–19,000 years33. We surmise that these sweeps occurred more recently, within the time depth of our transect. SLC24A5 shows the earliest fixation (~5,000 BC), whereas the selected SLC45A2 and TYRP1 variants were not found in homozygous individuals until the Late Neolithic (~4,000–3,000 BC). Wilde et al.32 also found intermediate frequencies for SLC45A2 in ancient Ukrainian Eneolithic and Early Bronze Age samples. The strongest dietary adaptive signal in the human genome surrounds the lactase persistence allele in European genomes. That allele shows a highly structured global distribution and extended homozygosity around it35. Selection on this variant was undoubtedly driven by dairying. There is evidence for milk residues in ceramic vessels from a Körös context in the 6th millennium BC (ref. 36), yet this variant remains absent throughout the 10 Neolithic/Copper Age stages of our transect. Absence of the lactase persistence allele has been reported before in Neolithic specimens37 38, although the selective sweep has been modelled as originating between Central Europe and the Balkans ~6,000–4,000 BC (ref. 34). Its absence here until the late Bronze Age, ~1,000 BC, suggests a more recent date for this extremely interesting episode in the dynamic history of European genomes.
Beyond inferences about individual phenotypes, we used our results to examine the population genetic affinities of a temporal transect of genome sequences. These come from burials on the Great Hungarian Plain, a region of high archaeological significance for major European cultural transitions. We investigated samples across a diversity of archaeological cultures and show evidence for major shifts in genome affinity accompanying the advent of the Neolithic, Bronze and Iron Ages. This strongly implies that these changes in material culture were accompanied by substantial migrations. The Neolithic genomes reported here accord with prior German, Scandinavian and Alpine early farmer genomes in showing an immigrant signature of Southern Mediterranean affinity2 5 6 8. However, an intriguing finding is a single individual with a strongly Mesolithic genomic signature within the context of the Körös culture, part of the earliest Neolithic of Southern Europe. This is the earliest genetic indication of contact between these two subsistence strategies. In the Middle and Late Hungarian Neolithic, local Mesolithic influence is further discernible in the appearance of mtDNA and Y-chromosome haplogroups typical of European hunter-gatherer populations. This concurs with other evidence for admixture in the ancestry of European farmers5 8 22 23. Similar to the Tyrolean Copper Age Iceman6, our Copper Age (Baden Culture) sample shows similarity to Neolithic genomes, in accordance with archaeological continuity in the region. In contrast, the Bronze Age genomes shift towards an affinity with Central Europe, suggesting migratory influence from the North. The single pre-Scythian IR1 genome shows a further shift, suggesting migration from the East. Altogether, our results accord with archaeological perspectives that link these major transitions in European material culture to population movements rather than to cultural diffusion alone.
Methods

Samples

We analyzed 23 samples belonging to 13 ancient individuals from Eastern Hungary (Table 1; Supplementary Note 1). All individuals were directly dated, spanning the Early Neolithic (~5,700 cal BC) to the Iron Age (IR1; ~800 cal BC) (Supplementary Fig. 1; Supplementary Table 1; Supplementary Note 1). For 7 of the 13 individuals (Supplementary Fig. 2; Supplementary Table 2; Supplementary Methods), we compared other elements with petrous bone parts from the same individual (Fig. 1; Supplementary Table 2). These elements were dental crowns (IDs 8.2, 14.4, 14.7) and roots (IDs 8.3, 14.5, 14.8), ribs (IDs 8.5, 10.2), and metacarpal (ID 10.6) and metatarsal portions (ID 10.4). For individual NE6, we investigated two different areas of the same temporal bone (IDs 14.2, 14.3).

Laboratory procedures

DNA was extracted from ~300 mg of bone powder (Supplementary Table 3). The extraction followed a silica-column-based protocol based on the study by Yang et al.39, as modified by MacHugh et al.40 (Supplementary Methods). Libraries were constructed following the study by Meyer et al.41 with minor modifications (Supplementary Methods). Indexing PCRs were performed using AccuPrime Pfx SuperMix (Life Technologies). Products were purified using QIAGEN silica columns, and library quality was assessed on an Agilent 2100 Bioanalyzer. All libraries were first screened on an Illumina MiSeq platform (50 bp single-end). Libraries from petrous bones were then further sequenced on an Illumina HiSeq 2000 (100 bp single-end) (Supplementary Tables 2 and 3).

Genome mapping and SNP calling

Raw reads were filtered based on the indices used, and adapter sequences were trimmed using cutadapt v1.3 (ref. 42) (Supplementary Methods). Before mapping, two bases were trimmed from both the 5′ and 3′ ends of the reads using seqtk (https://github.com/lh3/seqtk), following the study by Meyer et al.43. The final minimum read length was 30 bp.
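The end-trimming rule above is simple enough to state as code. The sketch below is pure Python and mirrors the described rule rather than reproducing the seqtk invocation, which the text does not give. It assumes the minimum-length filter is applied in the same step. The text does not say whether the 30 bp floor was enforced here or during adapter trimming. The function name is hypothetical.

```python
from typing import Optional, Tuple


def trim_read(seq: str, qual: str, n_trim: int = 2, min_len: int = 30) -> Optional[Tuple[str, str]]:
    """Remove n_trim bases from both ends; return None if fewer than min_len bases remain."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) - 2 * n_trim < min_len:
        return None
    end = len(seq) - n_trim  # explicit end index so n_trim=0 keeps the whole read
    return seq[n_trim:end], qual[n_trim:end]
```

Trimming both ends removes the terminal positions where deaminated cytosines, read as C→T at 5′ ends, are most concentrated in ancient DNA libraries. Base qualities are trimmed in step with the sequence.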
Sequence reads were aligned independently to the nuclear genome (GRCh37) and the mtDNA reference (rCRS, NC_012920.1) using the Burrows–Wheeler Aligner44 with seeding disabled. Duplicate reads were removed using SAMtools45, and indels were realigned using the Genome Analysis Toolkit (GATK) RealignerTargetCreator and IndelRealigner46. Genomic depth of coverage was calculated using depth-cover (https://github.com/jalvz/depth-cover). BAM files obtained from different sequencing lanes were merged using the Picard MergeSamFiles tool (http://picard.sourceforge.net). Duplicates were then removed again, and reads were filtered for mapping quality >30 (Supplementary Methods; Supplementary Table 3; for mtDNA see Supplementary Methods). To explore the ancient data in the context of modern variation, we called genotypes at all positions that overlapped with the HGDP+ data set (Supplementary Methods). This data set includes European, Caucasian and Near Eastern populations18 19 20. Different SNP-calling procedures were used for the high-coverage and low-coverage data (Supplementary Methods).

Authenticity of results

All stages of the genetic analysis, up to the set-up of library amplification, were carried out in dedicated ancient DNA facilities at Trinity College Dublin, Ireland. Standard precautions to avoid contamination were taken, including wearing coveralls, masks, hair covers, shoe covers and double gloves. Working surfaces and all materials were frequently cleaned with DNA-ExitusPlus and subsequently ultraviolet-irradiated. The high percentages of human DNA point to very good preservation in all samples. Nevertheless, contamination was further controlled and estimated by (a) independent replication, (b) sequencing of negative controls, (c) estimation of molecular damage and sequence length, (d) mtDNA contamination estimates and (e) X chromosome contamination estimates in males (Supplementary Methods).
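The post-merge clean-up described above has two steps: a second round of duplicate removal, then a mapping-quality filter of MAPQ > 30. The sketch below models this with plain tuples rather than BAM records, so a real pipeline would operate on BAM files with SAMtools or Picard. It is written under stated assumptions. Single-end duplicates are assumed to share chromosome, strand and 5′ coordinate, and the copy with the highest MAPQ is kept. This is a simplification, not the exact SAMtools algorithm. All names are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Alignment(NamedTuple):
    name: str
    chrom: str
    start: int     # 0-based leftmost aligned position
    end: int       # 0-based position one past the rightmost aligned base
    reverse: bool  # True if the read aligned to the reverse strand
    mapq: int


def five_prime_key(aln: Alignment) -> Tuple[str, bool, int]:
    # A reverse-strand read starts (5') at its right-hand end, so key it by `end`.
    return (aln.chrom, aln.reverse, aln.end if aln.reverse else aln.start)


def dedup_then_filter(alignments: Iterable[Alignment], min_mapq: int = 30) -> List[Alignment]:
    """Collapse single-end duplicates (keeping the highest MAPQ), then keep MAPQ > min_mapq."""
    best: Dict[Tuple[str, bool, int], Alignment] = {}
    for aln in alignments:
        key = five_prime_key(aln)
        if key not in best or aln.mapq > best[key].mapq:
            best[key] = aln
    kept = [aln for aln in best.values() if aln.mapq > min_mapq]
    return sorted(kept, key=lambda a: (a.chrom, a.start))
```

Deduplicating before filtering matches the order given in the text. Filtering first could discard the highest-quality copy's low-MAPQ siblings and change which read survives.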
Sex determination and uniparental ancestry

Sex was determined from the ratio of X- to Y-chromosome reads, following the study by Skoglund et al.47 (Supplementary Methods, Supplementary Fig. 4). Reads were mapped to the mtDNA genome and filtered as above. The resulting BAM files were analyzed with the online tool MitoBamAnnotator48 (Supplementary Methods). Haplogroups were assigned with Haplogrep49 based on build 15 of PhyloTree50 (Supplementary Table 12). Y-chromosome haplogroups were determined for all male samples using the filtered SNP set of the International Society of Genetic Genealogy Y-DNA Haplogroup Tree 2014 (ISOGG, Version 9.22; Supplementary Methods; Supplementary Table 13).

Imputation

We evaluated the potential of whole-genome imputation to infer diploid genotypes from low-coverage genomes. To do so, we randomly subsampled our high-coverage genomes (NE1 and BR2) to a series of coverages ranging from 0.1× to 5× using SAMtools45 (Supplementary Methods; Supplementary Fig. 5). Genotype likelihoods were called from the subsampled sequence data using GATK's UnifiedGenotyper tool. Calls were made at 28,627,866 autosomal SNP sites with a minor allele count >1 in the phase 1 integrated release of the 1,000 Genomes Project28. Genotype likelihoods were called only for the alleles observed in the 1,000 Genomes Project data. They were then converted from log space to linear space and parsed into Beagle format, with equal likelihoods (0.3333) assigned to sites with no spanning sequence data. Genotype likelihoods were also reset to 0.3333 for any genotype that could have derived from a deaminated cytosine residue. We used the results of these subsampling experiments to inform our choice of target depth for the low-coverage genomes. We then applied the same imputation methodology to these real data sets (11 individuals; coverage 0.1–1×) to infer accurate diploid genotype calls.
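The likelihood preparation described above can be sketched per site. The sketch assumes GATK-style log10 genotype likelihoods ordered (hom-ref, het, hom-alt) for one biallelic site. It also assumes that normalizing to sum to 1 is acceptable for the Beagle input. The deamination rule is simplified to masking every site whose two alleles differ by a C↔T or G↔A transition. The original rule may have been more targeted, resetting only the affected genotypes. Names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

EQUAL = (0.3333, 0.3333, 0.3333)  # value quoted in the Methods for uninformative sites
DEAMINATION_PAIRS = {frozenset("CT"), frozenset("GA")}  # allele pairs explainable by C->T / G->A damage


def to_beagle_likelihoods(
    log10_gl: Optional[Sequence[float]], ref: str, alt: str
) -> Tuple[float, ...]:
    """Linear-scale likelihoods for (hom-ref, het, hom-alt) at one biallelic site."""
    if log10_gl is None:
        return EQUAL  # no read spans the site
    if frozenset((ref.upper(), alt.upper())) in DEAMINATION_PAIRS:
        return EQUAL  # observed alleles could stem from post-mortem deamination
    linear = [10.0 ** gl for gl in log10_gl]
    total = sum(linear)
    return tuple(x / total for x in linear)
```

Setting all three values equal makes a site uninformative rather than dropping it. The imputation model then fills it from the reference haplotypes, just as it does for sites with no coverage.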
For genome-wide analyses (PCA and ADMIXTURE), genotype probabilities were converted to PLINK-format BED data, imposing a genotype probability threshold of 0.99. For single-locus analysis of selective sweeps, a genotype probability threshold of 0.85 was used (Supplementary Methods).

PCA and ADMIXTURE

We performed individual PCA on all observed genotypes of the 13 samples in the context of the HGDP+ data set (Supplementary Methods), using SMARTPCA (refs 2, 51) (Supplementary Methods). We used a Procrustes approach, as in the study by Skoglund et al.2, to transform the PCA coordinates of each sample and plot them together in the context of the HGDP+ data set (Fig. 2; Supplementary Tables 4 and 5). PCA was also performed on 1× imputed data with a probability cutoff of 0.99 (151,407 shared SNPs; Supplementary Fig. 9; Supplementary Methods). ADMIXTURE (ref. 29) and NgsAdmix (ref. 52) were used to estimate the ancestral genetic components of nine ancient samples together with 552 modern samples. The nine ancient samples comprised seven ~1× genomes and the two ~20× genomes downsampled to 1×. The SNP data set was first filtered on imputed and observed genotypes, respectively (Supplementary Methods).

Runs of homozygosity

ROH analysis was performed with nine imputed ancient genomes. Two were imputed from high-coverage genomes (samples NE1 and BR2) subsampled to ~1×. The other seven were originally sequenced to ~1× (samples KO1, NE5, NE6, NE7, CO1, BR1 and IR1). Imputed genotypes were called if their genotype probability was ≥0.99. PLINK (ref. 53) was used to merge these imputed genotypes with the HGDP+ data set (Supplementary Methods), giving a total of 151,407 SNPs shared among all samples. We also estimated the correlation between sample age and the total length of ROH (Fig. 5a; Supplementary Methods).

Selective sweeps and phenotypes

We examined SNPs in four genes that have been suggested to have undergone selective sweeps in European prehistory (Fig. 3). Three are implicated in pigmentation and one in lactase persistence.
Imputed genotypes from each low-coverage ancient individual were considered along with directly observed alleles for the two high-coverage sequences. Genotypes were imputed for three pigmentation genes: SLC24A5 and SLC45A2, associated with skin colour, and TYRP1, associated with iris and hair pigmentation. In these genes, three SNPs (rs1426654, rs16891982 and rs2733831, respectively) have been specifically selected in Europeans. Genotypes were also imputed for the European lactase persistence gene, associated with the T allele at SNP rs4988235 (Supplementary Methods). We also applied two models for phenotype prediction developed in forensic science to our pool of ancient genomes (Supplementary Methods): the 8-plex54 55 and the Hirisplex56 systems. We used the former to infer skin pigmentation and the latter to predict eye and hair colour.

Author contributions

R.P., M.H. and D.G.B. supervised the study. C.G., E.R.J., M.D.T., R.L.M., G.G.-F. and D.G.B. analyzed genetic data. R.P., L.D., I.K., I.P., A.A., J.D. and P.R. provided archaeological samples and input about the archaeological context. R.P., T.F.G.H., A.W. and L.D. provided and analyzed radiocarbon determinations. C.G., E.R.J. and G.G.-F. processed ancient DNA and prepared sequencing libraries. V.M. conducted sequencing. C.G., R.P. and D.G.B. wrote the manuscript with contributions from all co-authors.

Additional information

Accession codes: Raw alignment data have been deposited in the GenBank/EMBL/DDBJ Sequence Read Archive (SRA) under the accession code SRP039766.

How to cite this article: Gamba, C. et al. Genome flux and stasis in a five millennium transect of European prehistory. Nat. Commun. 5:5257 doi: 10.1038/ncomms6257 (2014).

Supplementary Material

Supplementary Information: Supplementary Figures 1–10, Supplementary Tables 1–17, Supplementary Notes, Supplementary Methods and Supplementary References

              Patterns and processes in crop domestication: an historical review and quantitative analysis of 203 global food crops.

              Domesticated food crops are derived from a phylogenetically diverse assemblage of wild ancestors through artificial selection for different traits. Our understanding of domestication, however, is based upon a subset of well-studied 'model' crops, many of them from the Poaceae family. Here, we investigate domestication traits and theories using a broader range of crops. We reviewed domestication information (e.g. center of domestication, plant traits, wild ancestors, domestication dates, domestication traits, early and current uses) for 203 major and minor food crops. Compiled data were used to test classic and contemporary theories in crop domestication. Many typical features of domestication associated with model crops, including changes in ploidy level, loss of shattering, multiple origins, and domestication outside the native range, are less common within this broader dataset. In addition, there are strong spatial and temporal trends in our dataset. The overall time required to domesticate a species has decreased since the earliest domestication events. The frequencies of some domestication syndrome traits (e.g. nonshattering) have decreased over time, while others (e.g. changes to secondary metabolites) have increased. We discuss the influences of the ecological, evolutionary, cultural and technological factors that make domestication a dynamic and ongoing process. © 2012 The Authors. New Phytologist © 2012 New Phytologist Trust.

                Author and article information

                Contributors
                jerome.salse@inra.fr
                Journal
                Genome Biol
                Genome Biology
                BioMed Central (London )
                1474-7596
                1474-760X
                11 February 2019
                2019
                Volume 20
                Article 29
                Affiliations
                [1 ]INRA-UCA UMR 1095 Génétique Diversité et Ecophysiologie des Céréales, 63100 Clermont-Ferrand, France
                [2 ]Laboratoire d’Anthropobiologie Moléculaire et d’Imagerie de Synthèse, CNRS UMR 5288, allées Jules Guesde, Bâtiment A, 31000 Toulouse, France
                [3 ]INRA-Université Bordeaux UMR1202, Biodiversité Gènes et Communautés, 33610 Cestas, France
                [4 ]Centre for GeoGenetics, Natural History Museum of Denmark, Øster Voldgade, 1350K Copenhagen, Denmark
                Article
                1627
                10.1186/s13059-019-1627-1
                6369560
                30744646
                5edf350b-a773-43d7-aeeb-68bc90819204
                © The Author(s). 2019

                Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Categories
                Opinion

                Genetics
