      Reproductive Mode and the Evolution of Genome Size and Structure in Caenorhabditis Nematodes

      research-article

          Abstract

          The self-fertile nematode worms Caenorhabditis elegans, C. briggsae, and C. tropicalis evolved independently from outcrossing male-female ancestors and have genomes 20–40% smaller than closely related outcrossing relatives. This pattern of smaller genomes for selfing species and larger genomes for closely related outcrossing species is also seen in plants. We use comparative genomics, including the first high-quality genome assembly for an outcrossing member of the genus (C. remanei), to test several hypotheses for the evolution of genome reduction under a change in mating system. Unlike plants, it does not appear that reductions in the number of repetitive elements, such as transposable elements, are an important contributor to the change in genome size. Instead, all functional genomic categories are lost in approximately equal proportions. Theory predicts that self-fertilization should equalize the effective population size, as well as the resulting effects of genetic drift, between the X chromosome and autosomes. Contrary to this, we find that the self-fertile C. briggsae and C. elegans have larger intergenic spaces and larger protein-coding genes on the X chromosome when compared to autosomes, while C. remanei actually has smaller introns on the X chromosome than either self-reproducing species. Rather than being driven by mutational biases and/or genetic drift caused by a reduction in effective population size under self-reproduction, changes in genome size in this group of nematodes appear to be caused by genome-wide patterns of gene loss, most likely generated by genomic adaptation to self-reproduction per se.

          Author Summary

          Closely related species can vary widely in genome size, yet the genetic and evolutionary forces responsible for these differences are poorly understood. Among Caenorhabditis nematodes, self-fertilizing species have genomes 20–40% smaller than outcrossing species. Constructing a high-quality de novo genome assembly in C. remanei, we find that this outcrossing species has many more protein-coding genes than the self-fertilizing Caenorhabditis. Intergenic spaces are larger on the X chromosome and smaller on autosomes for both selfing and outcrossing Caenorhabditis, but protein-coding genes are larger on the X chromosome in the self-fertile C. briggsae and C. elegans and larger on autosomes in the outcrossing C. remanei. This contrasting pattern of contracting genomes and expanding genes is likely mediated by changes in the balance between genetic drift and natural selection accompanying the transition to self-fertilization.

          Most cited references (40)

          The origins of genome complexity.

          Complete genomic sequences from diverse phylogenetic lineages reveal notable increases in genome complexity from prokaryotes to multicellular eukaryotes. The changes include gradual increases in gene number, resulting from the retention of duplicate genes, and more abrupt increases in the abundance of spliceosomal introns and mobile genetic elements. We argue that many of these modifications emerged passively in response to the long-term population-size reductions that accompanied increases in organism size. According to this model, much of the restructuring of eukaryotic genomes was initiated by nonadaptive processes, and this in turn provided novel substrates for the secondary evolution of phenotypic complexity by natural selection. The enormous long-term effective population sizes of prokaryotes may impose a substantial barrier to the evolution of complex genomes and morphologies.

            Recombinational Landscape and Population Genomics of Caenorhabditis elegans

            Introduction The allelic variants that underlie heritable phenotypic variation are distributed along chromosomes. Their distribution is shaped by the machinery of meiosis within individuals and by mutation, selection, and drift among them. To discover the genetic basis of complex traits, and to understand the evolutionary dynamics that shape this genetic architecture, we must characterize empirical patterns of linkage and linkage disequilibrium. We have undertaken this task in the nematode C. elegans. Mapping of thousands of mutants to the genome and molecular studies of meiotic machinery have provided a view of the large-scale landscape of the C. elegans recombination map. The chromosomes exhibit nearly complete crossover interference [1], such that each chromosome experiences one crossover per meiosis and has a genetic length of 50 cM [2]. Accumulated data from thousands of two- and three-point mapping crosses and small-scale SNP-based analyses have demonstrated a general pattern of large, nearly constant-rate domains on the autosomes, with high recombination in chromosome arms and low recombination in chromosome centers. Despite strong global regulation of crossover number, many details remain unclear, including the locations of the domain boundaries, the occurrence of fine-scale variation within domains, and the existence of domain structure on the X chromosome. Moreover, evidence for the genetic control of crossover number and position [1]–[4] leaves open the possibility that segregating variants may influence recombination patterns in experimental crosses of natural isolates. Because recombination patterns have been studied only on broad scales in individual crosses, involving fewer than two dozen markers per chromosome, dense characterization of a massive cross promises to clarify the recombinational landscape. C. 
elegans is one of the most exhaustively studied of all species with respect to developmental, behavioral, and physiological genomics, but studies of its population biology have lagged. Although natural genetic variation has been a source of alleles for genetic analysis in C. elegans since long before the system became a model [5], the widely accepted notion that worms exhibit little variation has discouraged investigations of their diversity. The difficulty of collecting C. elegans from the wild has compounded the problem. Nevertheless, recent work has revealed abundant heritable phenotypic variation among wild C. elegans strains [6]–[20] and has begun to reveal the ecological context for this species [16], [17], [21]–[25]. C. elegans geneticists have exploited this variation to map quantitative trait loci [26]–[37], and in a handful of cases to identify the causal mutations underlying phenotypic variation (in genes npr-1, mab-23, tra-3, zeel-1, plg-1, and scd-2 [10], [30], [38]–[43]). In parallel, studies of variation at molecular markers have begun to provide an account of the distribution of genetic variation within and among localities and across genomic regions [6], [7], [23], [24], [40], [41], [43]–[60]. These studies have shown that the species exhibits substantially lower levels of polymorphism and higher levels of linkage disequilibrium than other model systems, even those, like Arabidopsis thaliana, that share with C. elegans a primarily selfing mating system. The empirical pattern of linkage disequilibrium may result as much from selection against recombinant genotypes as from attributes of population biology such as population size and outcrossing rate [24],[61]. A genome-wide assessment of linkage disequilibrium is required to determine whether natural isolates of C. elegans will be useful for mapping loci by association. 
We generated and genetically characterized a recombinant inbred advanced intercross population to gain insights into the recombination map in C. elegans, and we characterized a large panel of wild strains to characterize linkage disequilibrium. The data on recombination in the lab and in the wild reveal the role of population genomic processes in shaping genotypic diversity in C. elegans, and they lay the groundwork for rapid discovery of the genes underlying phenotypic variation. Results Patterns of Recombination in Recombinant Inbred Advanced Intercross Lines We genotyped 1454 nuclear SNP markers in 236 recombinant inbred advanced intercross lines (RIAILs). These lines represent the terminal generation of a 20-generation pedigree founded by reciprocal crosses between the laboratory wild type strain N2 (Bristol) and the Hawaiian isolate CB4856. The pedigree includes ten generations of intercrossing (random pair mating with equal contributions of each pair to succeeding generations [62]) followed by 10 generations of selfing. The SNP markers span 98.6% of the physical length of the chromosomes (Table S1). The median spacing is 61,160 bp, and 80% of intervals are shorter than 100 kb. Only 35 marker intervals (2.4%) are greater than 200 kb. The RIAILs contain 3,629 breakpoints in 772 marker intervals; some breakpoints may be identical by descent because of the shared ancestry during the intercrossing phase of RIAIL construction. An estimate of the mapping resolution of the panel, based on the distances between intervals containing breakpoints, yields a median bin size of 98 kb. Because larger bins contain more of the genome than smaller bins, the expected size of a bin in which a uniformly distributed QTL will fall is 225 kb. The RIAILs exhibit a genetic map length of 1588 cM, a 5.3-fold expansion of the 300 cM F2 genetic map. 
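The map expansion quoted above is simple arithmetic and can be checked in a few lines. The sketch below reproduces the fold change from the two map lengths. The 5.7-fold expectation is taken as given from the source text rather than derived, and the variable names are illustrative only.

```python
# Map lengths quoted in the text (centiMorgans).
riail_map_cm = 1588.0   # genetic map length observed in the RIAILs
f2_map_cm = 300.0       # F2 genetic map length

# The source reports an expected 5.7-fold expansion for this cross design;
# it is treated here as a given value, not derived.
expected_fold = 5.7

observed_fold = riail_map_cm / f2_map_cm
realized_fraction = observed_fold / expected_fold

print(f"observed expansion: {observed_fold:.1f}-fold")
print(f"realized / expected: {realized_fraction:.0%}")
```

Rounded as in the text, this gives a 5.3-fold expansion, or 93% of the expected value.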
The realized expansion is 93% of the expected 5.7-fold map expansion, a difference attributable, at least in part, to the action of selection during the construction of the lines. Although selection and drift may alter the relationship between recombination fraction and meiotic recombination rate [63],[64], the observed recombination fractions are qualitatively informative about global patterns of recombination rate variation across C. elegans chromosomes. The genetic maps for the six C. elegans chromosomes are similar to one another and exhibit five distinct domains: two tips with effectively zero recombination, two high recombination arms, and a low recombination center, consistent with the pattern observed in classical two- and three-point mapping crosses [65]. These domains are evident in Marey maps [66], which show genetic position as a function of physical position (Figure 1; Table 1). As the recombination rate within each domain is relatively constant, we used a segmented linear regression to identify the boundaries between the domains. 10.1371/journal.pgen.1000419.g001 Figure 1 Recombination rate domains. Marey maps for each chromosome show genetic position of each marker (black points) as a function of physical position. Genetic position is measured in centiMorgans as defined on the recombinant inbred advanced intercross line population; these are not meiotic distances. Gray lines show the fits of segmented linear regressions, which estimate the boundaries of the recombination domains and their relative recombination rates. The shaded boxes above each plot show the genetically defined positions of the pairing centers [69]. 10.1371/journal.pgen.1000419.t001 Table 1 Chromosomal Domains. 
Chr  Measure            left tip   left arm     center  right arm  right tip
I    Size (kb)               527       3331       7182       3835        197
     Size (%)                3.5       22.1       47.7       25.4        1.3
     Right end (kb)          527      3,858     11,040     14,875     15,072
     Rate^a (cM/Mb)            0       3.43       1.34       6.78          0
II   Size (kb)               306       4573       7141       2589        670
     Size (%)                2.0       29.9       46.7       16.9        4.4
     Right end (kb)          306      4,879     12,020     14,609     15,279
     Rate^a (cM/Mb)            0       4.92       1.33       8.47          0
III  Size (kb)               494       3228       6618       2877        567
     Size (%)                3.6       23.4       48.0       20.9        4.1
     Right end (kb)          494      3,722     10,340     13,217     13,784
     Rate^a (cM/Mb)            0       7.83       1.17       7.24          0
IV   Size (kb)               720       3176       9074       3742        782
     Size (%)                4.1       18.2       51.9       21.4        4.5
     Right end (kb)          720      3,896     12,970     16,712     17,494
     Rate^a (cM/Mb)            0       7.65       1.05       3.64          0
V    Size (kb)               643       5254      10653       3787        583
     Size (%)                3.1       25.1       50.9       18.1        2.8
     Right end (kb)          643      5,897     16,550     20,337     20,920
     Rate^a (cM/Mb)            0       3.22       1.32       5.47          0
X    Size (kb)               572       5565       6343       3937       1302
     Size (%)                3.2       31.4       35.8       22.2        7.3
     Right end (kb)          572      6,137     12,480     16,417     17,719
     Rate^a (cM/Mb)            0       3.81       1.70       5.14          0
ALL  Size (kb)              3262      25127      47011      20767       4101

a: Rates are derived from the slopes of the segmented linear fits, scaled to yield a total genetic length of 50 cM for each chromosome.

The central domain of each autosome occupies roughly half the chromosome's length, despite the very different lengths of the chromosomes (Table 1). For example, the center of chromosome V is 10.7 Mb, 51% of the chromosome length, while the center of chromosome III is 6.6 Mb, 48% of that chromosome's length. Because all the centers have very similar rates of recombination per base pair (Table 1), their different physical lengths mean that the amount of recombination in each center (its genetic length) varies with total chromosome length. 
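The two worked figures in this paragraph (chromosome V and chromosome III) follow directly from the Table 1 domain sizes. A minimal sketch of that check, with the sizes copied from the table and a helper name, `center_fraction`, that is purely illustrative:

```python
# Domain sizes in kb (left tip, left arm, center, right arm, right tip),
# copied from Table 1 for two chromosomes.
domain_sizes_kb = {
    "III": [494, 3228, 6618, 2877, 567],
    "V": [643, 5254, 10653, 3787, 583],
}

def center_fraction(sizes_kb):
    """Return the central domain's share of total chromosome length."""
    return sizes_kb[2] / sum(sizes_kb)

for chrom, sizes in domain_sizes_kb.items():
    print(f"{chrom}: center = {sizes[2] / 1000:.1f} Mb, "
          f"{center_fraction(sizes):.0%} of {sum(sizes)} kb")
```

The chromosome totals computed this way also match the right-tip "Right end" values in the table (13,784 kb and 20,920 kb).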
The constraint of one breakpoint per chromosome then requires that the amount of recombination in the arms of each chromosome varies inversely with chromosome length; shorter chromosomes have a larger fraction of their recombination events in their arms, and the physical sizes of the arms explain much of the variation among arms in recombination rates (r² = 0.51, p = 0.009). Nevertheless, the arms are heterogeneous in relative and absolute length and recombination rate, and the central domains are not perfectly centered on the chromosomes, consistent with the finding of Barnes et al. [65]. Most notably, the left arm of chromosome IV has a relative recombination rate more than twice that of the right arm, though they differ in size by only 15% (Figure 1; Table 1). Inspection of the Marey maps suggests that there may be additional rate variation within the defined domains. To determine whether such variation is expected in the case of constant-rate domains, we simulated chromosomes along the RIAIL pedigree with discrete, constant-rate recombination domains, and we recorded the simulated genotypes at the same marker intervals as our actual genotype data. The simulated chromosomes exhibit patterns of variation within the discrete rate domains qualitatively similar to the observed data, preventing us from placing confidence in the fine-scale patterns in the data (Figure 2A). Nevertheless, the fine-scale variation observed in our data is largely concordant with that present in genetic maps derived from independent two- and three-point mapping crosses with classic visible markers (Figure S1), compiled in WormBase [67]. The general concordance between our map, derived from meioses at 25°C, and the WormBase map, which comes from crosses performed at various temperatures but primarily at 20°C, does not support the notion that the distribution of crossovers is strongly temperature dependent [68]. 10.1371/journal.pgen.1000419.g002 Figure 2 Simulated chromosomes. 
(A) The Marey maps for actual chromosome III data (black) and 10 chromosome III datasets simulated with discrete, constant-rate recombination domains (colors) show that variation within domains and indistinct boundaries between domains are expected. (B) The observed genetic length of chromosome III is smaller than expected. The histogram shows the lengths of 1000 chromosome III datasets simulated assuming one crossover per meiosis. In our data, each chromosome has one very sharp center-arm boundary and one that is less sharp, and boundaries exhibit the identical pattern in the classical maps. In five of the six chromosomes, the less-sharp boundary is on the side of the chromosome that holds the pairing center [69] (Figure 1). The exception is chromosome III. We find two points of disagreement between our results and previous discussion of recombination maps in C. elegans. First, the X chromosome clearly possesses domain structure similar to that of the autosomes (Figure 1), contrary to inferences from sparser data. The major distinguishing feature of the X-chromosome center is its relative size, 36% of the chromosome length, which is substantially less than the 47–52% on the autosomes. Second, we find that the chromosome tips have extremely low recombination rates; the terminal domain of each chromosome end is a region of effectively zero recombination, a pattern observed previously only for the right tip of the X [65] and more recently for chromosome III [68]. Every chromosome terminus contained a series of nonrecombining markers, and these domains ranged in size from 200 kb (IR) to 1300 kb (XR), averaging 600 kb. Selection We previously showed that the allele frequencies in the RIAILs depart from the neutral expectation, implicating selection during the application of the cross design [40]. We extend that analysis here, estimating expected allele frequency skew using our simulations that explicitly incorporate marker spacing and recombination domain structure. 
Chromosome I (p 0.5; fewer than 5% of the 236 RIAILs had confidence scores 0.35. For the 285 SNPs that yielded some confidence scores between 0.35 and 0.5, fluorescence intensities were individually inspected and calls assigned manually when unambiguous. For many of the 1205 RIAIL-confirmed SNPs, one or more wild isolates failed to give any genotyping signal. We identified a threshold of normalized intensities of both fluors ≤0.009 at which 768 wild isolate genotypes gave no signal (0.5018% of all calls) while the RIAILs gave only 8 genotypes at the same level (0.0028%), a 180-fold enrichment for the wild isolates. As these failed wild isolate genotypes exhibit linkage disequilibrium with well-genotyped SNPs, they likely represent mutations that disrupt the hybridization of the Illumina oligos to the genotyping interval. We assigned a third-allele call to these genotypes. The remaining 331 SNP assays were individually examined to assign genotype calls. For 46 assays, N2 and CB4856 yielded the same genotype, implicating false-positive SNPs predictions. An additional 29 SNPs produced uninterpretable fluorescence intensity scatterplots. We were able to assign genotype calls for 196 SNPs which failed to pass the confidence threshold due primarily to low intensity. The remaining 70 SNPs exhibited more than two clusters of genotypes in plots of fluorescence intensities. We found that the extra clusters were due to hybridization of the SNP-assay oligos to additional loci which themselves exhibited segregation. As a result, each cluster could be assigned a homozygous genotype call on the basis of linkage disequilibrium with adjacent SNPs among the RIAILs. The final dataset included 1460 SNPs. We excluded one RIAIL from subsequent analysis because its genotypes included a large proportion of ambiguous calls. The resulting dataset includes 236 RIAILs and 125 wild isolates scored at 1460 SNPs. 
The 527,061 genotypes include 1450 third allele (putative deletion) calls among the wild isolates, 654 Ns for bad data, and 180 heterozygote calls. Eight of the RIAILs exhibited short tracts of residual heterozygosity. The mitochondrial genotype for each RIAIL was determined by PCR-RFLP, using primers 5′-ctcggcaatttatcgcttgt and 5′- cttactcccctttgggcaat and digesting with PmeI. We estimated a genetic map for the RIAIL cross using r/qtl [74] and found that 6 SNPs had expected physical positions on chromosomes other than those to which they mapped. These may represent errors in the genome assembly or in oligo production; the oligo sequences map uniquely in the genome assembly. The expected and mapped physical positions of these SNPs are in Table S4. Analyses of RIAILs employed the 1454 physically mapped SNPs; the complete dataset is provided in Table S1. We considered the mismapped SNPs in analyses of WI haplotypes but excluded them from analyses that required physical positions. The complete wild isolate dataset is provided in Table S2. In all cases where a RIAIL genotype contained an allele from one strain flanked by alleles from the other parental strain (i.e., a single-marker segment), we re-examined the plots of fluorescence intensities to confirm the genotype call; such a pattern is expected for a genotyping error and can strongly bias estimates of map lengths and breakpoint counts [71]. We estimate bin size as the distance from the end of a chromosome to the midpoint of the first breakpoint-containing interval or as the distance between the midpoints of successive breakpoint-containing intervals. This approach ignores bins created by multiple independent breakpoints within a single interval and uses interval midpoints rather than outside markers to avoid overlapping bins. Expected bin size is the per-base-pair sum of the squares of the bin lengths [106]. 
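The expected bin size defined above is a length-weighted mean: a uniformly placed locus falls in a given bin with probability proportional to that bin's length. A minimal sketch of the calculation follows. The function name and the example bin lengths are hypothetical and are not taken from the paper's data.

```python
def expected_bin_size(bin_lengths):
    """Length-weighted mean bin size: sum(L_i**2) / sum(L_i).

    A uniformly placed locus lands in bin i with probability
    L_i / sum(L), so the expected size of the bin it lands in is the
    sum of squared lengths divided by the total length.
    """
    total = sum(bin_lengths)
    if total <= 0:
        raise ValueError("bin lengths must sum to a positive value")
    return sum(length * length for length in bin_lengths) / total

# Hypothetical bin lengths in kb (not from the paper's data).
bins_kb = [50, 100, 350]
print(expected_bin_size(bins_kb))  # 270.0, well above the median of 100
```

This illustrates why the expected bin size reported in the text (225 kb) exceeds the median bin size (98 kb): large bins are weighted by their own length.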
Recombination Rate Domain Analysis We estimated genetic distances in r/qtl using the Haldane map function, treating observed recombination fractions as though they had been observed in a backcross. The marker density is sufficiently high that the exact form of map function employed has little effect on estimated genetic distances. We defined the tip domains of each chromosome to include all markers between the chromosome ends and the first recombination breakpoint observed in the RIAILs. The midpoint of this most distal recombinant interval was chosen as the tip-arm domain boundary. The non-tip markers were included in a segmented linear regression analysis, using the segmented package in R [107], to identify arm-center domain boundaries. To estimate confidence intervals for the domain boundaries, we used simulations of the RIAIL chromosomes. We simulated 1000 RIAIL populations for each chromosome, using the known pedigree. Each gamete received a meiotic chromosome with 0 or 1 breakpoints (i.e., complete interference [4]), the position of the breakpoints determined by the relative recombination fractions of the centers and arms estimated from the RIAILs. The tips were specified to be non-recombining and the two arms of each chromosome were assigned equal recombination probabilities per base pair; that is, intra-chromosomal differences in rate between arms were not modeled. Each chromosome was simulated as a sequence of markers with one marker for every kilobase of chromosome. We then sampled markers at spacing defined by the genotyped SNPs, yielding a dataset of RIAIL chromosomes simulated with discrete, constant-rate recombination domains. We estimated domain boundaries for the simulated chromosomes by segmented linear regression. The 95% confidence intervals vary in size depending on the size of the chromosome and the difference in recombination probability between adjacent domains. On average the intervals span 1.1 Mb. 
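For reference, the Haldane map function mentioned at the start of this section converts a recombination fraction r into a map distance of -50 ln(1 - 2r) centiMorgans. The sketch below implements only that textbook formula and its inverse. It does not reproduce how r/qtl estimates recombination fractions, and the function names are illustrative.

```python
import math

def haldane_cm(r):
    """Convert a recombination fraction r (0 <= r < 0.5) to centiMorgans."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)

def haldane_inverse(d_cm):
    """Convert a map distance in centiMorgans back to a recombination fraction."""
    if d_cm < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-d_cm / 50.0))

for r in (0.001, 0.01, 0.1):
    print(f"r = {r:<5} -> {haldane_cm(r):.3f} cM")
```

For small r the distance is close to 100r cM, which is consistent with the remark above that the choice of map function matters little at high marker density.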
The simulated RIAIL chromosomes were also used to estimate expected allele frequency skews and expected genetic lengths for each of the chromosomes. The RIAIL allele frequencies at each marker were estimated using the sim.geno function in r/qtl [74] to infer missing data. WormBase [67] genetic maps are derived from data available on June 7, 2008, for 4542 genes with experimentally determined map positions and known physical positions. As our analyses of these data are qualitative, we made no effort to screen these data for quality, as evident from several obviously mismapped data points in Figure S1. Breakpoint Count QTL Analysis We performed non-parametric interval mapping [76] in r/qtl [74]. The RIAILs differ in their relatedness as a result of the derivation of two selfing lines from each 10th generation intercross hermaphrodite. The paired lines exhibit substantially higher similarity (mean percent bases shared ±standard deviation, 69.6±11.4%) than unpaired lines (52.8±9.5%), so that background similarity could inflate lod scores at markers unlinked to QTLs. Moreover, the significance of the lod scores would be overestimated by conventional permutation, because the RIAILs are not exchangeable; permuted datasets would break the associations between genetically and phenotypically similar RIAILs [75],[108]. Note that the mean similarity among unpaired lines is greater than the expected 50% because of the influence of selection on allele frequencies during RIAIL construction. For this reason we have not used simulated genotypes [108] to assess QTL significance. Instead we used a structured analysis and structured permutations. We split the dataset into two subsets with each RIAIL pair split between the two. We performed linkage scans separately for the two subsets and summed the lod scores. We permuted the two subsets separately 1000 times to derive genome-wide significance estimates for each phenotype. 
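The structured permutation described above shuffles phenotypes only within each subset, so the pairing structure between the two subsets is never broken. A minimal sketch of that step, assuming phenotypes and subset labels are held in parallel lists; the function name, labels, and phenotype values are all hypothetical, and the linkage scan and lod summation are not shown.

```python
import random

def structured_permutation(phenotypes, subset_labels, rng):
    """Shuffle phenotypes within each subset, never across subsets."""
    permuted = list(phenotypes)
    for label in sorted(set(subset_labels)):
        idx = [i for i, s in enumerate(subset_labels) if s == label]
        values = [phenotypes[i] for i in idx]
        rng.shuffle(values)
        for i, v in zip(idx, values):
            permuted[i] = v
    return permuted

# Hypothetical example: four RIAIL pairs, one member of each pair per subset.
subset_labels = ["A", "B", "A", "B", "A", "B", "A", "B"]
phenotypes = [3, 5, 2, 4, 6, 1, 7, 8]   # made-up breakpoint counts

rng = random.Random(0)
print(structured_permutation(phenotypes, subset_labels, rng))
```

Each permuted dataset keeps the same set of phenotype values within subset A and within subset B, which is the property the structured analysis relies on.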
Structure Analysis Estimation of population structure used a dataset of 40 haplotypes (haplotype 21, which differs from haplotype 20 only by a single putative deletion allele, was excluded, as the analysis treats these genotypes as missing data) and 1454 SNPs. We ran structure 2.2 [80] ten times at each of five values of K, the number of ancestral populations. We used the linkage model [79] with a burn-in period of 10,000 replicates followed by 50,000 replicates to collect estimated parameters and likelihoods. The outputs of the repeated runs at each K were aligned using CLUMPP 1.1.1 [109] and Figure 8 was generated using distruct 1.1 [110]. Linkage Disequilibrium We computed lower bounds on Rmin for each chromosome using HapBound and upper bounds using SHRUB [81]. We used a dataset with 1318 SNPs, after excluding all sites with missing data or putative deletion alleles. We used Haploview 4.0 [111] to calculate r² between all pairs of the 1042 sites with minor allele frequencies greater than 0.1 in the 40-haplotype dataset. We used these r² values to estimate ρ per base pair and its standard error by nonlinear regression using equation 3 of Weir and Hill [112], implemented with the R function nls. This simple method of moments estimator roughly approximates a likelihood estimator. Estimates of the half-length of LD represent the distance at which the expected value of r² from the nonlinear regression drops below half its initial value. To estimate ρ in sliding windows, we used the r² values among SNPs within 1 Mb to either side of each focal SNP. These 2 Mb windows are the smallest practicable windows given our marker density. We also estimated ρ for whole arms and centers, using the domain boundaries estimated from the RIAILs and shown in Table 1. 
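The pairwise linkage disequilibrium statistic used throughout this section is the squared correlation between alleles at two sites. The sketch below computes it for two biallelic sites coded 0/1 across the same haplotypes. It illustrates the standard definition only, not Haploview's implementation or the Weir and Hill regression, and the example haplotypes are hypothetical.

```python
def r_squared(site_a, site_b):
    """Squared correlation between two biallelic sites scored as 0/1
    across the same set of haplotypes."""
    if len(site_a) != len(site_b) or not site_a:
        raise ValueError("sites must be non-empty and of equal length")
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b   # coefficient of linkage disequilibrium, D
    return d * d / denom

# Hypothetical alleles at three sites across eight haplotypes.
site1 = [0, 0, 0, 0, 1, 1, 1, 1]
site2 = [0, 0, 0, 0, 1, 1, 1, 1]   # identical to site1
site3 = [0, 0, 1, 1, 0, 0, 1, 1]   # statistically independent of site1

print(r_squared(site1, site2))  # 1.0
print(r_squared(site1, site3))  # 0.0
```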
We estimated the distribution of r² among nonsyntenic sites in the absence of association from 100 permutations of chromosomes among the 40 wild isolate haplotypes, preserving allele frequencies and chromosomal haplotype frequencies but breaking correlations among chromosomes. The means of the ranked nonsyntenic r² values across permutations provide an estimate of the number of false discoveries at each quantile of the r² distribution. Permutations and calculations were performed in R, and r² was calculated using the LDmat function in the popgen library (http://www.stats.ox.ac.uk/~marchini/software.html). The dataset included 784 sites with no missing data and minor allele frequencies greater than 0.1. Association Mapping We excluded singleton SNPs and those with missing data and used the resulting 40×907 matrix to estimate an identity-by-state kinship matrix using EMMA [88]. We did not remove SNPs in perfect linkage disequilibrium with other SNPs because we sought to discern the genomic extent of intervals associated with traits. We estimated the significance of associations in the mixed-model analysis using likelihood ratio tests with the function emma.ML.LRT, incorporating the kinship matrix and in some cases the ancestral population admixture assignments from structure (K = 3) as fixed effects. Supporting Information Figure S1 RIAIL maps recapitulate classical marker mapping results. Chromosomal and regional rate variation patterns observed in the recombinant inbred advanced intercross lines (black points) are similar to those observed from thousands of two- and three-point mapping experiments reported in WormBase (red points). The RIAIL map distances represented here are scaled to yield 50 cM total lengths for each chromosome. The classical mapping data corroborate the great difference in rate between the left and right arms of chromosome IV, with an exceptionally high rate on IVL between roughly 1.0 and 2.4 Mb. 
At a sub-arm scale, we see corroboration for variation along IIL very clearly and IR, VR, and XL less so. Other regions that show variation in the WormBase map are not evident in the RIAIL map, notably IVR, VL, and XR. Nevertheless, our results support the claim of Barnes et al. [65] that the arms are not truly constant-rate regions. (3.99 MB EPS)
Figure S2 Pairwise identity among wild isolate haplotypes. For each chromosome, pairwise allele-sharing between each haplotype is plotted below the diagonal. Above the diagonal we present results of the same analysis excluding all singleton SNPs, all of which are unique to CB4856 (haplotype 41). (1.65 MB EPS)
Figure S3 Linkage disequilibrium within chromosomes. Pairwise r² values for all sites with minor allele frequencies >0.1 are plotted. The axes represent physical position along each chromosome. Pairs of sites with r² > 0.5 are in black and those with r² > 0.9 are red. (0.06 MB PDF)
Figure S4 Decay of linkage disequilibrium. Each point plots r² for a pair of sites with minor allele frequencies >0.1, colored by chromosome, as a function of the physical distance between the two sites. The curves plot the nonlinear regression of r² on distance using the sample-size-corrected relationship between the variables from Weir and Hill [112]. (0.19 MB PDF)
Figure S5 Distributions of p-values for tests of association. The calculated p-value for each SNP marker is plotted under three tests of association as in Figure 10: Fisher's exact test, mixed-model likelihood ratio tests incorporating a genotypic similarity (IBS) matrix, and mixed-model LRT incorporating both genotypic similarity and the results of structure analysis. The straight line represents the expectation for uniformly distributed p-values. 
Without mixed-model control for genomic similarity, the p-value distribution is profoundly skewed to low values. (4.04 MB PDF)
Table S1 SNPs and RIAIL Genotypes. SNP details and genotype data for 236 recombinant inbred advanced intercross lines. (0.94 MB TXT)
Table S2 SNPs and Wild Isolate Genotypes. SNP details and genotype data for 125 wild isolates. (0.62 MB TXT)
Table S3 Strains and their Haplotypes. Strain, haplotype number, locality, and counts of genotype calls. (0.03 MB XLS)
Table S4 Misplaced SNP markers. Illumina oligo sequences, expected positions, and map-based positions. (0.02 MB XLS)

              The Loci of repeated evolution: a catalog of genetic hotspots of phenotypic variation.

              What is the nature of the genetic changes underlying phenotypic evolution? We have catalogued 1008 alleles described in the literature that cause phenotypic differences among animals, plants, and yeasts. Surprisingly, evolution of similar traits in distinct lineages often involves mutations in the same gene ("gene reuse"). This compilation yields three important qualitative implications about repeated evolution. First, the apparent evolution of similar traits by gene reuse can be traced back to two alternatives, either several independent causative mutations or a single original mutational event followed by sorting processes. Second, hotspots of evolution-defined as the repeated occurrence of de novo mutations at orthologous loci and causing similar phenotypic variation-are omnipresent in the literature with more than 100 examples covering various levels of analysis, including numerous gain-of-function events. Finally, several alleles of large effect have been shown to result from the aggregation of multiple small-effect mutations at the same hotspot locus, thus reconciling micromutationist theories of adaptation with the empirical observation of large-effect variants. Although data heterogeneity and experimental biases prevented us from extracting quantitative trends, our synthesis highlights the existence of genetic paths of least resistance leading to viable evolutionary change. © 2013 The Author(s). Evolution © 2013 The Society for the Study of Evolution.

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Genet
                PLoS Genetics
                Public Library of Science (San Francisco, CA USA )
                1553-7390
                1553-7404
                June 2015
                26 June 2015
                Volume: 11
                Issue: 6
                Article: e1005323
                Affiliations
                [1 ]Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, United States of America
                [2 ]Department of Ecology and Evolutionary Biology and Centre for the Analysis of Genome Evolution and Function, University of Toronto, Ontario, Canada
                University of Edinburgh, UNITED KINGDOM
                Author notes

                The authors have declared that no competing interests exist.

                Conceived and designed the experiments: PCP JLF JHW. Performed the experiments: JHW RMR TEA. Analyzed the data: JLF CGT WW ADC. Wrote the paper: JLF PCP JHW ADC.

                [¤a]

                Current Address: Department of Biological Sciences, The University of Alabama, Tuscaloosa, Alabama, USA

                [¤b]

                Current Address: Department of Biology, William Jewell College, Liberty, Missouri, USA

                Article
                PGENETICS-D-14-02058
                10.1371/journal.pgen.1005323
                4482642
                26114425
                312a2200-f45c-4c94-8d60-45efaacfdb46
                Copyright © 2015

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 29 July 2014
                Accepted: 31 May 2015
                Page count
                Figures: 5, Tables: 2, Pages: 25
                Funding
                Funding was provided by the National Institutes of Health (R01-GM096008), the National Science Foundation (DBI-1003124, DEB-1120417), and the Ellison Medical Foundation (http://www.ellisonfoundation.org). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Custom metadata
                Assembly and annotation are at www.wormbase.org and are deposited in GenBank under BioProject ID PRJNA248909. De novo annotations of existing genomes are published at figshare.com at dx.doi.org/10.6084/m9.figshare.1399184, dx.doi.org/10.6084/m9.figshare.1396472, dx.doi.org/10.6084/m9.figshare.1396473, dx.doi.org/10.6084/m9.figshare.1396474, dx.doi.org/10.6084/m9.figshare.1396475, dx.doi.org/10.6084/m9.figshare.1396476, dx.doi.org/10.6084/m9.figshare.1396477, and dx.doi.org/10.6084/m9.figshare.1396478. Our analysis pipeline is available at github.com/Cutterlab/popgenome_pipeline. Further information regarding the unpublished genomes used in this manuscript can be found in the Acknowledgments.

                Genetics
