      Is Open Access

      Next-generation sequencing analysis of lager brewing yeast strains reveals the evolutionary history of interspecies hybridization

      research-article


          Abstract

          The lager beer yeast Saccharomyces pastorianus is considered an allopolyploid hybrid species between S. cerevisiae and S. eubayanus. Many S. pastorianus strains have been isolated and classified into two groups according to geographical origin, but this classification remains controversial. Hybridization analyses and partial PCR-based sequence data have indicated a separate origin of these two groups, whereas a recent intertranslocation analysis suggested a single origin. To clarify the evolutionary history of this species, we analysed 10 S. pastorianus strains and the S. eubayanus type strain as a likely parent by Illumina next-generation sequencing. In addition to assembling the genomes of five of the strains, we obtained information on interchromosomal translocation, ploidy, and single-nucleotide variants (SNVs). Collectively, these results indicated that the two groups of strains share S. cerevisiae haploid chromosomes. We therefore conclude that both groups of S. pastorianus strains share at least one interspecific hybridization event and originated from a common parental species and that differences in ploidy and SNVs between the groups can be explained by chromosomal deletion or loss of heterozygosity.

          Most cited references (17)


          Microbe domestication and the identification of the wild genetic stock of lager-brewing yeast.

          Domestication of plants and animals promoted humanity's transition from nomadic to sedentary lifestyles, demographic expansion, and the emergence of civilizations. In contrast to the well-documented successes of crop and livestock breeding, processes of microbe domestication remain obscure, despite the importance of microbes to the production of food, beverages, and biofuels. Lager-beer, first brewed in the 15th century, employs an allotetraploid hybrid yeast, Saccharomyces pastorianus (syn. Saccharomyces carlsbergensis), a domesticated species created by the fusion of a Saccharomyces cerevisiae ale-yeast with an unknown cryotolerant Saccharomyces species. We report the isolation of that species and designate it Saccharomyces eubayanus sp. nov. because of its resemblance to Saccharomyces bayanus (a complex hybrid of S. eubayanus, Saccharomyces uvarum, and S. cerevisiae found only in the brewing environment). Individuals from populations of S. eubayanus and its sister species, S. uvarum, exist in apparent sympatry in Nothofagus (Southern beech) forests in Patagonia, but are isolated genetically through intrinsic postzygotic barriers, and ecologically through host-preference. The draft genome sequence of S. eubayanus is 99.5% identical to the non-S. cerevisiae portion of the S. pastorianus genome sequence and suggests specific changes in sugar and sulfite metabolism that were crucial for domestication in the lager-brewing environment. This study shows that combining microbial ecology with comparative genomics facilitates the discovery and preservation of wild genetic stocks of domesticated microbes to trace their history, identify genetic changes, and suggest paths to further industrial improvement.

            Whole-Genome Comparison Reveals Novel Genetic Elements That Characterize the Genome of Industrial Strains of Saccharomyces cerevisiae

Introduction

During its long history of association with human activity, the genomic makeup of the yeast S. cerevisiae is thought to have been shaped through the action of multiple independent rounds of wild yeast domestication combined with thousands of generations of artificial selection. As the evolutionary constraints that were applied to the S. cerevisiae genome during these domestication events were ultimately dependent on the desired function of the yeast (e.g., baking, brewing, wine or bioethanol production), this multitude of selective schemes has produced large numbers of S. cerevisiae strains with highly specialized phenotypes that suit specific applications [1], [2]. As a result, the study of industrial strains of S. cerevisiae provides an excellent model of how reproductive isolation and divergent selective pressures can shape the genomic content of a species.

Despite their diverse roles, industrial yeast strains all share the general ability to grow and function under the concerted influences of a multitude of environmental stressors, which include low pH, poor nutrient availability, high ethanol concentrations and fluctuating temperatures. In comparison, non-industrial isolates, such as laboratory strains, have been selected for rapid and consistent growth in nutrient-rich laboratory media, thereby producing markedly different phenotypic outcomes when compared to their industrial relatives [3]. The outcomes of these very different selection pressures are therefore most evident when comparing industrial and non-industrial yeasts. As an example, laboratory strains of S. cerevisiae, such as S288c, are unable to grow in the low pH and high osmolarity of most grape juices and therefore cannot be used to make wine. This is a clear difference between industrial and non-industrial strains of S. cerevisiae; however, there are numerous subtle differences not only between industrial strains, but also between strains used within the same industry [4], [5], highlighting the overall genetic diversity found in this species.

There have been several attempts to characterize the genomes of industrial strains of S. cerevisiae, which have uncovered differences that included single nucleotide polymorphisms (SNPs), strain-specific ORFs and localized variations in genomic copy number [6]–[14]. However, the type and scope of genomic variation documented by these studies were limited either by technology constraints (e.g., arrayCGH relying on the laboratory strain as a “reference” genome), or by the resources required for the production of high-quality genomic assemblies, which has limited the scope and number of whole-genome sequences available for comparison. In addition, to limit genomic complexity to a manageable level, previously published whole-genome sequencing studies on industrial strains used haploid representations of diploid, and often heterozygous, commercial and environmental strains [9]–[13].

We sought to address these shortcomings by sequencing the genomes of four wine and two brewing strains of S. cerevisiae in their industrially-used forms. The industries of winemaking and brewing were targeted for this work as they have the longest association with S. cerevisiae (measured in the thousands of years) and each industry has accumulated large numbers of phenotypically distinct strains for which genetic comparisons can be made.

This study demonstrates that industrial yeasts display significant genotypic heterogeneity both between strains and between alleles present within strains (i.e., heterozygosity). This variation was manifest as SNPs, small insertions and deletions, and as novel, strain- and allele-specific ORFs, many of which had not been found previously in the S. cerevisiae genome and may provide the basis for novel phenotypic characteristics.
Interestingly, several ORFs were shown to comprise a gene cluster that was present in multiple copies and at a variety of genomic loci in a subset of the strains examined. Furthermore, this cluster appears to have integrated into genomic locations by a novel circular intermediate, but without employing classical transposition or homologous recombination, which we believe represents the first time such an element has been characterized in S. cerevisiae. Overall, this work suggests that, despite the scrutiny that has been directed at the yeast genome, there remains a significant reservoir of ORFs and novel modes of genetic transmission which may have significant phenotypic impact in this important model and industrial species.

Results

Six industrial yeasts were chosen for genomic analysis, comprising four commercial wine strains and two brewing strains used for the production of ales (ale strains are primarily S. cerevisiae, while lager-style brewing strains are S. pastorianus, a hybrid of S. cerevisiae and S. bayanus [15], [16]). These six strains were sequenced to an average coverage of 20-fold with a combination of shotgun and paired-end methods using the GS FLX Titanium series chemistry [17], which resulted in six high-quality genomic assemblies (Table 1).

10.1371/journal.pgen.1001287.t001
Table 1. Strains sequenced in this study.

Strain        Industry        Supplier                  Contigs (a)  N50 (a) (kb)  Scaffolds (a)  Assembly size (a)  GenBank accession (b)
Lalvin QA23   wine            Lallemand Inc.            96           185           39             11.6 Mb            ADVV00000000
AWRI796       wine            Maurivin                  49           409           31             11.6 Mb            ADVS00000000
Vin13         wine            Anchor Bio-Technologies   80           308           29             11.5 Mb            ADXC00000000
FostersO      brewing (ale)   Fosters Group Ltd.        95           219           35             11.4 Mb            AEEZ00000000
FostersB      brewing (ale)   Fosters Group Ltd.        78           209           25             11.5 Mb            AEHH00000000
VL3           wine            Laffort                   70           316           29             11.4 Mb            AEJS00000000

(a) Excluding repetitive sequencing contigs such as sub-telomeric regions and Ty elements.
(b) These Whole Genome Shotgun projects have been deposited at DDBJ/EMBL/GenBank.
The versions described in this paper are the first versions: ADVV01000000, ADVS01000000, ADXC01000000, AEEZ01000000, AEHH01000000 and AEJS01000000.

Large chromosomal variations in industrial yeast strains

Rather than being strictly diploid, many industrial yeast strains display chromosomal copy number variation (CNV) [18]. In order to catalogue CNV in the industrial yeast genomes, the depth of sequencing coverage was calculated for each sequence contig such that areas of CNV could be detected as localized variations in that coverage (Figure 1). There were several large areas of increased copy number across the strains, including six potential whole-chromosome amplifications (chrI of AWRI796, chrVIII of VL3, chrIII of FostersO and chrIII, V and XV of FostersB) and one potential reduction in chromosomal copy number (chrXIV of FostersO). There were also several partial chromosomal CNVs, including amplification of 200 kb of chrXIV in AWRI796, 600 kb of chrII and 200 kb of chrX in FostersO and a 400 kb reduction from chrVII of FostersO (Figure 1). However, while the ale strains had a higher number of large CNVs than wine strains, the overall fold change of these CNVs was generally reduced. This reduction can be most easily explained by the brewing strains having a polyploid genetic base while the wine strains are diploid, an observation which has been made previously in these industrial yeasts [18].

10.1371/journal.pgen.1001287.g001
Figure 1. Chromosomal aneuploidy determined by whole-genome sequencing coverage. Sequencing coverage was determined for each contig using a sliding window of 1001 bp, with a 100 bp step frequency, and plotted in chromosomal order (black circles). Regions of copy number variation were scored as either being greater than 1.25-fold (yellow lines; approximating either three or five copies in a tetraploid genome) or 1.5-fold (red lines; one or three copies in a diploid genome) different to the median coverage for that strain. Strains are shaded according to their industry (wine, red; ale, blue).

Heterozygosity in industrial strains

As existing published industrial yeast genome sequences were either generated from haploid derivatives of industrial strains [9]–[12] or had heterozygous regions discarded during analysis [13], the level of genome-wide heterozygosity present in industrial strains remains largely unknown. However, as the assemblies performed in this study retained genomic heterozygosity, it was possible to determine the level of allelic differences within each of these strains (Table 2). While every industrial strain contained heterozygous single nucleotide polymorphisms (SNPs), the proportion of these varied over thirty-fold between the wine strain AWRI796 (1041 total heterozygous bp) and the brewing strain FostersB (33071 bp). Heterozygous insertions and deletions (InDels) were also present and ranged from single base pair variants to large InDels of up to 35.3 kb. Strains were also shown to contain heterozygous instances of Ty element insertion, although, due to the repetitive nature of these elements, their presence in the genome could generally only be estimated through paired-end information (data not shown).

10.1371/journal.pgen.1001287.t002
Table 2. Heterozygosity in industrial S. cerevisiae strains.

Strain     Origin            Ploidy    Homozygous SNPs (a)  Heterozygous SNPs (a)
S288C      Lab               1n        41708                0
YJM789     Human isolate     1n        40675                0
JAY291     Bioethanol        1n        25648                0
RM11-1a    Vineyard isolate  1n        10825                0
EC1118     Wine              1n (b)    13241                0
AWRI1631   Wine              1n        9935                 0
QA23       Wine              2n        4913                 18861
AWRI796    Wine              2n        8996                 1041
Vin13      Wine              2n        3544                 15216
VL3        Wine              2n        5108                 9904
FostersO   Ale               >2n (c)   25802                27215
FostersB   Ale               >2n (c)   23125                33071

(a) SNPs were calculated relative to the most common base across all twelve strains at each position.
(b) EC1118 is a diploid commercial strain but the available sequence is a haploid representation of this genome.
(c) As estimated from overall sequencing coverage.

Nucleotide variation present in S. cerevisiae

In addition to the intra-strain variation that was present between homologous chromosomes within individual strains, there was also significant nucleotide variation between strains. As seen for the allelic variation, both SNPs and InDels were found between strains, with inter-strain InDels of up to 45 kb being observed. Many of the smaller InDels (both heterozygous and homozygous) were located in regions comprising tandem repeats (Figure 2A, Table S1) and primarily in the expansion and contraction of di- and tri-nucleotide tandem repeats (Figure 2B). Indeed, when using chromosome XVI as an example, over 86% of the instances of di- and tri-nucleotide repeats displayed variable length in at least one of the strains. As the size of tandem repeats has been associated with differences in gene expression [19], this suggests that there are both strain- and allele-specific differences in the expression of genes proximal to these repeat-associated InDel events.

10.1371/journal.pgen.1001287.g002
Figure 2. Nucleotide variation in S. cerevisiae. (A) InDels associated with tandem repeats. Histogram showing the proportion of tandem repeats of various sizes (repeat size indicated on x-axis) present on chrXVI that were either conserved in repeat length (blue) or contained strain-specific InDels (yellow). The total number of repeat loci present in each class is listed above the histogram. (B) An example of a strain- and allele-specific InDel in a tandem repeat in the promoter region of YPL088W.

SNP variation was also common throughout the strains, with a total of 165,913 non-degenerate SNPs (unique points of nucleotide variation) that were present in at least one allele of the twelve strains investigated (∼1.3% of the total genome length).
However, given the influence of large, strain-specific InDels (which were filtered out of the SNP analysis) the apparent SNP density is much higher than 1.3%, such that these SNPs were shown to display a median inter-SNP distance of only 37 bp.

By using the number of SNPs separating any two isolates as an estimation of their relatedness (Figure 3A), we were able to show that industrial yeasts are distinct from both the laboratory and human pathogenic strains and were also found to group by industry. This was especially true of the brewing strains, which displayed a high degree of genetic distance not only from the laboratory and human isolates, but also from the wine and bioethanol strains. The only exception to this pattern of grouping by industry or environmental niche was the ‘natural’ isolate RM11-1a, which grouped closely with wine strains. However, given that it is descended from a strain sourced from a vineyard, RM11-1a may well share genetic origins with those strains used in winemaking.

10.1371/journal.pgen.1001287.g003
Figure 3. Nucleotide relationships between S. cerevisiae strains. (A) A neighbor joining tree representing the genetic distance between strains as calculated from the total SNP diversity present in whole genome alignments. (B) A neighbor joining tree representing the genetic distance between strains presented in part (A) and representative strains from several S. cerevisiae geographical populations [12]. Industrial strains are color-coded based upon their primary industry (wine/European, including RM11-1a, pink; ale, blue; bioethanol, green; sake, yellow). Strains that are predicted to contain the heterogeneous five-gene cluster are labeled in bold.

In order to put the genetic variation observed in these genomic alignments in a larger population context, twelve strains were selected to represent each of the six main S. cerevisiae population groups as proposed by Liti et al. [12] for further SNP comparison (Figure 3B).
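The idea of using the number of SNPs separating two isolates as a measure of relatedness can be illustrated with a minimal sketch. This is not the pipeline used in the cited study: the function names, the toy sequences and the isolate labels below are all hypothetical, and the rule of skipping gap or ambiguous positions is an assumption made for the illustration. A matrix like this would typically be passed to separate tree-building software to produce a neighbor joining tree.

```python
from itertools import combinations

def snp_distance(seq_a, seq_b):
    """Count aligned positions where both sequences carry a different unambiguous base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for x, y in zip(seq_a, seq_b)
        if x != y and x in "ACGT" and y in "ACGT"  # assumption: skip gaps and N
    )

def distance_matrix(aligned):
    """Return a symmetric dict mapping (name, name) pairs to SNP counts."""
    names = sorted(aligned)
    dist = {(n, n): 0 for n in names}
    for p, q in combinations(names, 2):
        d = snp_distance(aligned[p], aligned[q])
        dist[(p, q)] = dist[(q, p)] = d
    return dist

# Hypothetical toy alignment; labels are placeholders, not real strains.
toy_alignment = {
    "isolate1": "ACGTACGT",
    "isolate2": "ACGTACGA",
    "isolate3": "TCGAACGA",
    "isolate4": "AC-TACGN",
}
matrix = distance_matrix(toy_alignment)
```

In this toy alignment, isolate1 and isolate2 differ at one position while isolate1 and isolate3 differ at three, so the first pair would be treated as the more closely related.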
In this broader context, wine strains sequenced in this study were shown to also group tightly with the wine/European strains DBVPG1106 and DBVPG1373, showing that the data produced across these two studies are directly comparable. However, while the ale strains were still shown to be distinct from the wine isolates, they were found to be far closer to the wine strains than isolates such as those used in sake production, which display the greatest level of nucleotide diversity when compared to the wine strains. Indeed, when the SNP data from these additional strains are included in the calculations of SNP density, the total number of non-degenerate SNPs increases to 216,207 (∼1.7%) with a median inter-SNP distance of only 27 bp. However, despite comparisons to eighteen other diverse strains of S. cerevisiae, 15,576 of these SNPs were found solely in this study (2,501 in more than one strain), with the vast majority of these SNPs being present in a heterozygous form (only 1,864 novel SNPs were homozygous in at least one strain).

ORF conservation across S. cerevisiae

To determine how inter-specific variation at the nucleotide level translated into protein-coding differences, the predicted coding potential of each strain was compared. ORFs were predicted from each sequence (including the pre-existing whole genome sequences) using Glimmer [20] and compared using a combination of BLAST [21] homology matches and genomic synteny to differentiate instances of orthology from gene duplication (Table S2). When using the laboratory strain S288c as a reference, there was an average of 92% ORF coverage across the strains. The majority of S288c ORFs without a match in other strains were shown to be located in repetitive regions of the S. cerevisiae genome, such as in the sub-telomeric zones or the numerous Ty retrotransposons that are present in the S288c genome relative to other strains.
Due to the repetitive nature of these regions it was often impossible to unambiguously position these sequences in the industrial yeast genome assemblies and they remain within repetitive, unmappable contigs in the various genome assemblies. It therefore appears that, due to its persistent propagation in the laboratory, the genome of S288c may represent a reduced genomic state, as it does not appear to contain additional genes that provide unique metabolic or cellular potential outside of those present in other strains. It does, however, contain a far greater number of Ty transposons relative to all of the other strains, suggesting that transposon proliferation occurred on at least one occasion during the development of this laboratory strain.

Novel ORFs

While the laboratory strain S288c is considered the reference for the genomic complement of S. cerevisiae, it is becoming apparent that it lacks a multitude of ORFs which exist in other strains of S. cerevisiae [9]–[13], [22], [23]. This is confirmed in the present study, with between 36 (FostersB) and 110 (Lalvin QA23) ORFs lacking significant homology to the S288c genome but for which there were clear matches to sequences in other S. cerevisiae strains or microbial species (Table S2). Orthologs of 102 out of 218 of the non-degenerate set of these ‘non-S288c’ ORFs have been identified previously in S. cerevisiae strains, mainly through whole-genome sequencing of AWRI1631, EC1118, RM11-1a and YJM789 [8], [9], [13] (Table S2). These include genes encoding proteins such as the Khr1 killer toxin [24], which is found in YJM789, EC1118, Vin13, VL3, FostersB and FostersO, and orthologs of the MPR1 stress-resistance gene (which was originally identified in the Sigma 1278b strain [23]) in RM11-1a, EC1118, AWRI1631, JAY291, QA23 and VL3.
Interestingly, in addition to these ORFs there were at least three proteins present in the human pathogen YJM789 and the FostersB and FostersO ale strains but which were lacking from the wine, biofuel and laboratory strains (Figure 4C). These included the YJM-GNAT GCN5-related N-acetyltransferase [8] and a separate gene cluster which is predicted to contain both RTM1, which was identified previously as a distillery-strain specific gene that provides resistance to an inhibitory substance found in molasses [22], and a large ORF of around 2.3 kb which, despite its large size and high degree of conservation across the brewing and human pathogenic strains, lacks significant homology to any other protein sequences except for six isolates from the large S. cerevisiae population genomic screen which also appear to encode this protein [12] (Figure S1). In addition to these two conserved ORFs, in the ale strains this cluster also appears to encode an invertase that would be expected to convert sucrose into the sugars glucose and fructose.

10.1371/journal.pgen.1001287.g004
Figure 4. Novel genes found in industrial strains. (A) A 45 kb strain-specific region in AWRI796 which is predicted to encode at least 21 ORFs (full ORF sequences are listed in Dataset S12). ORFs with homology to AADs are highlighted in yellow. The extreme 5′ and 3′ ends of this cluster are homologous to a repetitive region present in the sub-telomeric regions of chrXIII, XV and XVI (dark blue boxes). Black dots within ORFs represent potential frameshifts in the sequence of these regions. (B) ClustalW dendrogram produced by aligning AAD proteins from S288c, AWRI796 and the top five matches to the highly divergent AWRI796 proteins AAD(i) and AAD(ii). (C) The region in the brewing strains FostersO and FostersB containing RTM1 [22] and the conserved hypothetical ORFs that are also found in the human pathogen YJM789 [8].
Despite the presence of at least two existing high-coverage wine strain sequences and at least an additional six low-coverage genomes, the entire repertoire of ORFs present in wine strains of S. cerevisiae, let alone the species as a whole, is far from complete. In addition to expanding the strain range of previously identified non-S288c proteins, it was possible to identify at least eleven ORFs that lacked homology to existing proteins from S. cerevisiae, in addition to many new paralogs of existing S. cerevisiae genes. These novel ORFs often clustered in large InDels, the largest of which was a 45 kb fragment in the wine strain AWRI796. This novel genomic region is located adjacent to a large repetitive element present on chromosomes XIII, XV and XVI, which hampered initial efforts to assign this region to a specific chromosome. However, through the application of a 20 kb paired-end library, it was possible to bridge the repetitive region and position this novel region at the end of the right arm of chromosome XV. This fragment is predicted to encode nineteen ORFs (Figure 4A), three of which are predicted to encode aryl-alcohol dehydrogenases (AADs). AADs have been extensively characterized in filamentous fungi, where they catalyze the reversible reduction of aldehydes and ketones to aromatic alcohols during lignin degradation [25], [26]. These new AAD homologs are phylogenetically distinct from other AAD enzymes that have been identified, including the seven predicted AADs that are present in the S288c genome [27], [28] (Figure 4B).

Characterization of a novel, and potentially transmissible, gene cluster

One particularly curious feature of many of the industrial yeast strains analyzed in this study was a cluster of five conserved ORFs that was present in all of the wine strains, RM11-1a and the bioethanol strain JAY291, and potentially in at least four of the strains present in the Liti et al. [12] study (Figure 3).
This cluster is predicted to encode two potential transcription factors (one zinc-cluster, one C6 type), a cell surface flocculin, a nicotinic acid permease and a 5-oxo-L-prolinase, and has been suggested to be horizontally acquired by S. cerevisiae from Zygosaccharomyces spp. [13]. In this study we have been able to show that while the sequences of the individual genes within this cluster are highly conserved between strains, the cluster itself is actually highly diverse with respect to copy number, genomic location and overall gene order (Figure 5, Table S3). The cluster was present in one to at least three copies across strains, with individual clusters being located in at least seven different genomic loci (Figure 5A). For example, wine strain Lalvin QA23 was shown to contain at least three copies of the cluster, found in three different genomic loci and with at least two copies being heterozygous. However, despite this diversity, the sequence of the ORFs and intergenic regions of the cluster were highly conserved, with only fifteen nucleotide substitutions (0.01%) recorded across the eleven known copies of the cluster (Figure 5B, Figure S2).

10.1371/journal.pgen.1001287.g005
Figure 5. A divergent cluster of genes with a possible circular intermediate. (A) The location and orientation of the gene cluster throughout the genomes of the industrial yeasts. Upper case roman numerals refer to standard S. cerevisiae chromosomes (unk, location unknown) with individual loci labeled with lower case roman numerals. (B) Nucleotide conservation of the five-gene clusters. An alignment of the nucleotide sequence of all eleven clusters is shown below a schematic depiction of the five predicted ORFs present in this nucleotide sequence (A, zinc-cluster transcription factor; B, cell-surface flocculin; C, nicotinic acid permease; D, 5-oxo-L-prolinase; E, C6 transcription factor). In order to produce contiguous alignments, the sequence of each cluster was manually split to begin with the start codon of ORF A, with the position of each break indicated. Conserved bases are shaded blue (light blue for ORF sequences). Insertions are highlighted in red and substitutions in green. (C) Differences in gene order within individual clusters. Each of the five genes is represented by a filled circle (labeled as in part B), with the systematic name of the ORFs that border each insertion listed in open squares (Z.b, this cluster is present in Z. bailii (Accession number FN295481.1); Ty, transposon sequence; TEL, sub-telomeric repeat (COS) sequence). Colored arrows bordering each cluster indicate the strain(s) in which this insertion is present. (D) Each of the nine cluster locations and orders can be resolved through the use of a circular intermediate that integrates into the genome via breakage at locations indicated by each colored triangle. (E) Conservation of genomic sequences flanking individual cluster insertion events. Nucleotide alignments are shown for the 50 bp directly adjacent to either side of the five chromosomally-mapped insertion events (shaded yellow when conserved) in addition to the first and last 50 bp of each cluster (shaded according to part B). Insertions are shaded in red, substitutions in green, with both additionally highlighted by asterisks. Sequences used for the alignment are (from top to bottom) S288c, JAY291, RM11-1a, EC1118, AWRI1631, QA23 allele A, QA23 allele B, AWRI796 allele A, AWRI796 allele B, Vin13 allele A, Vin13 allele B, VL3 allele A, VL3 allele B, Fosters B allele A, Fosters B allele B, Fosters O allele A, Fosters O allele B. Nucleotide coordinates for the bases directly flanking the insertion are relative to the S288c genome.

In addition to the differences in copy number and location, the exact order of the ORFs within the cluster differed in a location-dependent manner (Figure 5B, 5C).
However, all of these different ORF arrangements could be resolved into a syntenically-conserved order if the linear genomic copy of each cluster resulted from the differential resolution of a common circular intermediate, with a unique breakpoint in this circular arrangement being observed for each genomic location (Figure 5B–5D). However, despite the differential location of these clusters, these integration events appear to select for functional conservation of the genes, with the majority of the breakpoints being located within intergenic regions (Figure 5B). Of the two exceptions to this, one of these events occurs at the extreme 3′ end (∼100 bp from the predicted stop codon) of one ORF such that a functional protein is likely to still be produced from this gene. Adding further interest to the mode of transfer of this cluster, its integration into the genome appears to occur without the production of the terminal repeated sequences that would be expected if integration of this element occurred by either homologous recombination or classical mobilization via a transposon-like mechanism. In fact, for at least three of the seven different integration events characterized in this study, integration of the cluster has occurred between two directly adjacent, conserved nucleotides, with a further two events showing only single nucleotide indels at the junction between the cluster and the flanking genomic sequences (Figure 5E).

Discussion

While S. cerevisiae is one of the most intensively studied biological model organisms and economically-important industrial microorganisms, many characteristics of its genome remain unknown, especially in strains other than the laboratory reference S288c. Through the analysis of six industrial strains, it was possible to show that the industrial members of this species are distinct, with wine and brewing strains being almost as distantly related at the DNA level as they are to either the laboratory or human pathogenic strains.
This suggests that despite their roles in performing industrial fermentations, the two groups comprise genetically separate S. cerevisiae lineages. While this is a situation similar to that proposed previously for wine and sake strains of S. cerevisiae [2], the wine and ale strains were much more closely related to each other than to strains with origins outside of Europe [12], and this may reflect a distant common European-type ancestor. The bioethanol strain JAY291 displays an intermediate level of sequence relatedness to the wine strains (compared to ale strains) and also contains the five-gene cluster, suggesting that this strain shares at least some of its genomic origins with the wine isolates. With the relatively recent development of the bioethanol industry, it is not entirely unexpected that yeasts used in this process may well have their origins in commercial strains used in established ethanologenic industries. Wine strains would therefore make a logical choice for this starting point given their highly efficient production of ethanol and relatively high tolerance to a variety inhibitory substances, such as ethanol or polyphenols, that also exist in bioethanol fermentations [29]. In addition to mapping the relationships between these strains, this study uncovered a number of genetic elements not previously identified in the S. cerevisiae genome, as well as expanding the range of several strain specific elements that had been identified previously. This highlights the fact that the genetic variation that underlies the phenotypic diversity of S. cerevisiae goes well beyond that of SNPs or small InDels and is similar to the situation observed with many bacterial species where the pan (species-wide) genome is larger than that observed in any single strain [30]. As for the situation observed with single nucleotide variation, several of these genetic elements link strains to specific industries (e.g. 
the RTM1 cluster in the ale strains and the five-gene cluster in the wine strains). It would therefore be expected that these ORFs provide a selective advantage within specific industries that has favored their retention. For some of these ORFs, such as the RTM1 cluster, the phenotypic benefits that they have historically provided in one industry may be advantageous in modern incarnations of others. For example, modern wine production generally makes use of inoculated commercial strains (rather than the historical use of wild yeast), which are produced on a large scale using molasses as a feedstock. Genes such as the RTM1 cluster may therefore provide advantages in the production of modern commercial wine yeast, but are lacking from the genomic complement of this group of strains due to the historical practices of winemaking. While other strain-specific ORFs were shown to have much narrower strain ranges (often single strains), it was possible to predict industrially-relevant roles for some of these genes. For example, the novel AAD proteins that were identified in the wine strain AWRI796 may have a direct impact on the range of volatile aromas produced during fermentation, as the aromatic alcohols produced through the action of the AAD enzymes can present very different aroma profiles to their corresponding aldehydes and ketones [31]. The presence of these AADs in specific industrial yeasts may therefore alter the profile of volatile aromas produced during winemaking or brewing, contributing to strain-specific aroma characteristics that are vitally important to many flavor and aroma-based industrial applications. The role of ORFs such as those present in the wine yeast five-gene cluster is less clear but, given the potential regulatory role for at least two of these proteins, they could produce significant phenotypic effects.
The generally similar characteristics of high sugar and ethanol tolerance of Zygosaccharomyces spp. and the wine and bioethanol strains of S. cerevisiae [29], [32], may provide a selective advantage for growth under these conditions. However, understanding the function of individual ORFs is overshadowed by questions regarding the origins of this novel cluster in addition to its effect on genome structure and dynamics. It was recently proposed that this cluster entered the S. cerevisiae genome from Zygosaccharomyces spp. [13]. Our data suggest that if this is the case, the transfer has either occurred on multiple occasions via a conserved circular intermediate that has integrated randomly into different genomic loci, or the fragment has entered the S. cerevisiae genome on a single occasion but has subsequently mobilized to new genomic locations via a circular intermediate (Figure S3). Alternatively, this cluster is a mobile feature of the S. cerevisiae genome that has been lost from many strains and was transferred to Zygosaccharomyces spp. Regardless of the direction or precise mode of transfer, it appears that this genetic cluster may mobilize throughout the genome via a method which has yet to be characterized in yeast and therefore provides an entirely new mechanism for the generation of variation in the S. cerevisiae genome. A thorough understanding of the scope of plasticity of the yeast genome is a vital prerequisite for the systematic understanding of yeast biology or for the development of the next generation of yeasts for industrial applications. As more S. cerevisiae strains are sequenced, the suitability of S288c as a “reference” strain for this species is becoming less clear, especially as it appears to lack a large number of ORFs found in many other S. cerevisiae strains while containing an abnormally high number of Ty transposable elements [8], [9].
Given the ubiquitous nature of the S288c genome for the design of ‘omics experiments, these novel elements have generally not been considered when studying strains other than S288c. Thus, little data exists regarding the functional contributions of these proteins. As such, they represent a significant knowledge gap with respect to cellular and metabolic modeling strategies. This is especially true for proteins such as the ORF located next to RTM1 which is large (∼800 amino acids) and highly conserved but has no significant homologs outside of a small subset of S. cerevisiae strains on which a function can be based. Fortunately, the continued development of next generation sequencing, such as that applied in this work, has provided the means to now characterize large numbers of yeast strains to provide this information and outline the true scope and variability of this species.

Materials and Methods

Yeast strains

Each commercial strain was obtained from the original mother cultures from the supplier. Genomic DNA was prepared by Zymolyase digestion and standard phenol-chloroform extraction.

Sequencing and assembly

Library construction and sequencing were performed at 454 Life Sciences, A Roche Company (Branford, CT) using a pre-release development version of the GS FLX Titanium series shotgun and 3 kb paired-end protocols. Sequences were assembled using MIRA (http://sourceforge.net/apps/mediawiki/mira-assembler/index.php?title=Main_Page) and manually edited using Seqman Pro (DNAstar). Regions of chromosomal CNV were determined by calculating the per-base sequencing coverage across each sequencing contig with median smoothing (1001 bp window, 100 bp step size). The ratio between the coverage at each genomic location and the overall median genomic coverage was then calculated to determine the level of over-representation for each location.
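The median-smoothed coverage ratio described above can be illustrated with a short Python sketch. It is not the authors' code, which is not given in the text. The function name, the list-based input, the centred window and the fallback to the contig's own median when no genome-wide median is supplied are all assumptions made for illustration.

```python
from statistics import median


def smoothed_coverage_ratios(coverage, genome_median=None, window=1001, step=100):
    """Return (centre_position, ratio) pairs for one contig.

    coverage      -- per-base read depth along the contig (list of numbers)
    genome_median -- overall median genomic coverage; if None, the median of
                     this contig is used instead (an assumption for the demo)
    window, step  -- smoothing window and step size in bp (1001 and 100 in the text)
    """
    if genome_median is None:
        genome_median = median(coverage)
    if genome_median <= 0:
        raise ValueError("median genomic coverage must be positive")

    half = window // 2
    ratios = []
    # Slide a centred window along the contig, one step at a time.
    for centre in range(half, len(coverage) - half, step):
        local_median = median(coverage[centre - half:centre + half + 1])
        ratios.append((centre, local_median / genome_median))
    return ratios


if __name__ == "__main__":
    # Toy contig: 10x coverage with a duplicated (20x) block in the middle.
    toy = [10] * 1000 + [20] * 1000 + [10] * 1000
    for position, ratio in smoothed_coverage_ratios(toy, window=101, step=500):
        print(position, ratio)
```

A ratio near 1.0 marks a region at the typical copy number, while values clearly above 1.0 mark over-represented regions.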
Large-scale chromosomal aneuploidies were detected by screening for regions in which the median ratio for a contiguous stretch of at least 101 individual segments differed from the overall genomic median by either 1.25 fold (5∶4 ratio representing at least 1 extra genomic copy in a tetraploid) or 1.4 fold (3∶2 ratio representing at least 1 extra genomic copy in a diploid).

SNP prediction

Chromosomal scaffolds from each yeast strain were aligned using FSA [33]. Diploid sequences were assigned into two haploid alleles by converting any degenerate bases into their non-degenerate pairs. Heterozygous regions were divided into both an insertion and deletion allele. A chromosomal consensus was computed for the alignment based upon the most frequent allele at each position in the alignment. Nucleotides that varied from the consensus in each strain were scored as sequence variants and were subsequently divided into SNPs (nucleotide substitution) or InDels (nucleotide insertion or deletion). To enable the comparison to strains with low coverage sequences [12], SNPs that were calculated for each strain relative to S288c (imputed SNPs) were used to create synthetic S288c-based genome sequences that contain the SNPs present in these strains. The genetic relationship between the strains was calculated by editing and concatenating the nucleotide alignments of all sixteen chromosomes using Seaview [34] followed by calculating the distance tree using the NJ algorithm of Clustalw (ignoring gapped regions in the alignment). Tandem repeats were predicted from the chromosomal alignment of all twelve yeast strains using Tandem Repeats Finder [35] using default parameters (match weight, 2; mismatch, 7; indel, 7; pM, 0.80; pI, 0.10; minimum alignment score, 50; maximum period size, 500). Individual repeats were then scored as variable if the specific tandem repeat region contained strain- or allele-specific InDels.
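The consensus-based variant scoring described under SNP prediction can be sketched in Python as follows. This is a simplified illustration, not the authors' pipeline. The function name, the dictionary input of pre-aligned haploid sequences with `-` as the gap character, and the tie-breaking rule (the first allele encountered wins) are assumptions made for this example.

```python
from collections import Counter


def score_variants(alignment):
    """Compute a majority consensus and per-strain variants.

    alignment -- dict mapping strain name to an aligned sequence; all
                 sequences must be the same length, with '-' marking gaps.
    Returns (consensus, variants), where variants[strain] holds lists of
    0-based alignment positions under the keys "SNP" and "InDel".
    """
    names = list(alignment)
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    length = lengths.pop()

    variants = {name: {"SNP": [], "InDel": []} for name in names}
    consensus = []
    for pos in range(length):
        column = [alignment[name][pos] for name in names]
        # Most frequent allele at this position; ties go to the allele seen
        # first (an assumption, since the text does not specify tie handling).
        top = Counter(column).most_common(1)[0][0]
        consensus.append(top)
        for name, allele in zip(names, column):
            if allele == top:
                continue
            # A gap on either side means an insertion or deletion;
            # otherwise the difference is a nucleotide substitution.
            kind = "InDel" if "-" in (allele, top) else "SNP"
            variants[name][kind].append(pos)
    return "".join(consensus), variants


if __name__ == "__main__":
    toy = {"strainA": "ACGT-A", "strainB": "ACGTTA", "strainC": "ATGTTA"}
    consensus, variants = score_variants(toy)
    print(consensus)
    print(variants)
```

Degenerate bases in diploid sequences would need to be split into two haploid alleles before this step, as the text describes; the sketch assumes that has already been done.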
ORF prediction and comparison

ORFs were predicted using Glimmer [20] with the predicted ORFs of S288c being used to build the prediction model (see Datasets S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11 for actual CDS sequences for each strain). Initial ORF designations were made by identifying the best sequence match for each ORF when compared to S288c using BLASTn [21]. Glimmer was also used to predict ORFs from the sequence of S288c (Accession numbers NC001133-NC001148) to correct for false-negatives in the predictions when compared to existing ORF designations in S288c. ORFs with no match to S288c were searched against the full list of non-redundant Genbank proteins to identify a closest existing homology match. ORFs from each strain were then arranged in syntenic order (see Table S2 for a full list of ordered ORFs). For protein sequence comparisons, predicted protein sequences were aligned using Clustalw [36] (http://align.genome.jp).

Supporting Information

Dataset S1 Glimmer-predicted ORFs from S288c. (8.86 MB TXT)
Dataset S2 Glimmer-predicted ORFs from YJM789. (8.84 MB TXT)
Dataset S3 Glimmer-predicted ORFs from JAY291. (8.49 MB TXT)
Dataset S4 Glimmer-predicted ORFs from RM11-1a. (8.57 MB TXT)
Dataset S5 Glimmer-predicted ORFs from EC1118. (8.54 MB TXT)
Dataset S6 Glimmer-predicted ORFs from QA23. (8.15 MB TXT)
Dataset S7 Glimmer-predicted ORFs from AWRI796. (7.94 MB TXT)
Dataset S8 Glimmer-predicted ORFs from Vin13. (8.02 MB TXT)
Dataset S9 Glimmer-predicted ORFs from VL3. (7.96 MB TXT)
Dataset S10 Glimmer-predicted ORFs from FostersO. (7.71 MB TXT)
Dataset S11 Glimmer-predicted ORFs from FostersB. (7.70 MB TXT)
Dataset S12 Novel-predicted ORFs in AWRI796 contig c100. (0.02 MB TXT)
Figure S1 Clustal alignment of the hypothetical, conserved gene adjacent to RTM1 in the ale yeasts AWRI1684 and AWRI1685 and the human pathogen YJM789. (0.03 MB DOC)
Figure S2 Clustal alignment of the five-gene cluster present in wine yeasts. (3.51 MB PDF)
Figure S3 A model for the horizontally-acquired five-gene cluster. (0.04 MB PDF)
Table S1 Tandem repeat variability. (0.10 MB XLS)
Table S2 Multi-strain ORF comparisons. (5.23 MB XLS)
Table S3 Instances of the five-gene cluster. (0.02 MB XLS)
              • Record: found
              • Abstract: found
              • Article: not found

              Reconstruction of the genome origins and evolution of the hybrid lager yeast Saccharomyces pastorianus.

              Inter-specific hybridization leading to abrupt speciation is a well-known, common mechanism in angiosperm evolution; only recently, however, have similar hybridization and speciation mechanisms been documented to occur frequently among the closely related group of sensu stricto Saccharomyces yeasts. The economically important lager beer yeast Saccharomyces pastorianus is such a hybrid, formed by the union of Saccharomyces cerevisiae and Saccharomyces bayanus-related yeasts; efforts to understand its complex genome, searching for both biological and brewing-related insights, have been underway since its hybrid nature was first discovered. It had been generally thought that a single hybridization event resulted in a unique S. pastorianus species, but it has been recently postulated that there have been two or more hybridization events. Here, we show that there may have been two independent origins of S. pastorianus strains, and that each independent group--defined by characteristic genome rearrangements, copy number variations, ploidy differences, and DNA sequence polymorphisms--is correlated with specific breweries and/or geographic locations. Finally, by reconstructing common ancestral genomes via array-CGH data analysis and by comparing representative DNA sequences of the S. pastorianus strains with those of many different S. cerevisiae isolates, we have determined that the most likely S. cerevisiae ancestral parent for each of the independent S. pastorianus groups was an ale yeast, with different, but closely related ale strains contributing to each group's parentage.

                Author and article information

                Journal
                DNA Res
                dnares
                DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes
                Oxford University Press
                1340-2838
                1756-1663
                February 2016
                04 January 2016
                Volume: 23
                Issue: 1
                Pages: 67-80
                Affiliations
                [1] Department of Biological Information, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
                [2] Suntory Global Innovation Center Limited, 8-1-1 Seikadai, Seika-cho, Soraku-gun, Kyoto 619-0284, Japan
                Author notes
                [*] To whom correspondence should be addressed. Tel. +81 3-5734-3430. Fax. +81 3-5734-3630. E-mail: takehiko@ 123456bio.titech.ac.jp

                Edited by Dr Katsumi Isono

                Article
                dsv037
                10.1093/dnares/dsv037
                4755528
                26732986
                d09e74f3-db16-4bde-92fc-c9d3edf4c749
                © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                Received: 2 August 2015
                Accepted: 19 November 2015
                Categories
                Full Papers

                Genetics
                lager beer yeast, interspecies hybrid, Saccharomyces pastorianus, loss of heterozygosity, allopolyploid
