      Is Open Access

      Fungal genome and mating system transitions facilitated by chromosomal translocations involving intercentromeric recombination

      research-article


          Abstract

          Species within the human pathogenic Cryptococcus species complex are major threats to public health, causing approximately 1 million annual infections globally. Cryptococcus amylolentus is the most closely related known species of the pathogenic Cryptococcus species complex, and it is non-pathogenic. Additionally, while pathogenic Cryptococcus species have bipolar mating systems with a single large mating type (MAT) locus that represents a derived state in Basidiomycetes, C. amylolentus has a tetrapolar mating system with 2 MAT loci (P/R and HD) located on different chromosomes. Thus, studying C. amylolentus will shed light on the transition from tetrapolar to bipolar mating systems in the pathogenic Cryptococcus species, as well as its possible link with the origin and evolution of pathogenesis. In this study, we sequenced, assembled, and annotated the genomes of 2 C. amylolentus isolates, CBS6039 and CBS6273, which are sexual and interfertile. Genome comparison between the 2 C. amylolentus isolates identified the boundaries and the complete gene contents of the P/R and HD MAT loci. Bioinformatic and chromatin immunoprecipitation sequencing (ChIP-seq) analyses revealed that, similar to those of the pathogenic Cryptococcus species, C. amylolentus has regional centromeres (CENs) that are enriched with species-specific transposable and repetitive DNA elements. Additionally, we found that while neither the P/R nor the HD locus is physically closely linked to its centromere in C. amylolentus, and the regions between the MAT loci and their respective centromeres show overall synteny between the 2 genomes, both MAT loci exhibit genetic linkage to their respective centromere during meiosis, suggesting the presence of recombinational suppressors and/or epistatic gene interactions in the MAT-CEN intervening regions. Furthermore, genomic comparisons between C. amylolentus and related pathogenic Cryptococcus species provide evidence that multiple chromosomal rearrangements mediated by intercentromeric recombination have occurred during descent of the 2 lineages from their common ancestor. Taken together, our findings support a model in which the evolution of the bipolar mating system was initiated by an ectopic recombination event mediated by similar repetitive centromeric DNA elements shared between chromosomes. This translocation brought the P/R and HD loci onto the same chromosome, and further chromosomal rearrangements then resulted in the 2 MAT loci becoming physically linked and eventually fusing to form the single contiguous MAT locus that is now extant in the pathogenic Cryptococcus species.

          Author summary

          This manuscript explores the evolution of the genomic regions encoding the mating type loci of basidiomycetous fungi. Typically, the mating system is tetrapolar, meaning that it is composed of 2 unlinked mating type (MAT) loci (P/R and HD) that are located on different chromosomes. However, species with bipolar mating systems, in which the P/R and HD loci are located on the same chromosome, have also been identified. Tetrapolar and bipolar species are often closely related, suggesting the transition between these 2 mating systems might occur frequently. For example, the species within the human fungal pathogenic Cryptococcus species complex have bipolar mating systems, with 1 large MAT locus that appears to be a fusion product of the P/R and HD loci. On the other hand, the species that is the closest outgroup to these pathogenic species, Cryptococcus amylolentus, appears to have a classic tetrapolar mating system. Interestingly, the 2 MAT loci of C. amylolentus exhibit centromeric linkage during meiosis, and as a consequence, their resulting meiotic segregation pattern differs from other regions of the genome. Additionally, both pathogenic and non-pathogenic species are found to have large regional centromeres enriched with transposable and repetitive elements. Our genome comparison analyses indicated that these regional centromeres underwent ectopic recombination during the evolution of these 2 lineages. Based on these observations, we propose a model for the transition from the tetrapolar mating system in non-pathogenic C. amylolentus to the bipolar mating system in its related pathogenic species that is initiated by intercentromeric ectopic recombination, followed by chromosomal rearrangements. These events moved the 2 MAT loci closer to each other and eventually fused them to form a single MAT locus. This model is also consistent with recent findings on the organization of MAT loci in other basidiomycetous species.

          Most cited references (36)

          Is Open Access

          SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information

          Background The recent introduction of the Pacific Biosciences RS single molecule sequencing technology has opened new doors to scaffolding genome assemblies in a cost-effective manner. The long read sequence information is promised to enhance the quality of incomplete and inaccurate draft assemblies constructed from Next Generation Sequencing (NGS) data. Results Here we propose a novel hybrid assembly methodology that aims to scaffold pre-assembled contigs in an iterative manner using PacBio RS long read information as a backbone. On a test set comprising six bacterial draft genomes, assembled using either a single Illumina MiSeq or Roche 454 library, we show that even a 50× coverage of uncorrected PacBio RS long reads is sufficient to drastically reduce the number of contigs. Comparisons to the AHA scaffolder indicate our strategy is better capable of producing (nearly) complete bacterial genomes. Conclusions The current work describes our SSPACE-LongRead software which is designed to upgrade incomplete draft genomes using single molecule sequences. We conclude that the recent advances of the PacBio sequencing technology and chemistry, in combination with the limited computational resources required to run our program, allow to scaffold genomes in a fast and reliable manner.
            Is Open Access

            Analysis of the Genome and Transcriptome of Cryptococcus neoformans var. grubii Reveals Complex RNA Expression and Microevolution Leading to Virulence Attenuation

            Introduction Fungal pathogens pose a major threat to human health because of their proclivity to infect immunocompromised individuals, particularly those afflicted by HIV/AIDS or who have received organ transplants and immunosuppressive therapy [1]. Among these pathogens, the basidiomycete yeast Cryptococcus neoformans is globally distributed and causes pneumonia and meningoencephalitis in an estimated 1 million people annually, leading to ∼620,000 deaths per year [2]. The burden of cryptococcal disease is remarkably high in developing nations (i.e., in India, Africa, and southeast Asia), where it accounts for approximately one-third of all deaths in HIV/AIDS patients, surpassing mortality rates attributable to tuberculosis in some areas [2]. C. neoformans comprises two varieties (var.), grubii (serotype A) and neoformans (serotype D); a former third variety (gattii, serotype B) is now recognized as the separate species Cryptococcus gattii [3]. The Cryptococcus research community initially mapped out genome sequencing projects for the commonly studied strains of C. neoformans representing the different varieties [4]. This strategy yielded a comparative analysis of the genomes of two var. neoformans isolates and employed a large set of expressed sequence tags to establish robust gene annotations [5]. Importantly, this study revealed that C. neoformans genes are intron-rich with the frequent occurrence of alternative splicing and antisense transcription. Subsequently, the community responded to a remarkable outbreak of C. gattii disease among immunocompetent people in western North America by sequencing two genomes, one representing the major outbreak genotype and the other representing the more common global type [6]. This analysis provided evidence for further speciation within the C. gattii complex of genotypes as well as a view of extensive genome variation within the complex and between C. gattii and C. neoformans genomes. 
Here, we report the latest community effort to enhance genomic resources for C. neoformans by analyzing the genomes and transcriptomes of lineage H99, derived from the primary strain (H99O) of C. neoformans var. grubii (Figure 1). Importantly, strain H99 has been used for virtually all genetic, molecular, and virulence studies conducted with C. neoformans var. grubii and for the majority of virulence studies in recent years with C. neoformans in general. This fact is relevant to human cryptococcosis because var. grubii strains are generally more virulent than var. neoformans strains and, globally, strains of var. grubii cause the vast majority of disease including >99% of infections in AIDS patients and >95% of those overall. A working draft of the H99 genome has been available for several years, and it has been used extensively by the community for the examination of fungal pathogenesis and aspects of unisexual and opposite-sex mating dynamics [7]–[9]. 10.1371/journal.pgen.1004261.g001 Figure 1 Origins of the independent lineages of H99. Since the initial publication, the isolate has lost virulence following laboratory passage (possibly multiple independent times) and was subsequently passaged through the rabbit model of infection to increase virulence and distributed to many labs. All variants were derived from the original sequenced H99 isolate (H99O), and the major strain variants of this study have been termed H99W, H99E, and H99S. The origins of this strain series are as follows. During laboratory passage by repeated growth on YPD rich medium, the H99W/H99ED isolates arose from the H99O original stock (frozen in 1994). H99W and H99ED are distinguished from the parental strain by reduced melanin production, impaired mating, and attenuated virulence. 
This isolate or a closely related derivate of H99O was sent to the Lodge laboratory (Washington University, St Louis, USA) (H99E), and was subsequently distributed to the Madhani laboratory (University of California, San Francisco, USA) (H99CMO18, hereafter named H99C). Thus, isolates H99W and H99ED (Duke University), H99E (Washington University), and H99C (UCSF) are all closely related to one another. Additionally, John Perfect (Duke University Medical Center, USA) derived the H99S isolate via passage of a mixed H99 frozen stock through the well-validated rabbit model of central nervous system (CNS) infection. The pedigree was constructed based on SNPs and indels identified from sequence analysis. Specific mutations separating independent strains are annotated. The current analysis employed extensive RNA-Seq experiments to significantly improve the annotation and to provide an exceptionally robust analysis of RNA expression in the context of intron splicing, strand-specific transcription, and non-coding RNAs. This analysis revealed a high complexity of the transcriptome structure. In addition, detailed studies were performed to characterize structural features of the genome, including centromeres and origins of replication. Finally, resequencing and genetic analyses were employed to explain a long-standing phenomenon in pathogen biology: the loss of virulence and other attributes such as fecundity upon laboratory passage. Taken together, these studies provide a detailed characterization of the genome of an essential reference strain to support further efforts in understanding cryptococcal pathogenesis. Results/Discussion Genome sequencing and chromosome assembly The genome of the C. neoformans var. grubii H99 strain was sequenced using Sanger technology and assembled into 14 finished scaffolds. Each sequence scaffold corresponds to a single chromosome, with a total length of 18.9 Mb, a size very similar to the ones previously published for C. neoformans var. 
neoformans and C. gattii [5], [6]. We conducted whole genome comparisons between H99 and three other C. neoformans and C. gattii genomes (JEC21 – serotype D, WM276 – VGI, and R265 – VGII; Figure 2). The comparison between H99 and JEC21 showed that the two genomes are in overall synteny with a few chromosomal rearrangements. Specifically, we identified three translocations that involve H99 chromosomes 3, 4, 5, and 11 (Figure 2A). Additionally, our analysis identified a 400-kb region on H99 chromosome 9 that is inverted between H99 and JEC21, demarcated with star 4. We also identified a second large inversion on H99 chromosome 1 with respect to all three genomes, suggesting via parsimony that there has been a single inversion in H99 relative to the shared common ancestor (star 1). It should be noted that the chromosomal rearrangements identified between H99 and JEC21 genomes herein are consistent with those that have been reported previously [10], [11]. 10.1371/journal.pgen.1004261.g002 Figure 2 Genome comparisons between H99 and other Cryptococcus neoformans (JEC21 - A) and Cryptococcus gattii (WM276 - B and R265- C) strains. Each dot represents the best tBLASTn return in the target genome when a protein sequence of H99 was used as query. The X axis shows the coordinates of the H99 chromosomes anchored on the centromeres at the middle. The Y axis shows the coordinates of the tBLASTn hits on their respective supercontigs/chromosomes in the target genomes. When the two chromosomes under comparison are in synteny, the BLAST hits of that H99 chromosome form a straight line composed by dots of same color (e.g. H99 chromosome 1 in Figure 2A). If there are chromosomal translocations, the BLAST hits of the H99 chromosome are composed of dots with different colors. Additionally, large-scale inversions (>60 kb in size) are highlighted by stars and boxes showing the potential translocations mediated by centromeres (see Results/Discussion). 
Numbers indicate the chromosomes/supercontigs in the target genome that have undergone translocations relative to the H99 genome. Comparisons between H99 and the two C. gattii genomes revealed more extensive chromosomal rearrangements. Six translocations involving nine H99 chromosomes were apparent when comparing H99 and WM276, while there are at least six translocations involving nine H99 chromosomes when H99 and R265 are compared with each other (Figures 2B and 2C). We identified one large chromosomal inversion on H99 chromosome 1 when it is compared to WM276, which is also apparent by comparison to R265 (star 2). This inversion is shared between H99 and JEC21 and distinguishes the C. neoformans (A and D) and C. gattii lineages. There is an additional inversion when H99 is compared to R265, which is located on H99 chromosomes 9 (star 3) (Figures 2A–2C). These chromosomal rearrangements identified between H99 and the C. gattii genomes are in overall agreement with those reported previously between serotype D C. neoformans and C. gattii, suggesting that these rearrangements may be ancestral to the C. gattii split from C. neoformans [6]. Gene prediction and conservation An initial set of 6,967 protein-coding genes was predicted by combining the results of different gene prediction programs (see Material and Methods). To validate and refine the predicted gene structures, deep-coverage RNA-sequence was generated from different conditions using independent methods. For strand-specific sequencing, poly(A) RNA was purified from cells grown under three different conditions sampled in duplicate: YPD, starvation medium (low glucose and nitrogen medium), and pigeon guano broth (PG) (see Material and Methods). 
For non-strand-specific sequencing, poly(A) RNA was purified from cells growing under six different conditions in duplicate: YPD exponential phase 30°C; YPD exponential phase 37°C, YPD stationary phase 30°C, YPD exponential phase with 0.01% SDS, YPD exponential phase with fluconazole (10 mg/mL) and YP galactose stationary phase. Trimmed reads were aligned to the H99 genome using Bowtie and TopHat [12], [13]. After elimination of the reads specific to the rRNA loci, a total of 795×106 reads and 244×106 strand-specific reads covered 92% of the genome with at least two reads. Read alignments were compared to the initial gene set of 6,967 predicted genes. Incorporation of the RNA-Seq data improved gene structure accuracy by validating and modifying predicted intron-exon boundaries. We found at least 30 reads spanning predicted exon/intron boundaries for 87% of the introns present in the annotation (n = 32,345), confirming the in silico predicted gene structures. In contrast, 7% of the annotated introns had no spanning reads despite being within an expressed gene, suggesting a potential incorrect annotation. More importantly, we identified 4,724 new introns, resulting in the alteration in the sequence of nearly one-third of the coding sequences (n = 2,705). We identified relatively few new coding genes (n = 53) and deleted about the same number (n = 58), mainly through gene fusion (Table S1). Overall, 6,962 protein-coding genes were predicted, which occupy 85% of the total genome. The remaining 15% are centromeres and intergenic regions. The poly(A) site positions (see below) and strand-specific RNA-Seq data were used to identify precisely the start and stop sites of the transcripts for 92% of these genes. In order to check the validity of these changes, the sequences of the old protein set and the sequences of the protein set based on the updated annotation were compared with the S. cerevisiae set of protein sequences. 
This comparison was carried out for the 1766 proteins where the sequence was changed, excluding proteins that were added or deleted from the gene set as well as those where the new annotation was for a completely different transcript. These putative proteins were compared to a modified set of the S. cerevisiae proteins, where highly similar duplicate genes were removed. These highly similar S. cerevisiae proteins were removed to reduce, but not completely eliminate, the possibility of aligning the two C. neoformans proteins to different S. cerevisiae orthologs. Proteins were aligned using BLAST. In cases where the new annotation version of the C. neoformans gene was aligned to an S. cerevisiae protein with more than 30% identity, the percent identity between the S. cerevisiae protein and the new and old C. neoformans annotations were compared. This percent identity cutoff was determined empirically to eliminate low similarity spurious alignments. A total of 848 C. neoformans protein pairs met these criteria (Table S2). Of these 848 protein pairs, 575 proteins from the new annotation showed a higher BLAST bit score in comparison with the putative S. cerevisiae homolog, 218 showed no change in BLAST bit score, and 55 showed a lower BLAST bit score. For the 55 cases with a lower BLAST bit score in the new annotation, the change in bit score was very small (less than 2) in 52 cases, the majority of which appeared to be spurious changes in calculations of the bit score resulting from differences in the length of the proteins. For the remaining 3 cases, the new version of the H99 protein set has less similarity to the S. cerevisiae protein set, although the changes in BLAST scores remain minor (Table S2). Comparison of the predicted proteins of H99 to those of the two other Cryptococcus lineages and other basidiomycetes identified unique properties of the Cryptococcus genomes. We compared proteins from H99 to those of the C. neoformans var. 
neoformans JEC21 genome and the C. gattii WM276 genome (Figure S1A). A core set of 5,569 orthologs is shared among all three species, with the number of paralogs totaling between 5,749 to 5,793 proteins in each genome. Single-copy orthologs share an average of 93% identity between the two C. neoformans genomes and 89% identity between either of these two genomes and that of C. gattii (Figure S1B). The H99 genome contains the largest set of unique proteins (n = 573); however, the differences in annotation methods between these three genomes and in particular the use of RNA-Seq for H99 may account for such differences in gene counts. Comparing the three Cryptococcus genomes to four diverse basidiomycetes identified protein families amplified in the Cryptococcus lineage. The comparison included two other agaricomycetes (Coprinopsis cinereus and Phanerochaete chrysosporium) and two ustilaginomycetes (Ustilago maydis and Malassezia globosa). Of these species, only M. globosa is human-associated; Malassezia species are commonly found on skin where they are the most common cause of dandruff. Compared to these four basidiomycetes, the three Cryptococcus genomes are most highly enriched for transporter families, both Major Facilitator Superfamily (MFS) and sugar transporters (Table S3). In addition, the two C. neoformans species (H99 and JEC21) contain larger numbers of transporters than C. gattii; for example, the most common MFS family is found in 174, 173, and 149 copies in H99, JEC21, and WM276, respectively. MFS transporters are the largest class of transporters found in fungal genomes; MFS subfamilies transport small molecules, including drugs, metabolites, sugars, and other small molecules [14]. Other notable expansions in the Cryptococcus species include fungal-specific transcription factor domains, glucose-fructose oxidoreductases, and phytanoyl-CoA dioxygenases (Table S3). 
Overall these expansions suggest an increased capacity for transport, a rewiring of transcriptional circuits, and metabolic differences compared to other basidiomycetes. We identified differentially expressed genes using the strand-specific RNA-Seq data to highlight the major expression shifts between these culture conditions. Reads from two biological replicates from each of the three conditions (YPD, starvation medium, and pigeon guano broth) were mapped to transcripts to quantify their abundance (see Material and Methods). Normalized expression levels (FPKM) for the most highly differentially expressed genes (corrected p-value 2) were clustered to identify groups of co-regulated genes. Among these three conditions, rich and limited media produced the most similar expression profiles, while many genes were differentially regulated between both these conditions and pigeon guano (Figure 3). Two clusters of genes (5 and 6) were more highly expressed in pigeon guano as compared to rich and limited media. These clusters of genes were found to be enriched for transporters, transcription factors, and genes involved in lipid metabolism (Table 1). We did not detect significant functional enrichment in the other four clusters. The high expression of transporters and transcription factors under certain growth conditions suggests that these proteins may provide a more diverse repertoire, enabling growth in different ecological niches, including pigeon guano. 10.1371/journal.pgen.1004261.g003 Figure 3 Differentially expressed gene clusters. Genes differentially expressed between the three conditions (PG, pigeon guano; SM, starvation medium; YPD, rich media) were identified from strand-specific RNA-Seq using EdgeR with two biological replicates per condition (rep1, rep2). Expression profiles are ordered based on hierarchical clustering tree; 6 clusters were defined using the kmeans algorithm (Material and Methods). 
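The FPKM normalization mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline (the text states that differential expression was called with EdgeR); the function names `fpkm` and `log2_fold_change`, the pseudocount, and the example numbers are assumptions made only for illustration.

```python
import math


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def log2_fold_change(value_a, value_b, pseudocount=1.0):
    """log2 ratio of two expression values; the pseudocount (an assumption
    of this sketch) avoids division by zero for unexpressed genes."""
    return math.log2((value_a + pseudocount) / (value_b + pseudocount))


# Hypothetical transcript: 500 fragments, 2,000 bp long, 10 million mapped fragments.
example = fpkm(500, 2_000, 10_000_000)
print(example)  # 25.0
```

Normalizing by both transcript length and library size is what makes values comparable across genes and across conditions such as YPD, starvation medium, and pigeon guano broth before clustering.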
10.1371/journal.pgen.1004261.t001
Table 1 Functional enrichment of PFAM and TIGRfam domains in differentially expressed gene clusters.

Cluster 6; 699 transcripts
Pfam or TIGRfam domain | Cluster 6 | Other genes | p-value | Corr p-value | Relative proportion | Role
PF07690.11 Major Facilitator Superfamily | 69 | 107 | 0 | 0 | 5.39 | Transport
TIGR00879 MFS transporter, sugar porter (SP) family | 24 | 24 | 8.19E-12 | 5.40E-09 | 8.37 | Transport
PF00083.19 Sugar (and other) transporter | 33 | 58 | 5.89E-11 | 2.59E-08 | 4.76 | Transport
PF04082.13 Fungal specific transcription factor domain | 26 | 47 | 1.05E-08 | 3.47E-06 | 4.63 | Transcription
PF01408.17 Oxidoreductase family, NAD-binding Rossmann fold | 12 | 6 | 1.99E-08 | 5.25E-06 | 16.73 | Redox reactions
PF00172.13 Fungal Zn(2)-Cys(6) binuclear cluster domain | 28 | 62 | 8.37E-08 | 1.84E-05 | 3.78 | Transcription
PF13561.1 Enoyl-(Acyl carrier protein) reductase | 14 | 18 | 1.62E-06 | 3.05E-04 | 6.51 | Lipid metabolism
PF00106.20 short chain dehydrogenase | 16 | 28 | 5.51E-06 | 9.09E-04 | 4.78 | Lipid metabolism
PF00441.19 Acyl-CoA dehydrogenase, C-terminal domain | 6 | 1 | 9.20E-06 | 1.21E-03 | 50.2 | Lipid metabolism
PF02770.14 Acyl-CoA dehydrogenase, middle domain | 6 | 1 | 9.20E-06 | 1.21E-03 | 50.2 | Lipid metabolism
PF08028.6 Acyl-CoA dehydrogenase, C-terminal domain | 5 | 0 | 1.36E-05 | 1.64E-03 | 4183060.11 | Lipid metabolism
PF00501.23 AMP-binding enzyme | 6 | 3 | 9.14E-05 | 7.54E-03 | 16.73 |
PF01266.19 FAD dependent oxidoreductase | 10 | 14 | 8.67E-05 | 7.54E-03 | 5.98 | Redox reactions
PF02771.11 Acyl-CoA dehydrogenase, N-terminal domain | 5 | 1 | 7.47E-05 | 7.54E-03 | 41.83 | Lipid metabolism
PF02894.12 Oxidoreductase family, C-terminal alpha/beta domain | 6 | 3 | 9.14E-05 | 7.54E-03 | 16.73 | Redox reactions
PF08659.5 KR domain | 12 | 21 | 8.37E-05 | 7.54E-03 | 4.78 |
PF00701.17 Dihydrodipicolinate synthetase family | 4 | 0 | 1.29E-04 | 9.44E-03 | 3346448.09 |
PF07350.7 Protein of unknown function (DUF1479) | 4 | 0 | 1.29E-04 | 9.44E-03 | 3346448.09 | Unknown

Cluster 5; 13 transcripts
Pfam or TIGRfam domain | Cluster 5 | Other genes | Fisher p | Corr p | Relative proportion | Role
PF00083.19 Sugar (and other) transporter | 5 | 85 | 1.02E-06 | 7.07E-04 | 25.3 | Transport
PF07690.11 Major Facilitator Superfamily | 6 | 168 | 1.04E-06 | 7.07E-04 | 15.36 | Transport
PF12006.3 Protein of unknown function (DUF3500) | 2 | 0 | 4.93E-06 | 2.24E-03 | 86033333.33 | Unknown

RNA-Seq analysis identified a large number of miscRNAs
In addition to coding genes, intron identification and manual annotation of strand-specific and non-specific RNA-Seq data allowed the identification of 1,197 transcribed regions that were named miscellaneous RNA (miscRNA) (Figure 4). These miscRNAs can be very short (minimum size = 106 nt) or span several kbs (maximum size = 5,555 nt). Several lines of evidence argue that these are present in the cell and are not artifacts resulting from the sequencing or/and alignment process. First, most of the miscRNAs contain spliced introns (n = 765) or/and a poly(A) site (n = 486), suggesting that they are processed in the same way as coding gene mRNAs. In addition, although their coding capacity is unknown, some of these miscRNAs may in fact code for small proteins, as small hypothetical ORFs can be identified in some. Indeed, virulence-associated small proteins have been previously identified in several different plant-pathogenic fungi [15]. Moreover, ribosome profiling has recently revealed the widespread occurrence of functional peptides encoded by small ORFs (smORF) in metazoans [16], [17]. In C. neoformans var. grubii, the putative proteins encoded by these small ORFs share no sequence homology with any known proteins in other organisms, and the existence of the encoded small proteins in this yeast will require experimental validation. A subset of these miscRNAs could be noncoding RNAs with structural or regulatory roles. The hypothesis of a regulatory role of some miscRNAs is supported by the fact that they are mostly antisense of a coding gene or of another miscRNA (Figure 4), suggesting a potential role in gene expression regulation (see below). 
One Cryptococcus non-coding RNA has been reported as unpublished data in a recent review as critical for the morphologic switch between the yeast and hyphal form [18]. More experiments are clearly needed to characterize the roles of these miscRNAs in C. neoformans. Finally, we have considered only the polyadenylated RNA, whereas some studies in S. cerevisiae and in mammals suggest the existence of a non-polyadenylated ncRNA population, which would further increase the complexity of the transcriptome structure [19], [20]. 10.1371/journal.pgen.1004261.g004 Figure 4 miscRNAs in C. neoformans var. grubii. A. Two examples of a miscRNA as visualized through Artemis. The coverage of the plus stand is represented by the black curve. The coverage of the minus strand is represented by the blue curve. These results were obtained when cells grown in low glucose and nitrogen medium (starvation medium) underwent strand-specific sequencing. F1, F2, and F3 stand for 5′ to 3′ frames 1, 2, and 3, respectively. F4, F5, and F6 stand for 3′ to 5′ frames 1, 2 and 3, respectively. The small black vertical bars indicate the position of the stop codons for each frame. B. Schematic representation of the positions of the miscRNAs in the C. neoformans var. grubii genome as compared to coding sequences. The numbers of miscRNAs at each position is indicated. The number of miscRNAs in the antisense strand of other miscRNAs is indicated between brackets. Introns in C. neoformans C. neoformans and other basidiomycetes are the most intron-rich fungal species [21], and these introns have been recently shown to be important modulators of gene expression in this yeast [22]. We identified 40,946 introns in the genome, and 99.5% of the expressed genes were found to contain at least one intron. Most of these introns are located within the coding sequences (n = 36,855), but 1,632 and 1,025 introns are located in the 5′-UTRs and 3′-UTRs, respectively. 
As noted above, the miscRNAs also contain introns, and we found 1,434 introns in miscRNA sequences. The measured intron-density is high (3.35 introns/kb of coding sequence) and similar to what has been reported for some other basidiomycetous fungi based on automatic annotation [23]. Accordingly, exons are small in C. neoformans var. grubii (median size = 194 nt). Remarkably, some exons are as small as 1 bp. making them difficult to identify through an automatic process (see Material and Methods). A typical C. neoformans gene contains 5.7 introns per gene on average, although extreme cases with many more or no introns have been observed. The most intron-rich gene, which encodes Tco4p, one of seven hybrid histidine kinases, contains 42 introns (CNAG_03355) [24]. On the other hand, we identified only 35 genes (Table S4) that are expressed in at least one condition without any intron in their sequences; 10 of these encoded proteins are unique to C. neoformans species. Interestingly, one of the 35 encoded proteins (CNAG_02933) shares homology with bacterial quinone oxidoreductases, suggesting a possible horizontal gene transfer from a bacterium into the ancestor of the C. neoformans/C. gattii species complex. Most of the C. neoformans var. grubii introns are small (median size = 56 nt) whereas some larger ones are present (maximum size = 2,124 nt). Overall, there is very little difference in the characteristics of the introns according to their location within transcripts. Nevertheless, we found that introns within the coding sequences are slightly shorter (median size = 55 nt) than introns within the 5′-UTR (median size = 65 nt) and 3′-UTR (median size = 59 nt). Analysis of the motifs associated with the introns confirmed the splice site consensus sequences previously identified using a smaller set of data [25] (Figure 5), and we found no variation of these motifs based on the intron location. 10.1371/journal.pgen.1004261.g005 Figure 5 Introns in C. neoformans var. 
grubii. A. Distribution of the introns according to their sizes. B. Distribution of the number of introns per gene. C. Motifs associated with introns in C. neoformans var. grubii. Numbers represent the average distance in bp between the motifs. The overall height of the stack indicates the sequence conservation at that position, while the height of symbols within the stack indicates the relative frequency of the nucleic acid at that position.

We identified alternatively spliced transcripts based on the RNA-Seq data for 741 genes, a level similar to that previously reported for C. neoformans var. neoformans [5]. In the 10.6% of genes with more than one mRNA transcript, these isoforms are the consequence of exon skipping, alternative 3′ or 5′ splice site selection, or intron retention (Figure 6). Analysis of PFAM domains revealed that the 741 genes with alternative transcripts are significantly enriched for transporters (MFS and sugar transporter domains) by 2 to 3 fold (Fisher's exact test, q-value 10−20). The combined gene set, including protein coding genes and miscRNA genes, was submitted to GenBank under accession numbers CP003820–CP003834.

Identification of polyadenylation sites

The poly(A) sites were identified as previously described [26]. Briefly, reads containing 5 or more consecutive “A” nucleotides at their end (or “T” at their beginning, which were reverse complemented for subsequent analyses) were selected from each of the libraries, and redundant reads were removed. These non-redundant reads were pooled. The A stretches at the end were trimmed, and reads exceeding 18 nt after trimming were mapped to the reference genome using TopHat2. To distinguish poly(A) tracks of true polyadenylation from poly(A) tracks of internal poly(A) stretches on the mRNAs themselves (i.e., false positives), we analyzed the base composition surrounding the end of the mapped reads and discarded those that might not represent true polyadenylation.
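The read selection and trimming step described above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the pipeline of [26]: the function names (`trim_poly_a`, `select_poly_a_reads`) are hypothetical, reads are assumed to be plain sequence strings, and the only values taken from the text are the minimum run of 5 A nucleotides and the requirement that a trimmed read exceed 18 nt. Mapping with TopHat2 and the later false-positive filters are not shown.

```python
import re

MIN_A_RUN = 5    # "5 or more consecutive A nucleotides" (from the text)
MIN_LENGTH = 18  # trimmed reads must exceed 18 nt (from the text)

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_poly_a(read):
    """Trim the terminal A stretch; return None if the read does not qualify."""
    read = read.upper()
    ends_with_a = re.search(r"A{%d,}$" % MIN_A_RUN, read)
    starts_with_t = re.match(r"T{%d,}" % MIN_A_RUN, read)
    # Reads that begin with a T stretch are reverse complemented first.
    if starts_with_t and not ends_with_a:
        read = reverse_complement(read)
        ends_with_a = re.search(r"A{%d,}$" % MIN_A_RUN, read)
    if ends_with_a is None:
        return None
    trimmed = read[:ends_with_a.start()]
    return trimmed if len(trimmed) > MIN_LENGTH else None


def select_poly_a_reads(reads):
    """Drop redundant reads, then keep the trimmed body of each qualifying read."""
    kept = []
    for read in dict.fromkeys(reads):  # removes duplicates, preserves order
        trimmed = trim_poly_a(read)
        if trimmed is not None:
            kept.append(trimmed)
    return kept
```

In this sketch a read is kept only if more than 18 nt remain once the A stretch is removed, so a read whose body is exactly 18 nt is discarded.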
Reads with the following properties were regarded as false positives and removed: 1) reads with ≥5 A nt immediately downstream of the terminus; 2) depending on the actual length of the poly(A) stretch of the read (e.g. N nt), reads for which 70% of N nt downstream of the end site are As; and 3) reads with ≥8 A nt within 10 nt immediately upstream of the end site. The polyadenylation site was then defined as the base immediately downstream of the read. To ensure that the identified polyadenylation sites were not false positives derived from low quality base calls, reads with quality scores 2) were hierarchically clustered using Euclidean distance and complete clustering methods; six clusters of genes with similar expression conditions across these conditions were identified using k-means clustering.

Comparative genomics

Protein conservation was examined using ORTHOMCL (version 1.4 with a Markov inflation index of 1.5 and a maximum e-value of 1×10⁻⁵). PFAM and TIGRFAM domains within each gene were identified with Hmmer3 [110] using the PFAM27 and TIGRFAM13 release versions. Domain counts between genomes were compared using Fisher's exact test, with q-value correction for multiple testing [111].

Pulsed-field gel electrophoresis

Preparation of agarose-embedded intact Cryptococcus chromosomal DNA was performed as previously described [112]. Chromosomes were separated in 1% pulsed-field certified agarose gels using a CHEF-DRIII pulsed-field gel electrophoresis system (Bio-Rad, Richmond CA) in 0.5× TBE running buffer. Running conditions were as follows: ramped switch time from 1.5 sec to 10 sec, 120°, 6 V/cm, 24 h, performed at 14°C using a Bio-Rad cooling module. Chromosomes were stained and visualized with ethidium bromide. Southern blotting of pulsed-field gels was performed as previously described [113] onto Hybond-XL nylon membranes (GE Healthcare, Chalfont St Giles, UK). Blots were UV crosslinked with 100 mJ UV using a Stratagene UV Stratalinker 2400.
Radiolabelled probes were prepared using the GE Healthcare Rediprime II Random Prime Labeling System (GE Healthcare) with 20 µCi α-32P dCTP (Perkin Elmer, Waltham MA). Hybridizations were performed overnight at 65°C. Probes were detected by exposing the blots to Fujifilm Super RX medical X-ray film (Fujifilm, Tokyo JA). Identification of largest ORF-free regions and mapping of transposons in C. neoformans We scanned the genome of C. neoformans by using the genome map feature already available in the C. neoformans genome database (http://www.broadinstitute.org/annotation/genome/cryptococcus_neoformans/GenomeMap.html) searching for ORF-free regions on each chromosome. This was followed by the determination of the largest ORF-free regions on each chromosome. The DNA sequences of each of the transposons (e.g. Tcn1–Tcn6) have been previously reported [114]. The nucleotide sequences of these Tcn elements were used as query sequences in a BLASTn analysis to identify the transposable elements present in the genome. The BLAST hits against each of the transposons in all chromosomes were obtained and mapped on each of these ORF-free regions. Molecular techniques C. neoformans genomic DNA was prepared using the CTAB method [115]. Constructs for targeted replacement of DNA regions in C. neoformans were made using overlap PCR with primers listed in Table S12. The lmp1Δ mutant strain was isolated in the H99S background by replacing the LMP1 coding sequence with the neomycin resistance marker from plasmid pJAF1 [116]. For complementation, LMP1 plus 1 kb flanking region was amplified from the H99S strain, cloned into pCR2.1-TOPO (Life Technologies), and subsequently subcloned into the plasmid pCH233, which contains the nourseothricin resistance marker. Biolistic transformation was performed as previously described [117]. 
The constructs to truncate the left and right ends of chromosome 14 comprised 4- to 5-kb fragments fused to the nourseothricin (NAT) resistance marker and a seed sequence for the telomere. For the truncation of the left end of chromosome 14, a 5-kb region for homologous recombination (HR) was amplified with primers GI008–GI009. A construct of the correct orientation was generated by fusing this fragment with the NAT marker amplified with primers GI003–GI013, while for the construct of the opposite orientation NAT was amplified with primers GI003–GI0014. For the truncation of the right end of chromosome 14, a 4-kb region for HR was amplified with primers GI010–GI005. A construct of the correct orientation was generated by fusing this fragment with the NAT marker amplified with primers GI003–GI015, while for the construct of the opposite orientation NAT was amplified with primers GI003–GI0016. The constructs were used for biolistic transformation of the diploid strain AI187 of C. neoformans var. grubii. Transformants were selected on YPD+100 µg/mL of nourseothricin, and homologous integration strains were identified by PCR.

Chromatin immunoprecipitation

ChIP assays were conducted as previously described with some modifications [118], [119]. Briefly, C. neoformans was grown in 100 mL YPD until the exponential phase and was crosslinked with 1% formaldehyde at room temperature for 35 min and quenched by adding glycine to a final concentration of 125 mM. The cells were harvested and resuspended in 10 mL of distilled water containing 0.5 mL β-mercaptoethanol and incubated for 1 hour in a shaker incubator at 150 rpm at 30°C. Cells were washed and resuspended in spheroplasting buffer (1 M sorbitol/0.1 M sodium citrate, pH 5.8, and 0.01 M EDTA, pH 8.0) with 40 mg of lysing enzyme from Trichoderma harzianum (Sigma) and incubated for 4–5 hours at 37°C.
After achieving 90% spheroplasts, the cells were washed as previously described [118], and chromatin was finally resuspended in 1 mL extraction buffer (50 mM HEPES, pH 7.5/140 mM NaCl/1 mM EDTA/0.1% Na-deoxycholate/1% Triton-X) containing protease inhibitor cocktail (Sigma). The lysates were sonicated to obtain chromatin fragments of an average size of 300–500 bp (14× bursts at 30% amplitude with 10 sec pulse using a SONICS Vibra cell). After centrifuging (13,000 rpm, 10 min, 4°C), chromatin was divided to obtain total and IP DNA (with or without antibodies) preparations. Total DNA (T): Approximately 100 µL of lysate were added to 0.4 mL of elution buffer (1% SDS/0.1M NaHCO3) with 20 µl of 5M NaCl. The reaction was incubated at 65°C overnight to reverse the crosslinking. DNA was extracted as described previously [119] and resuspended in 25 µL of MilliQ water containing RNase (10 µg/mL). Immunoprecipitated material (IP): The remaining lysate (900 µL) was distributed into two 1.5-mL Eppendorf tubes (0.45 mL in each). In one of the tubes, 20 µL of RFP-TRAP beads (ChromoTek) were added and used as IP DNA with antibodies. In another tube, 20 µL of control beads were added to serve as a negative control. Both tubes were incubated overnight at 4°C on a roller. The IP materials were processed as described previously with some modifications [119]. The washing step with high salt buffer was done twice, while the LiCl buffer washing was done only once, and beads were pelleted at 5,400 rpm for two minutes. The isolated DNA was then dried and the pellet was resuspended in 20 µL MilliQ water containing RNase (10 µg/ml). The ChIP sequencing analysis was done as previously described [55]. Briefly, ChIP-Seq analysis was performed at Genotypic Technology. In total, 6 million single-end 36-nt reads for IP and 24 million reads for input DNA were generated on the Illumina GAIIx platform. Raw reads were processed using SeqQC (version 2.2). Reads were aligned to the target C. 
neoformans (C. neoformans GCA_000149245.2 with new chromosome 14 assembly) using Bowtie version 0.12.8 and the parameters “-v 3 --best -m 1”. About 90% of the aligned reads were obtained per sample. Peak calling was performed using Homer v3.13 in “histone” mode using default parameters and fold changes of 1.5 and 3. Chromosome-wise read distribution and read depth graphs were generated using R scripts (proprietary to Genotypic Technology, Bangalore, India).

Analysis of replication intermediates

Cells were grown to mid-log phase in YEPD (2–3×10⁷ cells/mL), mixed with 0.5 volumes of ice-cold Azide stop buffer (0.5 M NaOH, 0.4 M Na2EDTA, 2% w/v NaN3), collected by filtration through a Nylon filter, and resuspended in cold sterile distilled H2O. DNA was prepared from nuclei as described [120]. After digestion with restriction enzymes as indicated, DNA was electrophoresed on neutral-neutral 2D gels, blotted, and hybridized as described [121].

Statistical analysis for progeny of crosses

Multiple linear regression was used to fit each of the continuous response variables (level of melanization on niger seed and l-DOPA agar) on the basis of all the binary SNP and indel marker values and chromosome 9 genotype. The isolates were treated as a random sample from the Cryptococcus population. Multinomial logistic regression was used to predict mating phenotype, categorized as either no mating, resembling H99C or resembling KN99a. Further analysis was conducted by collapsing the mating phenotype categories into the following: no mating or like H99C (category 0) or like KN99a (category 1). This was considered reasonable because the H99C strain mates much less frequently than the KN99a strain. A Bonferroni correction was applied to keep the family-wise error rate at 0.05. Stata (StataCorp LP, College Station, TX) was used for the statistical analysis.
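For readers unfamiliar with the Bonferroni correction mentioned above, the following Python sketch shows the arithmetic: with m tests and a family-wise error rate of 0.05, each individual test is compared against 0.05/m. This is a generic illustration and not the authors' Stata analysis; the function name `bonferroni_significant` and the p-values in the usage note are hypothetical.

```python
def bonferroni_significant(p_values, family_wise_alpha=0.05):
    """Flag each test as significant under a Bonferroni correction.

    Each p-value is compared with family_wise_alpha divided by the
    number of tests, which bounds the family-wise error rate.
    """
    m = len(p_values)
    if m == 0:
        return []
    per_test_threshold = family_wise_alpha / m
    return [p <= per_test_threshold for p in p_values]
```

With three hypothetical p-values of 0.001, 0.02, and 0.04, the per-test threshold is 0.05/3 (about 0.0167), so only the first test would remain significant.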
Stress sensitivity tests

Each H99 passaged strain was incubated overnight (about 16 h) at 30°C in liquid YPD, washed, serially diluted (1 to 10⁴ dilutions) with dH2O, and spotted (3 µL) onto solid YPD containing the indicated concentration of stress inducers, such as SDS, CdSO4, or fludioxonil. To test oxidative stress, cells were spotted onto solid YPD containing the indicated concentration of tert-butyl hydroperoxide (tBOOH), menadione, and diamide. To examine antifungal drug resistance, amphotericin B (AMB), flucytosine (5-FC), and azole drugs, including itraconazole (ICZ), ketoconazole (KCZ), and fluconazole (FCZ), were used. To evaluate ER stress, cells were spotted onto solid YPD containing the indicated concentration of ER stress inducers, such as tunicamycin (TM) or dithiothreitol (DTT). Cells were incubated at 30°C and photographed during the incubation period.

Urease test

Each strain was cultured overnight (about 16 h) at 30°C in liquid YPD and resuspended in dH2O. Equal numbers of Cryptococcus cells (10⁸ cells/mL) were spotted (5 µL) onto Christensen's urea agar [122] and incubated at 30°C for two to five days. Each plate was photographed during the incubation period.

Western blot analysis of Hog1 phosphorylation

Each H99 strain was grown to mid-logarithmic phase in YPD at 30°C. Cultures were resuspended in lysis buffer (50 mM Tris-HCl pH 7.5, 1% sodium deoxycholate, 5 mM sodium pyrophosphate, 10 nM sodium orthovanadate, 50 mM NaF, 0.1% [w/v] SDS, and 1% [v/v] Triton X-100) containing 1× protease inhibitor cocktail (Calbiochem) with 0.5 mm zirconia/silica beads (BioSpec Products, Inc.) and disrupted. Protein concentrations were determined using Pierce BCA Protein Assay Kit (Thermo Scientific), and equal amounts of protein were loaded into a 10% Tris-glycine gel (Novex) and transferred to Immuno-blot PVDF membrane (Bio-Rad). A rabbit p38-MAPK-specific antibody (Cell Signaling Technology) was used to detect phosphorylated Hog1.
A rabbit polyclonal anti-Hog1 antibody (Santa Cruz Biotechnology) was used as a loading control.

Virulence assays

Rabbit virulence assays

Briefly, cryptococcal strains were prepared by growth at 30°C for 2 days in YPD broth. The cells were centrifuged and washed with endotoxin-free phosphate buffered saline (PBS). 10⁸ yeast cells in a volume of 0.3 mL were inoculated intracisternally into 2–3 kg immunosuppressed New Zealand White rabbits (3 rabbits per strain) that had been first sedated with ketamine/xylazine [123]. Rabbits were sedated on days 2, 4, 7 and 10 after inoculation and cerebrospinal fluid was withdrawn, diluted in PBS and plated on YPD agar to assess for quantitative yeast counts. To induce and maintain immunosuppression, rabbits were given an intramuscular injection of a hydrocortisone acetate suspension (5 mg/kg/d) one day prior to inoculation of the yeast cells and daily during infection.

Murine virulence assays

Strains of C. neoformans were grown overnight in YPD broth. The cells were centrifuged and washed with PBS. Virulence studies were performed using a murine nasal inhalation model of infection. Eight-week-old CBA/J female mice were inoculated by dripping 0.05 mL of PBS containing the C. neoformans cells into the nares of anesthetized mice suspended by their incisors [124]. Mice were monitored daily and those showing signs of morbidity (weight loss of greater than 25% or extension of the cerebral portion of the cranium) were sacrificed by CO2 asphyxiation.

G. mellonella virulence assays

For virulence in the wax moth assay, each G. mellonella larva was injected in the terminal pseudopod with C. neoformans cells (1×10⁵ in 5 µL PBS). Larvae were incubated at 30°C, and virulence was measured by scoring the survival of the larvae every 24 h as previously described [125].

Supporting Information

Figure S1 Cryptococcus protein conservation. A. Conserved protein counts for C. neoformans var. grubii (H99), C. neoformans var.
neoformans (JEC21), and C. gattii (WM276). Counts of proteins in conserved gene clusters, as defined by OrthoMCL [127], are listed in overlapping regions of the Venn diagram. Counts for proteins (including orthologs and paralogs) in individual species (H99, JEC21, and WM276 are shown in red, blue, and green, respectively) and the total number of conserved clusters (bold black type) are shown. B. Protein identity of single copy orthologs. OrthoMCL protein clusters with one ortholog per species were aligned with MUSCLE [128] and pairwise identity was computed for each species pair. (PDF)

Figure S2 A. Relationship between the distance between sites within a cluster and the number of poly(A) clusters. B. Distance between the poly(A) clusters within a single mRNA. (PPT)

Figure S3 Additional examples of differential expression of miscRNAs antisense of a coding gene as observed by Northern blot. RNA was extracted from cells growing in YPD (2×10⁸ cells/mL) at 30°C (condition 1), YPD (5×10⁷ cells/mL) at 30°C (condition 2), YPD with 0.01% SDS (5×10⁷ cells/mL) at 30°C (condition 3), YPD with 10 mg/mL fluconazole (5×10⁷ cells/mL) at 30°C (condition 4), YPD (5×10⁷ cells/mL) at 37°C (condition 5), and YP galactose (2×10⁸ cells/mL) at 30°C (condition 6) in duplicate. Then, 5 µg were separated by electrophoresis on a denaturing agarose gel and transferred to a nylon membrane. RNAs were then hybridized with strand-specific probes. Black lanes represent the positions of probes. Schematics of the genome loci organizations are given. (PPT)

Figure S4 RNA-Seq analysis of centromeric regions. Low transcript levels are observed between the last genes bordering the centromeric regions in each chromosome. The coordinates indicate the position of the part of the chromosome visualized through Artemis. (PPT)
Figure S5 Plasmid replication intermediates analysis of two C. neoformans plasmids (pPM8 and pCSN5) shows that linear plasmids cannot be used to identify bona fide replication origins in Cryptococcus. (Left upper panel) The 2D gel patterns of overlapping fragments of pPM8, which show strong arcs of Y-shaped intermediates and weaker complete replication bubble arcs, indicate that replication initiates throughout the linear plasmid, although the bubble signal is more intense in the right part of the molecule containing URA5. (Right upper panel) The 2D gel patterns of pCSN5 replication intermediates show a strong arc of Y-shaped molecules and a weaker pattern of replication termination intermediates, which are replicated by converging forks, indicating that replication initiates at or near the telomeres of the plasmid. (Lower panels) 2D gel patterns of the 3,858-bp StuI and the 3,127-bp MscI fragments from the chromosomal region containing URA5, diagrammed below. Restriction fragments of this region contain only Y-shaped replication intermediates, indicating that replication does not initiate at detectable levels within the URA5 locus on the chromosome. The arcs containing bubble-shaped (B), Y-shaped (Y), and termination (T) replication intermediates are labeled on the 2D gel pattern. The red arrows at the ends of the plasmid molecules represent telomeres. (PPT) Click here for additional data file. Figure S6 Phenotypic variations in response to environmental cues and antifungal drug resistance among different H99 passage strains. (A–F) Each C. 
neoformans strain (H99O, H99F, H99S, H99W, H99E, KN99α, KN99a, and H99C) was incubated overnight (about 16 h) at 30°C in liquid YPD medium, washed, serially diluted (1 to 10⁴ dilutions) with dH2O, and spotted (3 µL) onto solid YPD containing the indicated concentration of stress inducers or antifungal drugs (0.5 mM tBOOH; 0.02 mM menadione; 2.5 mM diamide; 0.2 µM CdSO4; 0.03% SDS; 0.3 µg/mL TM; 20 mM DTT; 0.04 µg/mL ICZ; 0.2 µg/mL KCZ; 13 µg/mL FCZ; 1.1 µg/mL AMB; 800 µg/mL 5-FC; and 1.5 µg/mL fludioxonil). (G) Different H99 passaged strains were cultured to mid-logarithmic phase in YPD at 30°C, and total protein extracts were prepared for western blot analysis as described in the Materials and Methods. To examine Hog1 phosphorylation levels, a rabbit antibody specific to dually phosphorylated p38-MAPK was used. The same blot was stripped and then probed with polyclonal anti-Hog1 antibody as a loading control. (PPT)

Figure S7 Urease production in different H99 passaged strains. Each C. neoformans strain (H99O, H99F, H99S, H99W, H99E, KN99α, KN99a, and H99C) was cultured overnight (about 16 h) at 30°C in liquid YPD and resuspended with dH2O. Then, 5 µL of a suspension containing 10⁸ cells/mL were spotted onto solid urea-containing agar (Christensen's medium) and incubated at 30°C for two to five days. Urea is a nitrogen source and is converted to ammonia by urease secreted by C. neoformans, which increases the pH of the medium. An increased pH is indicated by a color change from yellow to red-violet due to the inclusion of phenol red, a pH indicator. Each plate was photographed during the incubation period. (PPT)

Figure S8 Electrophoretic karyotypic analysis via PFGE of the H99 strains revealed a size reduction of chromosome 9 in H99ED and H99C.
Probing of the left and right telomeres following in-gel digestion with SwaI and SfiI of chromosomal plugs revealed that while the left subtelomere fragments of chromosome 9 were identical in length for all eight strains tested, the right subtelomere of H99ED and H99C was ∼25 kb smaller (approximate position marked with “?”). The SwaI-digested blot was hybridized to the chromosome 9L probe (yellow arrow) while the SfiI-digested blot was hybridized to the chromosome 9R probe (green arrow). The size of the band in reference to the band size of the laboratory reference strain H99O indicates whether any telomeric length changes have taken place. (PPT) Click here for additional data file. Figure S9 Sequencing the end of the subtelomere of chromosome 9R in H99ED. The new chromosome endpoint in these strains was characterized via PCR to determine the precise nucleotide at which they were truncated, confirming the loss of a region containing nine genes, all hypothetical proteins (CNAG_07002, CNAG_07786, CNAG_07787, CNAG_07788, CNAG_06953, CNAG_06954, CNAG_07789, CNAG_07790, CNAG_07791). Importantly, while it was confirmed that the segment was deleted, all of these genes have duplicates elsewhere in the genome, as is the case with most C. neoformans subtelomeric genes. Strain H99O was used as a negative control. The PCR product obtained in the UQ1261/UQ618 reaction was sequenced and aligned against the H99O sequence. (PPT) Click here for additional data file. Figure S10 Phenotypic analysis of F1 progenies. A. Mating phenotype segregates in progeny set. Mating assays with KN99a (H99C and progeny 1, 3, 7, 9, 10, 13, 14, 18, 20, 23, and 27) and KN99α (KN99a and progeny 2, 4, 5, 6, 8, 11, 12, 15, 16, 17, 19, 21, 22, 24, 25, and 26) on V8 agar incubated at room temperature for seven days in the dark. B. Melanin phenotype segregates in progeny set. Melanization assays on (left) l–DOPA agar or, (right) niger seed agar incubated at 37°C for two to three days. 
(PPT)

Table S1 List of the modifications of the C. neoformans genome annotation. (DOC)
Table S2 Compared sequence similarities between the new and the former protein set and the protein set of S. cerevisiae. (XLS)
Table S3 List of protein families amplified in the Cryptococcus lineage. (XLS)
Table S4 Genes expressed without an intron in C. neoformans var. grubii. (DOC)
Table S5 List of the genes with overlapping CDS. (XLS)
Table S6 Coordinates of the centromeric regions in C. neoformans H99. (DOC)
Table S7 Positions of the replication origin in C. neoformans. (DOC)
Table S8 Cryptococcus orthologs of DNA replication initiation proteins. (DOC)
Table S9 SNPs and indels identified in H99 series. (DOC)
Table S10 Sequencing read statistics. (DOC)
Table S11 List of the Bioprojects associated with the present study. (XLS)
Table S12 Primers used in this study. (DOC)
Text S1 History of the H99 strain and consult note of February 14, 1978. (DOC)

              Double-strand breaks associated with repetitive DNA can reshape the genome.

              Ionizing radiation is an established source of chromosome aberrations (CAs). Although double-strand breaks (DSBs) are implicated in radiation-induced and other CAs, the underlying mechanisms are poorly understood. Here, we show that, although the vast majority of randomly induced DSBs in G2 diploid yeast cells are repaired efficiently through homologous recombination (HR) between sister chromatids or homologous chromosomes, approximately 2% of all DSBs give rise to CAs. Complete molecular analysis of the genome revealed that nearly all of the CAs resulted from HR between nonallelic repetitive elements, primarily Ty retrotransposons. Nonhomologous end-joining (NHEJ) accounted for few, if any, of the CAs. We conclude that only those DSBs that fall within the 3–5% of the genome composed of repetitive DNA elements are efficient at generating rearrangements with dispersed small repeats across the genome, whereas DSBs in unique sequences are confined to recombinational repair between the large regions of homology contained in sister chromatids or homologous chromosomes. Because repeat-associated DSBs can efficiently lead to CAs and reshape the genome, they could be a rich source of evolutionary change.

                Author and article information

                Contributors
                Roles: Conceptualization, Data curation, Formal analysis, Investigation, Resources, Validation, Writing – original draft, Writing – review & editing
                Roles: Data curation, Formal analysis, Investigation, Writing – original draft, Writing – review & editing
                Roles: Data curation, Formal analysis, Investigation, Software, Writing – original draft
                Roles: Data curation, Formal analysis, Methodology, Resources, Software, Writing – original draft, Writing – review & editing
                Roles: Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft
                Roles: Formal analysis, Methodology, Software, Validation, Visualization, Writing – original draft
                Roles: Data curation, Investigation, Resources, Writing – original draft, Writing – review & editing
                Roles: Data curation, Investigation, Resources, Writing – review & editing
                Roles: Data curation, Methodology, Resources, Software, Writing – original draft
                Roles: Data curation, Methodology, Resources, Software, Writing – original draft
                Roles: Data curation, Methodology, Resources, Software, Validation, Writing – original draft
                Roles: Formal analysis, Investigation, Resources, Supervision, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Formal analysis, Funding acquisition, Investigation, Resources, Supervision, Writing – review & editing
                Role: Academic Editor
                Journal
                PLoS Biol
                PLoS Biology
                Public Library of Science (San Francisco, CA, USA)
                1544-9173
                1545-7885
                11 August 2017
                Volume: 15
                Issue: 8
                Article: e2002527
                Affiliations
                [1] Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina, United States of America
                [2] Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore, India
                [3] Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
                [4] Lehrstuhl für Allgemeine und Molekulare Botanik, Ruhr-Universität Bochum, Bochum, Germany
                [5] Université de Strasbourg, CNRS UMR7156, Strasbourg, France
                [6] Westerdijk Fungal Biodiversity Institute, Utrecht, The Netherlands
                [7] Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, Amsterdam, The Netherlands
                [8] Commissariat à l'Energie Atomique (CEA), Institut de Génomique (IG), Genoscope, Evry, France
                [9] Université d'Evry, UMR 8030, Evry, France
                [10] Centre National de Recherche Scientifique (CNRS), UMR 8030, Evry, France
                University College Dublin, Ireland
                Author notes

                The authors have declared that no competing interests exist.

                Author information
                http://orcid.org/0000-0002-2895-1153
                Article
                pbio.2002527
                DOI: 10.1371/journal.pbio.2002527
                PMCID: PMC5568439
                PMID: 28800596
                © 2017 Sun et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 23 March 2017
                Accepted: 25 July 2017
                Page count
                Figures: 7, Tables: 1, Pages: 31
                Funding
                Senior Research Fellow of Council of Scientific and Industrial Research (CSIR), Govt. of India (grant number 09/733(0179)/2012/EMR-I) received by VY.
                Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore, India (intramural funding) received by KS.
                NIH/NIAID (grant number R01 AI50113-13) received by JH.
                National Human Genome Research Institute (grant number U54HG003067) received by CAC.
                NIH/NIAID (grant number R37 MERIT award AI39115-20) received by JH.
                German Research Foundation (grant number DFG NO407/7-1) received by MN.
                Tata Innovation Fellowship (grant number BT/HRT/35/01/03/2017) received by KS.
                The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences > Genetics > Genetic Loci
                Biology and Life Sciences > Cell Biology > Chromosome Biology > Chromosomes > Chromosome Structure and Function > Centromeres
                Biology and Life Sciences > Cell Biology > Chromosome Biology > Chromosomes
                Biology and Life Sciences > Organisms > Fungi > Cryptococcus
                Biology and Life Sciences > Cell Biology > Chromosome Biology > Chromosomal Aberrations > Translocations
                Biology and Life Sciences > Cell Biology > Chromosome Biology > Chromosomes > Chromosome Pairs > Chromosome 10
                Biology and Life Sciences > Genetics > Genetic Elements > Mobile Genetic Elements > Transposable Elements
                Biology and Life Sciences > Genetics > Genomics > Mobile Genetic Elements > Transposable Elements
                Biology and Life Sciences > Organisms > Fungi > Cryptococcus > Cryptococcus Neoformans
                Biology and Life Sciences > Microbiology > Medical Microbiology > Microbial Pathogens > Fungal Pathogens > Cryptococcus Neoformans
                Medicine and Health Sciences > Pathology and Laboratory Medicine > Pathogens > Microbial Pathogens > Fungal Pathogens > Cryptococcus Neoformans
                Biology and Life Sciences > Mycology > Fungal Pathogens > Cryptococcus Neoformans
                Data availability
                All relevant data can be accessed through NCBI BioProject accession no. PRJNA200571 and EBI accession no. PRJEB1993.

                Life sciences