      Is Open Access

      In eubacteria, unlike eukaryotes, there is no evidence for selection favouring fail-safe 3’ additional stop codons

      research-article
      PLoS Genetics
      Public Library of Science


          Abstract

          Errors throughout gene expression are likely deleterious, hence genomes are under selection to ameliorate their consequences. Additional stop codons (ASCs) are in-frame nonsense ‘codons’ downstream of the primary stop which may be read by translational machinery should the primary stop have been accidentally read through. Prior evidence in several eukaryotes suggests that ASCs are selected to prevent potentially deleterious consequences of read-through. We extend this evidence showing that enrichment of ASCs is common but not universal for single-celled eukaryotes. By contrast, there is limited evidence as to whether the same is true in other taxa. Here, we provide the first systematic test of the hypothesis that ASCs act as a fail-safe mechanism in eubacteria, a group with high read-through rates. Contrary to the predictions of the hypothesis we find: there is a paucity, not enrichment, of ASCs downstream; substitutions that degrade stops are more frequent in-frame than out-of-frame in 3’ sequence; highly expressed genes are no more likely to have ASCs than lowly expressed genes; usage of the leakiest primary stop (TGA) in highly expressed genes does not predict ASC enrichment even compared to usage of non-leaky stops (TAA) in lowly expressed genes, beyond downstream codon +1. Any effect at the codon immediately proximal to the primary stop can be accounted for by a preference for a T/U residue immediately following the stop, although if anything, TT- and TC- starting codons are preferred. We conclude that there is no compelling evidence for ASC selection in eubacteria. This presents an unusual case in which the same error could be solved by the same mechanism in eukaryotes and prokaryotes but is not. We discuss two possible explanations: that, owing to the absence of nonsense-mediated decay, bacteria may solve read-through via gene truncation, and that in eukaryotes certain prion states cause raised read-through rates.

          Author summary

          In all organisms, gene expression is error-prone. One such error, translational read-through, occurs where the primary stop codon of an expressed gene is missed by the translational machinery. Failure to terminate is likely to be costly, hence genomes are under selection to prevent this from happening. One proposed error-proofing strategy involves in-frame proximal additional stop codons (ASCs) which may act as a ‘fail-safe’ mechanism by providing another opportunity for translation to terminate. There is evidence for ASC enrichment in several eukaryotes. We extend this evidence showing it to be common but not universal in single-celled eukaryotes. However, the situation in bacteria is poorly understood, despite bacteria having high read-through rates. Here, we test the fail-safe hypothesis within a broad range of bacteria. To our surprise, we find that not only are ASCs not enriched, but they may even be selected against. This provides evidence for an unusual circumstance where eukaryotes and prokaryotes could solve the same problem the same way but don’t. What are we to make of this? We suggest that if read-through is the problem, ASCs are not necessarily the expected solution. Owing to the absence of nonsense-mediated decay, a process that makes gene truncation in eukaryotes less viable, we propose bacteria may rescue a leaky stop by a mutation that creates a new stop upstream. Alternatively, raised read-through rates in some particular conditions in eukaryotes might explain the difference.


          Most cited references (68)


          The evolutionary consequences of erroneous protein synthesis.

          Errors in protein synthesis disrupt cellular fitness, cause disease phenotypes and shape gene and genome evolution. Experimental and theoretical results on this topic have accumulated rapidly in disparate fields, such as neurobiology, protein biosynthesis and degradation and molecular evolution, but with limited communication among disciplines. Here, we review studies of error frequencies, the cellular and organismal consequences of errors and the attendant long-range evolutionary responses to errors. We emphasize major areas in which little is known, such as the failure rates of protein folding, in addition to areas in which technological innovations may enable imminent gains, such as the elucidation of translational missense error frequencies. Evolutionary responses to errors fall into two broad categories: adaptations that minimize errors and their attendant costs and adaptations that exploit errors for the organism's benefit.
            Is Open Access

            Evidence That Mutation Is Universally Biased towards AT in Bacteria

            Introduction

Mutation generates the variability on which natural selection acts. Mutation is not an entirely stochastic process, as it acts according to certain deterministic biases. Because of this, biases in the outcome of the evolutionary process result not only from selection but also from the biases of mutation. In order to understand evolution it is therefore necessary to elucidate mutational biases and the ways in which these biases themselves change in evolution. Nucleotide content variation is much more pronounced in bacteria compared to multi-cellular eukaryotes [1]. GC contents in bacteria vary from less than 25% to over 75% [1]–[3]. Related bacteria even from relatively broad phylogenetic groupings tend to show similar genomic nucleotide content [3]. For example bacterial genomes from the order Bacillales tend to be GC-poor, from the order Enterobacteriales to have intermediate GC contents, and from the phylum Actinobacteria to be GC-rich (Figure 1A). In addition, GC content values measured at different functional site categories (intergenic, synonymous, and non-synonymous) show highly correlated patterns of variation across bacteria [4] (Figure 1B). These observations suggest that forces determining GC content in bacteria operate both genome-wide and consistently over long periods of time. One possibility is that the main force driving nucleotide content variation in bacteria is mutation. This possibility has often been assumed true (for example see [2], [4]–[7]). Under this assumption, clades that are GC rich are clades in which mutation has been consistently biased towards GC while clades that are AT rich are clades in which mutation has been consistently biased towards AT. If true, bacterial mutational biases must be extremely variable to be able to generate the extreme variation observed in bacterial nucleotide content.

Figure 1 (10.1371/journal.pgen.1001115.g001). Phylogenetic and genomic variation in GC content. (A) The phylogeny of the four broad clades examined in this study. It was built using the iTOL webpage (http://itol.embl.de/). Average GC contents of the different broad clades are indicated on the margins. Small blue triangles represent the five lineages of clonal pathogens used in the analysis. (B) Genomewide observed GC content of synonymous and non-synonymous sites correlates with the intergenic GC content across bacterial genomes.

A second possibility is that it is not variation in mutational biases that leads to variation in nucleotide content, but rather variation in the relative probabilities of fixation of A/T to G/C and G/C to A/T mutations [1], [3], [8]. When considering changes to nucleotide content, differences in fixation probabilities can stem both from differences in the strength and direction of natural selection and differences in the rates of biased gene conversion (BGC) [1], [9], [10]. Natural selection affects the probability of fixation of an allele based on the allele's fitness advantage or disadvantage and the effective population size (Ne) of the organism in question. Similarly BGC is also dependent on Ne and on the advantage or disadvantage the allele has. Here however, this advantage or disadvantage is not in fitness but rather in the increased or decreased probability of the allele to be passed on to the next generation through gene conversion [1], [10]. This increase or decrease is determined by recombination rates and by the conversion bias, which has been shown in many eukaryotes to be in favor of GC nucleotides compared to AT nucleotides [1], [9]. A recent study that showed that in Escherichia coli regions of low recombination tend to be more AT rich demonstrates that BGC may affect nucleotide content in bacteria in a similar manner [11]. In order to gain insight into mutational biases it is necessary to investigate the results of mutation in isolation from those of selection and BGC.
When effective population sizes are small, the efficacy of both natural selection and BGC is severely reduced relative to stochastic processes and therefore sequence evolution is affected strongly by mutational biases. Mutation-accumulation experiments artificially reduce Ne of evolving laboratory cultures [12] and can thus be used to assess mutational biases in culturable bacteria. Similarly, reporter constructs have also been used to estimate mutational biases [13], [14]. However, without knowing the relative amount of time bacteria spend in different growth phases (logarithmic vs. stationary) and given that mutational rates and patterns vary between growth phases [15], [16] it could be difficult to estimate the true mutational biases operating in nature using such experimental approaches. An additional approach is to examine nucleotide substitutions at sites that are expected not to be subject to selection due to protein functionality, such as pseudogenes [17], or fourfold degenerate sites [18]. This approach is also problematic because while pseudogenes and fourfold degenerate sites are expected to be under no, or low selection for protein functionality, they should be subject to the same levels of selection on nucleotide content as the rest of the genome. A good way to estimate mutational biases is to analyze the patterns of single nucleotide polymorphisms (SNPs) within species. Population genetic studies have shown that natural selection and other selection-like processes are less efficient in affecting patterns of nucleotide polymorphisms among very closely related strains compared to nucleotide differences between distantly related strains or species [19], [20]. Thus SNPs should better reflect the mutational patterns compared to substitutions between species. The analysis of SNPs has been used to investigate the mutational biases of a number of AT-rich eukaryotic genomes, such as Drosophila [21]–[23]. 
However, using this methodology in bacteria has been problematic due to a severe blurriness in species boundaries among prokaryotes [24]. As a result of such blurriness quite often strains sharing a “species” name (such as E. coli) are in fact quite diverged. While it is difficult to define species in bacteria, severely reduced selection has been observed among very closely related bacterial strains of some strictly clonal pathogens, such as strains belonging to the Mycobacterium tuberculosis cluster (MTBC) [25]. This can be explained by the fact that strains belonging to such lineages of pathogens are extremely closely related and thus these lineages may be good proxies of “species” and nucleotide differences between them can be viewed as polymorphisms. In addition, the lifestyle and clonality of such pathogens is likely to lead to small Ne, further reducing efficacy of natural selection [25]. The patterns of SNPs among closely related strains of such clonal pathogens should thus reflect directly the predominant mutational biases. Here we estimate mutational biases by analyzing SNPs extracted from large sequence datasets of five lineages of clonal pathogens (including MTBC) from four broad clades of bacteria that span virtually the whole bacterial phylogeny and the range of bacterial nucleotide contents (Figure 1A): Bacillales (AT rich), Enterobacteriales (intermediate GC), Actinobacteria (high GC), and Burkholderiales (high GC). We find that in all lineages mutation is biased towards AT, and that G/C to A/T transitions are always predominant. Previous studies indicate that mutation may be universally AT-biased in eukaryotes [21], [26]–[31]. Our results together with additional studies that have focused on Enterobacteriales (E. coli, Shigella, and Salmonella typhimurium) [32], [33] demonstrate that mutation may be universally AT-biased in bacteria as well. 
These findings contradict the long-held view that mutational biases are the main contributors to variation in bacterial nucleotide content and are therefore highly variable among bacteria. Rather they suggest that nucleotide content in bacteria is strongly affected by variation in the relative rates of fixation of AT to GC and GC to AT mutations and that mutational biases are far less variable than previously thought.

Results

Sequence data from lineages of clonal pathogens are extremely suited to the study of bacterial mutational biases

We focused on five lineages of clonal pathogens from four diverse bacterial clades (Table 1). The five lineages we investigated are unique in their suitability for this type of analysis because they provide us with sufficient amounts of available sequence data for sufficiently closely related strains in which we can demonstrate a genome-wide relaxation in the efficacy of natural selection. The chosen strains are indeed very closely related with each lineage exhibiting less than 0.5 pairwise differences per gene (Table 1). However, because of the availability of multiple whole genome sequences in each lineage the total number of SNPs is substantial (Table 1), ranging from 165 to 1877. In addition, these lineages are thought to be clonal [34]. Thus, horizontal gene transfer should occur only rarely, if at all, and should not strongly influence our ability to infer the ancestral and derived states of mutations. Finally, the inference of SNPs from such closely related sequences is almost trivial, as alignment programs do much better when sequences are highly similar. Therefore, we expect to have no biases introduced through misalignment of sequences.

Table 1 (10.1371/journal.pgen.1001115.t001). Clonal pathogen lineages analyzed in study.

Clonal pathogen | Source of SNPs | Outgroup | Max pairwise variability a | dN/dS | #SNPs intergenic | #SNPs non-synonymous | #SNPs synonymous
Bacillus anthracis | Alignments of 18 fully and partially sequenced strains | B. thuringiensis | 0.3 | 0.58 | 322 | 239 | 112
Salmonella typhi | SNPs provided by Holt et al. [52] based on the sequencing of 19 strains | None (phylogenetic tree) | 0.4 | 0.45 | 260 | 961 | 656
Yersinia pestis | Alignments of 7 fully sequenced strains | Y. pseudotuberculosis | 0.2 | 0.64 | 118 | 345 | 162
Burkholderia mallei | Alignments of 11 fully and partially sequenced strains | B. thailandensis | 0.1 | 0.47 | 44 | 70 | 51
MTBC | Alignment of 89 genes sequenced in 107 strains | M. canettii | 0.5 | 0.59 | NA | 226 | 136

a The average pairwise diversity per gene of the two most diverged strains within the lineage.

We assessed whether the patterns of SNPs in these data are indeed weakly affected by natural selection by estimating the ratio of non-synonymous and synonymous differences per non-synonymous and synonymous site (dN/dS) [35], [36] across all alignable proteins within each dataset (Materials and Methods). If selection is strong, dN/dS should be much smaller than 1 as it would efficiently remove most non-synonymous mutations [35], [36]. For example, comparisons of E. coli strains yield dN/dS values of approximately 0.05 [37]. In contrast, for MTBC where multiple lines of evidence suggest that natural selection is severely reduced [25], dN/dS goes up to 0.59. In our dataset, dN/dS values for the other four lineages range between 0.45 and 0.64 (Table 1, Materials and Methods). This suggests that selection is indeed relaxed in a genome-wide manner in these genomes and thus that the pattern of SNPs should be reflective of the mutational biases in these lineages. Relaxed selection over long evolutionary timescales can lead to extreme genome reduction and the loss of many repair pathways, which could affect mutational patterns [1], [38]. The pathogens we use in this study have suffered only a short-term relaxation in the efficiency of selection and there is no indication that any of them have lost repair functions.
To further substantiate that these pathogens are not likely to have suffered loss of repair functions we examined whether any of the repair genes annotated in close relatives of the examined pathogens that are not evolving under inefficient selection have been lost in the pathogens. We found that in B. anthracis, B. mallei, S. typhi and Y. pestis there has been no loss of repair genes and that in all cases the repair genes are highly similar to those found in the outgroups (Materials and Methods, Table S1). In the case of MTBC, since the closely related outgroup strain M. canettii is not fully sequenced we could only compare the genes present in fully sequenced strains of MTBC to those present in a more distantly related outgroup, M. marinum. All but one of the repair genes found in M. marinum are also found in MTBC (Table S1). The M. marinum gene that is not found in MTBC is an un-named gene of unclear function. These results together with a lack of previous evidence for loss of repair functions in these well studied pathogens make it unlikely that these pathogens have lost repair functions. It is even less likely that all of them suffered similar losses of repair functions.

Mutation is AT-biased independently of the current nucleotide content of bacterial clades

We polarized the SNPs (Materials and Methods) and classified changes into six possible types of mutations (G/C to A/T, G/C to T/A, G/C to C/G, A/T to G/C, A/T to C/G, and A/T to T/A). The relative rate of each of the six mutation types was calculated after normalizing for the current GC content at the studied positions (Materials and Methods). For all five lineages, irrespective of current genomewide nucleotide content, the predominant mutation is G/C to A/T transition (Figure 2).

Figure 2 (10.1371/journal.pgen.1001115.g002). Relative rates of the six nucleotide pair mutations. The most common mutations are always G/C to A/T transitions. The rates are normalized for the unequal nucleotide content of the five different lineages (Materials and Methods). (A) non-synonymous SNPs. (B) synonymous SNPs.

It is important to remember that relaxation of selection in the studied lineages is fairly recent and that nucleotide content is a slowly evolving trait. Therefore, if driven by selection or BGC, nucleotide content should not have had time to reach a new mutational equilibrium. However, if nucleotide content is driven predominantly by mutation, and selection and BGC do not strongly affect nucleotide content, the genomic nucleotide contents should already be at the mutational equilibrium. We test whether nucleotide content is at equilibrium by comparing the number of GC→AT and AT→GC changes observed in each dataset. Under equilibrium these numbers will be equal. The results of such comparisons (Table 2) clearly show that the lineages with intermediate (Salmonella typhi & Yersinia pestis) and high (Burkholderia mallei & MTBC) nucleotide contents are currently far from equilibrium and that GC→AT changes are much more frequent. These results are statistically significant for all but the intergenic dataset of B. mallei, in which a small number of SNPs leads to very low statistical power (Table 2).

Table 2 (10.1371/journal.pgen.1001115.t002). Nucleotide contents of clonal pathogens with intermediate and high GC contents are far from equilibrium.

Clonal pathogen | Current GC content | Sites | #GC→AT a | #AT→GC a
MTBC | High | Synonymous | 103 (84, 123) | 18 (10, 26)
MTBC | High | Non-synonymous | 127 (105, 151) | 58 (43, 73)
B. mallei | High | Synonymous | 45 (33, 59) | 2 (0, 5)
B. mallei | High | Non-synonymous | 57 (42, 73) | 7 (2, 12)
B. mallei | High | Intergenic | 27 (17, 38) | 12 (6, 19)
Y. pestis | Intermediate | Synonymous | 116 (95, 138) | 37 (26, 48)
Y. pestis | Intermediate | Non-synonymous | 230 (201, 259) | 79 (63, 97)
Y. pestis | Intermediate | Intergenic | 69 (52, 85) | 36 (25, 47)
B. anthracis | Low | Synonymous | 44 (32, 57) | 64 (49, 79)
B. anthracis | Low | Non-synonymous | 136 (114, 158) | 75 (58, 93)
B. anthracis | Low | Intergenic | 141 (119, 166) | 151 (128, 175)
S. typhi | Intermediate | Synonymous | 570 (524, 619) | 79 (62, 97)
S. typhi | Intermediate | Non-synonymous | 707 (654, 760) | 220 (192, 248)
S. typhi | Intermediate | Intergenic | 189 (164, 215) | 65 (50, 81)

a 95% confidence intervals appear in parentheses.
Numbers appear in bold if there is a statistically significant (p 75% GC and >75% AT, this assumption implies that point mutation biases are extremely variable among bacteria. Our results demonstrate that mutational biases are in fact very similar across bacteria. Mutation appears to be dominated by C/G to T/A transitions and is AT-biased in every studied instance, even in bacteria with high genomic GC contents. At the same time, it is important to note that mutational biases need not be entirely constant in different bacteria. In fact, our results demonstrate that in the Actinobacteria mutation appears to be less strongly biased towards AT than in the other clades examined.

Due to a severe relaxation in natural selection, the recent evolution of the clonal pathogens investigated in this study should be predominantly driven by mutation. However, it is important to note that natural selection is not entirely absent in these pathogens and that mutations of severe deleterious effect will still be removed by selection. In MTBC we have evidence that selection is relaxed to the point that it does not distinguish between mutations that alter amino acids that are conserved across all Mycobacteria and are presumed to be under strong constraint and those that are variable in Mycobacteria and are likely to be under much weaker constraint [25], indicating that only extremely strongly deleterious mutations are being purged by selection. Yet it is still possible that our estimates of mutational biases are somewhat affected by the residual effects of natural selection on strongly deleterious mutations.
If such residual effects of selection affect our results they are likely to make our estimates of the extent of the AT bias somewhat conservative. This is demonstrated by our finding that within a given clade a stronger bias towards AT is observed in clonal pathogens where selection is severely relaxed compared to closely related lineages subject to more efficient selection. This being said, our observation that within a broad clade obligate intracellular bacteria evolve to a GC content that matches very closely the one predicted based on our estimates of mutation suggests that our estimates of the extent of AT bias are quite reliable. In an additional study, which is published back-to-back with our study in this issue of PLoS Genetics, Hildebrand et al. investigated mutational patterns by examining changes in fourfold degenerate codons in a large number of bacterial lineages with divergence under 10% at these sites [41] and argue for the AT bias of mutation in GC rich bacteria. The argument of Hildebrand et al. is complicated by two factors. First, natural selection in such lineages can remain strong as exemplified by E. coli [37] and B. pseudomallei (studied here) and much stronger than in the five clonal pathogen strains studied here. Second, the inference of mutational patterns from fourfold degenerate sites alone is complicated by a strong correlation of the GC content of selectively favored codons at the level of translation and the GC content of genomes [42], making it possible that some of the detected effects are related to natural selection at the level of translation. Both of these factors imply that the mutational patterns inferred by Hildebrand et al. are much less precise than those presented here. Nevertheless, the totality of the evidence in Hildebrand et al. is consistent with the generality of AT-biased mutation across bacteria, especially given their focus on a much larger number of bacterial lineages than presented here. 
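The link between a mutational bias and the GC content it predicts at equilibrium can be illustrated with a minimal sketch. It assumes a simple two-state (GC vs. AT) model with no selection or BGC, in which equilibrium GC content is v / (u + v) for a per-site GC→AT rate u and AT→GC rate v; the text above does not spell out the exact calculation the authors used, so this is an illustration rather than their method. The function names and the GC fraction of 0.5 used in the example are hypothetical; only the two counts (116 and 37, Y. pestis synonymous sites) come from Table 2.

```python
def relative_rates(n_gc_to_at, n_at_to_gc, gc_fraction):
    """Turn raw change counts into relative per-site rates.

    GC->AT changes can only arise at G/C sites and AT->GC changes only at
    A/T sites, so each count is divided by the fraction of sites at which
    it could have occurred. The total number of sites is a common factor
    that cancels in the equilibrium formula, so it is omitted.
    """
    if not 0.0 < gc_fraction < 1.0:
        raise ValueError("gc_fraction must lie strictly between 0 and 1")
    u = n_gc_to_at / gc_fraction          # relative GC->AT rate
    v = n_at_to_gc / (1.0 - gc_fraction)  # relative AT->GC rate
    return u, v


def equilibrium_gc(u, v):
    """Equilibrium GC fraction under mutation alone (two-state model)."""
    if u < 0 or v < 0 or u + v == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return v / (u + v)


if __name__ == "__main__":
    # Counts from Table 2 (Y. pestis, synonymous sites). The GC fraction
    # of 0.5 is a HYPOTHETICAL placeholder, not a value from the paper.
    u, v = relative_rates(116, 37, gc_fraction=0.5)
    print(f"predicted equilibrium GC: {equilibrium_gc(u, v):.3f}")
```

Under this model, any lineage in which the normalized GC→AT rate exceeds the AT→GC rate is predicted to drift below 50% GC if mutation acts alone.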
A prolonged severe relaxation of selection in obligate intracellular bacteria has led to massive loss of repair genes in these bacteria [38]. In the clonal pathogens studied here relaxation of selection occurred much more recently and there is no evidence for loss of repair functions in these genomes. In fact, we show that none of these pathogens have lost any of the repair genes encoded by their closely related outgroups that are evolving under efficient selection. Additionally, a previous study that investigated the pattern of mutation in strains of S. typhimurium with deficient repair functions has shown that in such strains transversions become much more frequent than transitions [33]. In contrast, in all of the five pathogens we examined mutation was strongly biased towards transitions, rather than transversions. We cannot show conclusively that no repair functions have been lost in any of these five pathogens. However, we believe that the lack of evidence for any such loss, the consistency of the mutational biases observed across all five datasets examined, and the bias towards transitions rather than transversions, make it reasonable to assume that our results are unaffected by a significant loss in repair functions. We show that within broad clades obligate intracellular bacteria evolve to a GC content that matches very closely the GC content predicted at equilibrium based on our estimates of mutation rates of clonal pathogens belonging to the same clade. This together with the long-standing observation that the vast majority of obligate intracellular bacteria tend to have extremely AT rich genomes [38] supports the generality of AT-biased mutation in bacteria. It is however important to note the recent identification of an obligate intracellular bacterium, Candidatus Hodgkinia cicadicola that has a GC-rich genome (58.4% GC) [43]. 
It is possible that this bacterium constitutes an exception to the universality of AT-biased mutation and that mutation in this organism is GC biased. However, it is also possible that this is not the case and that this bacterium is GC-rich for other reasons (such as due to natural selection or BGC). Further studies will be needed to determine whether Candidatus Hodgkinia cicadicola indeed has exceptional mutational biases. By demonstrating that variation in nucleotide content in bacteria is not generally driven by differences in mutational biases we demonstrate that natural selection or another selection-like process such as BGC must play a dominant role in nucleotide content variation, particularly in driving intermediate to high GC content in many bacterial lineages. At this point it is unclear how much of this variation is driven by selection and how much by BGC. If natural selection plays a strong role in determining GC content it suggests that in many bacteria there are no truly neutrally evolving sites. The nature of such selection remains obscure. Because GC content correlates strongly across coding and noncoding sites genome-wide, natural selection acting on GC content probably relates to genome-wide functions such as replication or DNA maintenance and is less likely to be related to gene expression. Previous studies attempted to associate GC content with environmental factors such as growth temperature [44], [45], exposure to UV [8], oxygen requirements [46], and the ability to fix nitrogen [47]. While these studies were thought by some to be inconclusive [3], [44], [48]–[51] they provide the best current explanations for the possible involvement of selection in determining nucleotide content. 
However, considering that bacteria belonging to such broad clades as Actinobacteria have similar genomic nucleotide contents even though they are exposed to different environments it becomes tempting to speculate that environmental variables may not be the only underlying determinants for the natural selection acting on nucleotide content. It is possible that selection on nucleotide content is also driven by more intra-organismal factors that can affect entire clades irrespective of environment. Examples of such factors can be the ability of the replication machinery to work better on GC or AT rich sequences, DNA packaging, defense against phages or creating barriers for horizontal gene transfer. More studies need to be carried out to probe the possible involvement of selection in determining bacterial nucleotide content. In order for an increase in BGC to explain GC richness, recombination should be pervasive enough in GC rich bacteria to drive GC contents that are elevated substantially above those observed even in sexually reproducing eukaryotes. It is very likely that per generation advantage given to GC nucleotides through gene conversion (which is determined by recombination rates and by the conversion bias [1], [10]) is significantly higher in sexually reproducing eukaryotes than in prokaryotes for which recombination is assumed to be less frequent [1]. However, it is possible that BGC may still affect some bacteria more strongly than eukaryotes, if Ne is increased in these bacteria by a higher factor than the factor by which the advantage given to GC nucleotides through gene conversion is decreased. This is an intriguing possibility, opening up new avenues of research into recombination rates and variability in Ne among bacteria. It is possible to use the mutational rates we calculated to estimate the strength with which selection, and/or BGC were acting on nucleotide content in the examined clonal pathogens prior to their recent relaxations of selection. 
Such calculations (Table S4) show that such selection is always weak (s≤1/Ne), which would be expected considering GC content is always intermediate. This demonstrates that the selective or BGC advantage of GC over AT nucleotides need not be high in order to explain the vast variation in GC contents observed in bacteria. However, it is important to note that such calculations make a number of assumptions that may not be reasonable. One such assumption is that selection acts uniformly across sites and that there is no synergistic epistasis (i.e. that the intensity of selection does not change with changes in nucleotide content). An additional assumption is that there are no competing selective forces. This second assumption is clearly incorrect when it comes to non-synonymous sites. Even if selection on nucleotide content in these sites were strong enough to drive them to use only GC nucleotides it is highly likely that selection for protein function would not allow this to happen. It is therefore unclear whether selection on nucleotide content is in fact always weak. In this study we demonstrate the great utility of clonal pathogens for the study of mutational biases in bacteria. We investigated five such clonal pathogens from four very diverse clades. We showed that in every studied case and across all site categories point mutation is consistently biased towards AT and that the most frequent mutations are always G/C→A/T transitions, thus demonstrating that the biases of point mutation are much less variable than was previously assumed. By identifying additional bacteria evolving under strongly relaxed selection and conducting deep sequencing of such bacteria it should be possible to address additional questions regarding mutational patterns of bacteria, including the variability in the rates of insertions and deletions across bacteria and mutation clustering along bacterial chromosomes. 
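The equilibrium comparison summarized in Table 2 (equal numbers of GC→AT and AT→GC changes are expected at equilibrium) can be sketched with an exact two-sided binomial test. This is an assumption made for illustration: Table 2 reports 95% confidence intervals, and the test the authors actually applied is not described in the text above, so p-values from this sketch need not match their significance calls. The function name is hypothetical; the counts are taken from Table 2.

```python
from math import comb


def two_sided_binomial_p(n_gc_to_at, n_at_to_gc):
    """Exact two-sided binomial p-value for H0: both directions equally likely.

    Under H0 each observed change is GC->AT with probability 0.5, so the
    distribution is symmetric and the two-sided p-value is twice the upper
    tail at the larger count, capped at 1.
    """
    n = n_gc_to_at + n_at_to_gc
    if n == 0:
        raise ValueError("at least one change is required")
    k = max(n_gc_to_at, n_at_to_gc)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * upper_tail)


if __name__ == "__main__":
    # (GC->AT, AT->GC) counts copied from Table 2.
    table2_counts = {
        "B. mallei, synonymous": (45, 2),
        "Y. pestis, synonymous": (116, 37),
        "B. anthracis, intergenic": (141, 151),
    }
    for label, (gc_at, at_gc) in table2_counts.items():
        print(f"{label}: p = {two_sided_binomial_p(gc_at, at_gc):.3g}")
```

A raw-count test of this kind ignores uncertainty in the polarization of SNPs, which is one reason the published confidence intervals remain the authoritative result.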
Recent studies have shown that mutation appears to be universally AT-biased in eukaryotes [21], [26]–[31]. Our results, which demonstrate that this may also be the case in prokaryotes, therefore suggest that mutation may in fact be AT-biased in all living organisms (although it is important to note that we do not yet have good estimates of the mutational biases of Archaea). Not only is mutation AT-biased in all instances studied, but the specific pattern of mutation is also consistent: the most common mutations are always G/C to A/T transitions. These results make it tempting to speculate that the predominant mutations are simply the result of the lability of cytosine to deamination, and that this pattern shows through despite possible variability in DNA replication and repair mechanisms [27].

Concluding remarks

In this study we used data from five strictly clonal pathogens to analyze the variation in point mutation biases in bacteria. These pathogens are uniquely suitable for such analyses, as they can be shown to be evolving under selection that is severely inefficient relative to stochastic processes. Unlike obligate intracellular bacteria, which have been evolving under inefficient selection for long evolutionary times and have lost many of their repair pathways, these clonal pathogens have experienced only a short-term relaxation in selection efficiency and are likely to have intact repair mechanisms. Their mutational biases should therefore reflect those of their other clade members that are not subject to inefficient selection. We demonstrated that even though these five pathogens belong to four very diverse clades with very different nucleotide contents, mutation in all of them is biased towards AT, and that the most frequent mutations are always G/C to A/T transitions. 
Our results show that variation in nucleotide content in bacteria cannot be explained by variation in mutational biases and that biases in point mutation appear to be far less diverse among bacteria than was previously assumed.

Materials and Methods

Data sources

Salmonella typhi SNPs were taken from the study of Holt et al. [52]. MTBC sequences of 89 genes from 107 MTBC strains and one outgroup strain (M. canettii) were taken from our previous study [25]. Eighteen fully and partially sequenced genomes of B. anthracis, 11 of B. mallei and 10 of B. pseudomallei were taken from the Pathema database [53], together with the completed sequences of the outgroup strains Bacillus thuringiensis and Burkholderia thailandensis. Seven fully sequenced strains of Y. pestis, four fully sequenced strains of Y. pseudotuberculosis and the fully sequenced outgroup strain Y. enterocolitica were downloaded from the NCBI FTP server (ftp.ncbi.nih.gov).

SNP extraction and polarization

Within each dataset one strain was randomly selected to be the reference genome. The protein sequences of the reference genome were compared, using the FASTA algorithm [54], to the protein sequences of all the other strains within the dataset, including the outgroup strain. Orthologs were identified as the best reciprocal hits. To prevent false identification of orthologs, conservation of gene order along the chromosome was also required. More specifically, suppose that in one genome gene X is adjacent to genes Y and Z along the chromosome, that in another genome gene X′ is adjacent to genes Y′ and Z′, and that the best reciprocal hit of gene X in the other genome is gene X′. Gene X′ is then considered the ortholog of gene X only if genes Y′ and Z′ are also the reciprocal best hits of genes Y and Z. Multiple sequence alignments (MSAs) were created at the protein level for genes for which orthologs could be identified in all of the strains within the dataset and in the outgroup strain. 
The MSAs were created using the clustalW alignment program. DNA/codon level MSAs were then created based on the protein level alignments by threading the DNA sequences onto the protein alignment. SNPs were extracted from these MSAs and the identities of the ancestral and derived alleles (polarization) were determined according to the outgroup strain sequence. To prevent false identification of SNPs due to misalignments we excluded SNPs from genes with more than 10 SNPs from further analyses.

In a similar manner the intergenic regions of the reference genome were compared at the DNA level to the intergenic regions of all other strains and of the outgroup. Intergenic sequences were considered orthologous if they were reciprocal best hits and if they could be aligned across their entire sequence. MSAs were created at the DNA level for intergenic sequences for which orthologs could be identified in all strains. To prevent false identification of SNPs due to misalignments (a problem that seemed to affect intergenic regions in particular) we excluded SNPs from intergenic sequences with more than 10 SNPs from further analyses. In the cases of B. anthracis and B. mallei this left us with very few SNPs when an outgroup strain was used. Therefore, in these two datasets the intergenic SNPs were not polarized using the outgroup. Instead we assumed that the most frequent allele within the SNP is the ancestral allele, while the less frequent one is derived. An outgroup strain was used to polarize the intergenic SNPs in the remaining datasets.

Calculating the relative rates of the six nucleotide pair mutations

In order to account for the unequal nucleotide content of the five different lineages we normalized the counts of the mutations from A/T to G/C, C/G, or T/A by multiplying them by

$$\frac{\#\mathrm{GCsites}}{\#\mathrm{ATsites}},$$

where #GCsites and #ATsites are the current genome-wide numbers of GC or AT sites in the considered site category. 
In this way we determine the expected number of such mutations under equal GC and AT contents. In order to calculate the relative rates of each possible pairwise mutation, each of the resulting counts (unaltered in the case of mutations from G/C, and normalized in the case of mutations from A/T) was multiplied by 100 and divided by the sum of these counts.

Calculating GCeq

From the polarized SNPs, calculating the number of GC→AT and AT→GC changes (#GC→AT and #AT→GC) is straightforward. The rates of the two types of changes were calculated separately for intergenic, synonymous and non-synonymous sites as:

$$\mathrm{rate}_{GC \to AT} = \frac{\#GC{\to}AT}{\#\mathrm{GCsites}}, \qquad \mathrm{rate}_{AT \to GC} = \frac{\#AT{\to}GC}{\#\mathrm{ATsites}}$$

where #GCsites and #ATsites are the current genome-wide numbers of GC or AT sites in this site category in a randomly selected strain of the considered lineage.

In order to calculate the current genome-wide number of GC and AT sites for non-synonymous and synonymous sites for each genome, we classified sites into synonymous and non-synonymous based on the method suggested by Nei and Gojobori [35]. According to this method a site is considered entirely synonymous if no change in it can lead to an amino acid change, and entirely non-synonymous if every change in it causes an amino acid change. A site in which some changes alter the amino acid while others do not is considered partially synonymous and partially non-synonymous, according to the proportion of the changes that lead to an amino acid change. We added to the GC or AT count of each site category the proportion of the site that is attributable to that category. For example, if a site is 1/3 synonymous and 2/3 non-synonymous and the current base in this site is a C, we added 1/3 of a count to #GCsites of the synonymous sites and 2/3 of a count to #GCsites of the non-synonymous sites. The GC contents we calculated for non-synonymous and synonymous sites were also used to draw Figure 1B. 
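The fractional site counting described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' code: the function names (`synonymous_fraction`, `count_gc_at_sites`) are hypothetical, the standard genetic code is assumed, and changes that create a stop codon are treated here as non-synonymous, a choice the text does not specify.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (assumption: standard code).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_fraction(codon, pos):
    """Fraction of the three possible changes at `pos` that keep the amino acid.

    Changes that produce a stop codon count as non-synonymous (assumption).
    """
    amino_acid = CODON_TABLE[codon]
    alternatives = [codon[:pos] + b + codon[pos + 1:] for b in BASES if b != codon[pos]]
    return sum(CODON_TABLE[alt] == amino_acid for alt in alternatives) / 3.0


def count_gc_at_sites(cds):
    """Accumulate fractional GC and AT site counts per site category for one CDS."""
    counts = {"syn": {"GC": 0.0, "AT": 0.0}, "nonsyn": {"GC": 0.0, "AT": 0.0}}
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = cds[i:i + 3].upper()
        # Skip stop codons and codons with ambiguous bases.
        if CODON_TABLE.get(codon, "*") == "*":
            continue
        for pos, base in enumerate(codon):
            fraction = synonymous_fraction(codon, pos)
            key = "GC" if base in "GC" else "AT"
            counts["syn"][key] += fraction
            counts["nonsyn"][key] += 1.0 - fraction
    return counts
```

For the codon CTA (leucine), for instance, only one of the three possible changes at the first position (to TTA) preserves the amino acid, so the C at that position contributes 1/3 of a count to the synonymous #GCsites and 2/3 to the non-synonymous #GCsites, matching the worked example in the text.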
Next, the expected equilibrium GC content based on these mutational rates (GCeq) was calculated as [1], [8]:

$$GC_{eq} = \frac{\mathrm{rate}_{AT \to GC}}{\mathrm{rate}_{AT \to GC} + \mathrm{rate}_{GC \to AT}}$$

Calculating confidence intervals of #GC→AT, #AT→GC and GCeq

1000 values were sampled from the Poisson distribution, once with a mean of #GC→AT and once with a mean of #AT→GC. This was done using the R function rpois. The resulting values were sorted and used to estimate 95% confidence intervals for #GC→AT and #AT→GC. They were also used to recalculate GCeq 1000 times, and the resulting GCeq values were sorted and used to estimate the 95% confidence intervals for GCeq.

dN/dS calculations

dN/dS calculations were performed using the method of Nei and Gojobori [35]. dN/dS estimates were calculated for the entire genome rather than per gene. If in all considered genes we found ns non-synonymous SNPs and s synonymous SNPs, and in the genome there are N non-synonymous sites and S synonymous sites:

$$\frac{dN}{dS} = \frac{\mathit{ns} / N}{s / S}$$

Analysis of conservation of repair genes

Repair genes were identified based on genome annotation for five close relatives of the five pathogens, for which selection is efficient. Genes annotated as putative or hypothetical were excluded. For Y. pestis the close relative used was Y. pseudotuberculosis. For B. mallei the close relative used was B. pseudomallei. For S. typhi the close relative used was S. typhimurium, and for B. anthracis the close relative used was B. thuringiensis. For MTBC this analysis was problematic, as M. canettii, which was used as an outgroup for MTBC strains in this study, is not yet fully sequenced. We were therefore forced to use a more distantly related outgroup, M. marinum, which is far more diverged from MTBC than the other outgroups are from their pathogens. The orthologs of the repair genes from the outgroups were identified in the pathogens, and the sequences of the pathogen versions of the genes were compared to those of the outgroups. The results of this analysis are summarized in Table S1. 
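The GCeq, confidence-interval and dN/dS calculations described in the methods above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the original analysis used R's rpois, whereas this sketch substitutes a simple standard-library Poisson sampler (Knuth's multiplication method, applied in chunks), and all function names are hypothetical.

```python
import math
import random


def gc_eq(n_gc_to_at, n_at_to_gc, gc_sites, at_sites):
    """Expected equilibrium GC content from per-site mutation rates."""
    rate_gc_to_at = n_gc_to_at / gc_sites
    rate_at_to_gc = n_at_to_gc / at_sites
    return rate_at_to_gc / (rate_at_to_gc + rate_gc_to_at)


def poisson_sample(rng, mean):
    """Draw one Poisson variate (stand-in for R's rpois; Knuth's method).

    Large means are split into chunks, since a sum of independent Poisson
    variates is itself Poisson and exp(-mean) would otherwise underflow.
    """
    mean = float(mean)
    total = 0
    while mean > 0:
        chunk = min(mean, 500.0)
        mean -= chunk
        limit = math.exp(-chunk)
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        total += k
    return total


def percentile_interval(values, level=0.95):
    """Sort the values and read off the central `level` interval."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lower = ordered[round(tail * len(ordered))]
    upper = ordered[round((1.0 - tail) * len(ordered)) - 1]
    return lower, upper


def gc_eq_interval(n_gc_to_at, n_at_to_gc, gc_sites, at_sites, draws=1000, seed=0):
    """Resample both SNP counts from Poisson distributions and recompute GCeq."""
    rng = random.Random(seed)
    resampled = []
    for _ in range(draws):
        gc_at = poisson_sample(rng, n_gc_to_at)
        at_gc = poisson_sample(rng, n_at_to_gc)
        if gc_at + at_gc > 0:  # GCeq is undefined when both counts are zero
            resampled.append(gc_eq(gc_at, at_gc, gc_sites, at_sites))
    return percentile_interval(resampled)


def dn_ds(ns, s, n_sites, s_sites):
    """Genome-wide dN/dS as (ns / N) / (s / S)."""
    return (ns / n_sites) / (s / s_sites)
```

With hypothetical inputs of 100 GC→AT and 50 AT→GC SNPs over equal numbers of GC and AT sites, `gc_eq` gives 1/3, and `gc_eq_interval` returns the 2.5th and 97.5th percentiles of the resampled GCeq values. The fixed seed is only there to make the sketch reproducible.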
Supporting Information

Figure S1. GC contents of obligatory intra-cellular bacteria tend to be lower than those of other members of the same broad clades. (3.00 MB TIF)

Table S1. Conservation of repair proteins in examined pathogens. (0.09 MB DOC)

Table S2. Summary of results for five low-diversity sub-clades, with singletons removed. (0.03 MB DOC)

Table S3. GC contents of obligate intracellular and non-obligate intracellular bacteria by clade. (0.07 MB XLS)

Table S4. Strength of selection or selection-like processes for GC over AT nucleotides, assuming constant selection and no competing selective pressures. (0.04 MB DOC)
              The rules and impact of nonsense-mediated mRNA decay in human cancers

              Premature termination codons (PTCs) cause a large proportion of inherited human genetic diseases. PTC-containing transcripts can be degraded by an mRNA surveillance pathway termed nonsense-mediated mRNA decay (NMD). However, the efficiency of NMD varies; it is inefficient when a PTC is located downstream of the last exon junction complex (EJC). We used matched exome and transcriptome data from 9,769 human tumors to systematically elucidate the rules of NMD targeting in human cells. An integrated model incorporating multiple rules beyond the canonical EJC model explains approximately three-quarters of the non-random variance in NMD efficiency across thousands of PTCs. We also show that dosage compensation may mask the effects of NMD. Applying the NMD model identifies signatures of both positive and negative selection on NMD-triggering mutations in human tumors and provides a classification of tumor suppressor genes.

                Author and article information

                Contributors
                Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Visualization, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Formal analysis, Funding acquisition, Methodology, Project administration, Supervision, Writing – review & editing
                Role: Editor
                Journal
                PLoS Genet
                PLoS Genetics
                Public Library of Science (San Francisco, CA, USA)
                1553-7390
                1553-7404
                17 September 2019
                September 2019
                Volume: 15
                Issue: 9
                Article: e1008386
                Affiliations
                [001]Milner Centre for Evolution, University of Bath, Bath, United Kingdom
                University of Warwick, UNITED KINGDOM
                Author notes

                The authors have declared that no competing interests exist.

                Author information
                http://orcid.org/0000-0001-7824-8709
                http://orcid.org/0000-0002-1002-1054
                Article
                PGENETICS-D-19-00941
                10.1371/journal.pgen.1008386
                6764699
                31527909
                f3dc9c54-90ad-4b89-b860-fce162e1dc5b
                © 2019 Ho, Hurst

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 11 June 2019
                Accepted: 27 August 2019
                Page count
                Figures: 10, Tables: 1, Pages: 32
                Funding
                Funded by: funder-id http://dx.doi.org/10.13039/501100000781, European Research Council;
                Award ID: ERC-2014-ADG 669207
                This work was supported by the European Research Council (Grant EvoGenMed ERC-2014-ADG 669207 to L.D.H). For more information regarding ERC activities, please visit https://erc.europa.eu/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences > Organisms > Eukaryota
                Biology and Life Sciences > Genetics > Gene Expression
                Biology and Life Sciences > Computational Biology > Comparative Genomics
                Biology and Life Sciences > Genetics > Genomics > Comparative Genomics
                Biology and Life Sciences > Biochemistry > Nucleic Acids > RNA > Messenger RNA > Untranslated Regions > 3' UTR
                Biology and Life Sciences > Organisms > Bacteria > Mollicutes
                Biology and Life Sciences > Microbiology > Bacteriology > Bacterial Genetics > Bacterial Genomics
                Biology and Life Sciences > Genetics > Microbial Genetics > Bacterial Genetics > Bacterial Genomics
                Biology and Life Sciences > Genetics > Genomics > Microbial Genomics > Bacterial Genomics
                Biology and Life Sciences > Microbiology > Microbial Genomics > Bacterial Genomics
                Biology and Life Sciences > Biochemistry > Nucleic Acids > RNA > Messenger RNA > Untranslated Regions
                Biology and Life Sciences > Genetics > Gene Expression > Protein Translation
                Custom metadata
                vor-update-to-uncorrected-proof
                2019-09-27
                Scripts can be found at https://github.com/ath32/ASCs. All other relevant data are within the manuscript and its Supporting Information files.

                Genetics