      Global Regulatory Functions of the Staphylococcus aureus Endoribonuclease III in Gene Expression


          Abstract

          RNA turnover plays an important role in both virulence and adaptation to stress in the Gram-positive human pathogen Staphylococcus aureus. However, the molecular players and mechanisms involved in these processes are poorly understood. Here, we explored the functions of S. aureus endoribonuclease III (RNase III), a member of the ubiquitous family of double-strand-specific endoribonucleases. To define genomic transcripts that are bound and processed by RNase III, we performed deep sequencing on cDNA libraries generated from RNAs that were co-immunoprecipitated with wild-type RNase III or two different cleavage-defective mutant variants in vivo. Several newly identified RNase III targets were validated by independent experimental methods. We identified various classes of structured RNAs as RNase III substrates and demonstrated that this enzyme is involved in the maturation of rRNAs and tRNAs, regulates the turnover of mRNAs and non-coding RNAs, and autoregulates its synthesis by cleaving within the coding region of its own mRNA. Moreover, we identified a positive effect of RNase III on protein synthesis based on novel mechanisms. RNase III–mediated cleavage in the 5′ untranslated region (5′UTR) enhanced the stability and translation of cspA mRNA, which encodes the major cold-shock protein. Furthermore, RNase III cleaved overlapping 5′UTRs of divergently transcribed genes to generate leaderless mRNAs, which constitutes a novel way to co-regulate neighboring genes. In agreement with recent findings, low abundance antisense RNAs covering 44% of the annotated genes were captured by co-immunoprecipitation with RNase III mutant proteins. Thus, in addition to gene regulation, RNase III is associated with RNA quality control of pervasive transcription. Overall, this study illustrates the complexity of post-transcriptional regulation mediated by RNase III.

          Author Summary

          Control of mRNA stability is crucial for bacteria to survive and rapidly adapt to environmental changes and stress conditions. The molecular players and the degradation pathways involved in these adaptive processes are poorly understood in Staphylococcus aureus. The universally conserved double-strand-specific endoribonuclease III (RNase III) in S. aureus is known to repress the synthesis of several virulence factors and was recently implicated in genome-wide mRNA processing mediated by antisense transcripts. We present here the first global map of direct RNase III targets in S. aureus. Deep sequencing was used to identify RNAs associated with epitope-tagged wild-type RNase III and two catalytically impaired but binding-competent mutant proteins in vivo. Experimental validation revealed an unexpected variety of structured RNA transcripts as novel RNase III substrates. In addition to rRNA operon maturation, autoregulation, degradation of structured RNAs, and antisense regulation, we propose novel mechanisms by which RNase III increases mRNA translation. Overall, this study shows that RNase III has a broad function in gene regulation of S. aureus. We can now address more specifically the roles of this universally conserved enzyme in gene regulation in response to stress and during host infection.


          Most cited references (72)


          Single processing center models for human Dicer and bacterial RNase III.

          Dicer is a multidomain ribonuclease that processes double-stranded RNAs (dsRNAs) to 21 nt small interfering RNAs (siRNAs) during RNA interference, and excises microRNAs from precursor hairpins. Dicer contains two domains related to the bacterial dsRNA-specific endonuclease, RNase III, which is known to function as a homodimer. Based on an X-ray structure of the Aquifex aeolicus RNase III, models of the enzyme interaction with dsRNA, and its cleavage at two composite catalytic centers, have been proposed. We have generated mutations in human Dicer and Escherichia coli RNase III residues implicated in the catalysis, and studied their effect on RNA processing. Our results indicate that both enzymes have only one processing center, containing two RNA cleavage sites and generating products with 2 nt 3' overhangs. Based on these and other data, we propose that Dicer functions through intramolecular dimerization of its two RNase III domains, assisted by the flanking RNA binding domains, PAZ and dsRBD.

            Fast Mapping of Short Sequences with Mismatches, Insertions and Deletions Using Index Structures

            Introduction

            Since the 454 pyrosequencing technology [3] was introduced to the market, the need for algorithms that efficiently map huge numbers of reads to reference genomes has rapidly increased. Later, high throughput sequencing (HTS) methods such as Illumina [4] and SOLiD (Applied Biosystems) have intensified the demand. The development of read mapping methods decisively depends on specifications and error models of the respective technologies. Unfortunately, little is known about specific error models, and models are likely to change as manufacturers are constantly modifying chemistry and machinery. Increasing the read length is a key aim of all vendors, who tolerate a trade-off with read accuracy. In a recent investigation on error models of 454 and Illumina technologies, it has been shown that 454 reads are more likely to include insertions and deletions while Illumina reads typically contain mismatches [5],[6]. Currently available read mapping programs are specifically designed to allow for mismatches when aligning the reads to the reference genome. Most of the programs, e.g. MAQ [7], SOAP [8], SHRiMP [9] or ELAND (proprietary), use seeding techniques that gain their speed from pre-computed hash look-up tables. Some of these programs, in particular SOAP and MAQ, are specifically designed to map short Illumina or SOLiD reads. Longer sequences cannot be mapped by these tools. The matching models of MAQ, ZOOM [10], SOAP, SHRiMP, Bowtie [11], and ELAND focus on mismatches and largely neglect insertions and deletions. Indels are only considered during subsequent alignment steps but not while searching for seeds. With indels accounting for more than two thirds of all 454 sequencing errors, this is a major shortcoming for these kinds of reads [5]. Only PatMaN [12] and BWA [13] are able to handle a limited number of indels. Mapping is aggravated by the manufacturers' overestimation of their read accuracies.
While an overall error rate of 0.5% has been observed for 454, the error rate increases drastically for reads shorter than 80 bp and longer than 100 bp [5], leading to considerably larger error frequencies in real-life datasets. This implies that sequencing projects aiming to find short transcripts such as miRNAs lose a substantial fraction of their data unless a matching strategy is used that takes indels into account. In Illumina reads, error rates of up to 4% have been observed [6]. This differs significantly from Illumina's specification. Compared to 454, the frequency of indels is significantly lower. Moreover, differences between reads and reference genome might also occur due to genomic variations such as SNPs. We present a matching method that uses enhanced suffix arrays to compute exact and inexact seeds. Sufficiently good seeds subsequently trigger a full dynamic programming alignment. Our method is insensitive to errors and contaminations at the ends of a read including 3′ and 5′ primers and tags. The results section describes the basic ideas and an evaluation of our segemehl software implementing our method. The technical details of the matching model are described in the Methods section at the end of this contribution.

Results

Outline of the Algorithmic Approach

A read aligner should deliver the original position of the read in the reference genome. Such a position will be called the true position in the following. Optimally scoring local alignments of the read and the reference genome can be used to obtain a possible true position, but because an alignment of the read with the reference genome at the true position does not always have an optimal score according to the chosen scoring scheme, this method does not always work. Nevertheless, there are no better approaches available unless further information about the read is at hand.
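
To make the notion of an optimally scoring local alignment concrete, the following is a minimal Smith-Waterman-style sketch. It is not segemehl's implementation. It assumes the unit scores that the excerpt attributes to segemehl (+1 per match, −1 per mismatch, insertion or deletion); the function name `local_alignment_score` and its parameters are hypothetical, and the example strings are the read and reference used in the Figure 1 caption of the excerpt.

```python
def local_alignment_score(read, reference, match=1, penalty=-1):
    """Return (best_score, reference_end_index) of the best local alignment.

    Hypothetical helper: plain dynamic programming over all prefixes,
    with one penalty shared by mismatches, insertions and deletions.
    """
    prev = [0] * (len(reference) + 1)
    best_score, best_end = 0, -1
    for i in range(1, len(read) + 1):
        curr = [0] * (len(reference) + 1)
        for j in range(1, len(reference) + 1):
            same = read[i - 1] == reference[j - 1]
            diag = prev[j - 1] + (match if same else penalty)
            up = prev[j] + penalty        # read character aligned to a gap
            left = curr[j - 1] + penalty  # reference character aligned to a gap
            curr[j] = max(0, diag, up, left)
            if curr[j] > best_score:
                best_score, best_end = curr[j], j - 1  # 0-based end in reference
        prev = curr
    return best_score, best_end


reference = "atacttcttcggcaga"
print(local_alignment_score("cttcttcggc", reference))  # exact read
print(local_alignment_score("cttcgtcggc", reference))  # read with one mismatch
```

The table is filled in time proportional to the product of the read and reference lengths, which is why the excerpt's method reserves full dynamic programming for positions that sufficiently good seeds have already selected.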
We present a new read mapping approach that aims at finding optimally scoring local alignments of a read and the reference genome. It is based on computing inexact seeds of variable length and can handle insertions, deletions (indels; gaps), and mismatches. Throughout the document the notion of differences refers to mismatches, insertions and deletions in some local alignment of the read and the reference genome, irrespective of whether they arise from technical artifacts or sequence variation. A single difference is either a single mismatch, a single character insertion or a single character deletion. Although not limited to a specific scoring scheme, we have implemented our seed search model in the program segemehl assigning a score of 1 to each match and a score of −1 to each mismatch, insertion or deletion. Our matching strategy derives from a simple and commonly used idea. Assume an optimally scoring local alignment of a read with the reference genome with exactly two differences. If the positions of the differences in the alignment are sufficiently far apart, we can efficiently locate exact seeds which in turn may deliver the position of the optimal local alignment in the reference genome. Likewise, if the distance between the two differences is small, two continuous exact matches at the ends of the read may allow the read to be mapped to this position. To exploit this observation, the presented method employs a heuristic based on searches starting at all positions of the read. That is, for each suffix of the read the longest prefix match, i.e. the longest exact match beginning at the first position of the suffix with all substrings of the reference genome, is computed. If the longest prefix match is long enough that it only occurs in a few positions of the reference genome, it may be feasible to check all these positions to verify whether the longest prefix match is part of a sufficiently good alignment.
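
The heuristic just described, one longest prefix match per suffix of the read, can be sketched naively as follows. This is an illustration only: it scans the reference directly instead of using the enhanced suffix arrays the authors mention, and the function names `longest_prefix_match` and `matching_stem` are hypothetical. The example strings are the read and reference from the Figure 1 caption of the excerpt.

```python
def longest_prefix_match(suffix, reference):
    """Return (length, start_positions) of the longest exact match of a
    prefix of `suffix` anywhere in `reference` (naive quadratic scan)."""
    best_len, positions = 0, []
    for start in range(len(reference)):
        length = 0
        while (length < len(suffix)
               and start + length < len(reference)
               and reference[start + length] == suffix[length]):
            length += 1
        if length > best_len:
            best_len, positions = length, [start]
        elif length == best_len and length > 0:
            positions.append(start)
    return best_len, positions


def matching_stem(read, reference):
    """Longest prefix match for every suffix P_i of the read."""
    return [longest_prefix_match(read[i:], reference) for i in range(len(read))]


reference = "atacttcttcggcaga"
for i, (length, positions) in enumerate(matching_stem("cttcttcggc", reference)):
    print(i, length, positions)

# A read with one mismatch: the longest prefix match of P_0 no longer
# starts at position 3, where the best local alignment lies.
print(longest_prefix_match("cttcgtcggc", reference))
```

For the exact read, the matches of the first suffixes start at positions 3, 4 and 5 of the reference. For the mismatched read, the longest match of the full read starts elsewhere, which is the failure case that motivates checking a limited number of differences during the search.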
While this approach already works well for many cases, we need to increase the sensitivity for cases where the computation of the longest prefix match fails to deliver a match at the position of the optimally scoring local alignment. This is the case when a longer prefix match can be obtained at another position of the reference genome by exactly matching characters that would result in a mismatch, insertion or deletion in the optimal local alignment (cf. Fig. 1). Therefore, during the computation of each longest prefix match we check a limited number of differences by enumerating at certain positions all possible mismatches and indels (cf. Fig. 2).

Figure 1 (10.1371/journal.pcbi.1000502.g001). Longest prefix matches may fail to deliver the position of the optimally scoring local alignment. Assume a simple scoring scheme that assigns a score of +1 to a single character match and a score of 0 to a single character mismatch, insertion or deletion. Using longest prefix matches bears the risk of ignoring differences in the best, i.e. optimally scoring, local alignment. Its retrieval fails if a longer match can be obtained at another position of the reference sequence by matching a character that is inserted, deleted, or mismatched in the best local alignment. Depending on the length of the reference genome and its nucleotide composition, the probability is determined by the length of the substring that can be matched to the position of the best local alignment before the first difference occurs. (A) The optimally scoring alignment of the read P := cttcttcggc begins at position 3 of the reference genome S := atacttcttcggcaga. Let Pi denote the i-th suffix of the read P. For each Pi, the starting positions of the longest match in S comprise the position of Pi in the best local alignment (solid green lines). That is, the longest match of P0 begins at position 3, the longest match of P1 begins at position 4, the longest match of P2 begins at position 5 and so forth. (B) For the read P := cttcgtcggc, the retrieval of the best local alignment fails for all Pi, i […]

[…] j, S[i‥j] denotes the empty string. occS(w) denotes the set of occurrences of some string w in S, i.e. the set of positions i, 0≤i≤|S|−|w|, satisfying w = S[i‥i+|w|−1]. A substring of S beginning at the first position of S is a prefix of S and a substring ending at the last position of S is a suffix of S. To prevent suffixes from having a second occurrence in S, we add a sentinel character $ (not occurring in S) to the end of S. For each i, 0≤i≤n, Si = S[i‥n−1]$ denotes the i-th non-empty suffix of S$, i.e. the suffix beginning at position i in S$. We identify a suffix of S$ by its start position. That is, by suffix i we mean Si. The concept of suffix arrays is based on lexicographically sorting the suffixes of S$. Suppose that the characters are ordered such that A […] 0. First note that ℓ i −1 ≤ℓ i +1. Moreover, for each q, 1≤q≤ℓ i −1 we have […] where […] = {x+y | x∈M} denotes the elementwise addition for any set M. That is, any suffix in […] can be found in […] with offset one. To allow differences in our matching heuristic, we introduce the concept of matching branches which branch off from sets of the matching stem. We describe the branching in terms of a transformation of some suffix interval […]. Let i, 0≤i≤m−1 be arbitrary but fixed. Let q be such that i+q−1 […]
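
The recoverable definitions above (the sentinel $, suffixes identified by their start positions, lexicographic sorting, and the occurrence set of a string w in S) can be sketched with a plain suffix array. This is a deliberately simple construction for illustration, not the enhanced suffix array used by segemehl. The names `build_suffix_array` and `occurrences` are hypothetical, and the code assumes an alphabet in which `$` sorts before every other character, as it does for lowercase nucleotides in Python's default string ordering.

```python
from bisect import bisect_left


def build_suffix_array(s):
    """Start positions of all suffixes of s + '$' in lexicographic order."""
    text = s + "$"  # sentinel: no suffix is a prefix of another suffix
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurrences(s, suffix_array, w):
    """Sorted start positions of the non-empty string w in s."""
    text = s + "$"
    suffixes = [text[i:] for i in suffix_array]  # materialised for clarity
    rank = bisect_left(suffixes, w)  # first suffix that is >= w
    hits = []
    while rank < len(suffixes) and suffixes[rank].startswith(w):
        hits.append(suffix_array[rank])
        rank += 1
    return sorted(hits)


S = "atacttcttcggcaga"
sa = build_suffix_array(S)
print(sa)
print(occurrences(S, sa, "ctt"))
```

All suffixes that start with w are adjacent in the sorted order, so a binary search locates the first of them and a linear walk collects the rest. Materialising every suffix costs quadratic memory, which is acceptable for a toy example and is one reason real implementations index into the text instead.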

              The distributions, mechanisms, and structures of metabolite-binding riboswitches

              Background

              Riboswitches are autonomous noncoding RNA elements that monitor the cellular environment and control gene expression [1-4]. More than a dozen classes of riboswitches that respond to changes in the concentrations of specific small molecule ligands ranging from amino acids to coenzymes are currently known. These metabolite-binding riboswitches are classified according to the architectures of their conserved aptamer domains, which fold into complex three-dimensional structures to serve as precise receptors for their target molecules. Riboswitches have been identified in the genomes of archaea, fungi, and plants; but most examples have been found in bacteria. Regulation by riboswitches does not require any macromolecular factors other than an organism's basal gene expression machinery. Metabolite binding to riboswitch aptamers typically causes an allosteric rearrangement in nearby mRNA structures that results in a gene control response. For example, bacterial riboswitches located in the 5' untranslated regions (UTRs) of messenger RNAs can influence the formation of an intrinsic terminator hairpin that prematurely ends transcription or the formation of an RNA structure that blocks ribosome binding. Most riboswitches inhibit the production of unnecessary biosynthetic enzymes or transporters when a compound is already present at sufficient levels. However, some riboswitches activate the expression of salvage or degradation pathways when their target molecules are present in excess. Certain riboswitches also employ more sophisticated mechanisms involving self-cleavage [5], cooperative ligand binding [6], or tandem aptamer arrangements [7]. Many aspects of riboswitch regulation have not yet been critically and quantitatively surveyed. To further this goal, we have compiled a comparative genomics data set from systematic database searches for representatives of ten metabolite-binding riboswitch classes (Table 1).
The results define the overall taxonomic distributions of each riboswitch class and outline trends in the mechanisms of riboswitch-mediated gene control preferred by different bacterial groups. The expanded riboswitch sequence alignments resulting from these searches include newly identified variants that provide valuable information about their conserved aptamer structures. Using this information, we have re-evaluated the consensus secondary structure models of these ten riboswitch classes. The updated structures reveal that certain riboswitch aptamers utilize previously unrecognized examples of common RNA structure motifs as components of their conserved architectures. They also highlight new base-base interactions predicted with a procedure that estimates the statistical significance of mutual information scores between alignment columns.

Table 1. Sources of riboswitch sequence alignments and molecular structures

Riboswitch class                          Rfam accession   References (seed alignment, other alignments, molecular structures)
Thiamine pyrophosphate (TPP)              RF00059          [41] [48] [71-73]
Adenosylcobalamin (AdoCbl)                RF00174          [39] [20]
Lysine                                    RF00168          [37] [21]
Glycine                                   RF00504          [6]
S-Adenosylmethionine class 1 (SAM-I)      RF00162          [94] [9,52] [78]
Flavin mononucleotide (FMN)               RF00050          [56]
Guanine and adenine (purine)              RF00167          [22] [95-97]
Glucosamine-6-phosphate (GlcN6P)          RF00234          [23] [28,30]
7-Aminoethyl 7-deazaguanine (preQ1)       RF00522          [40]
S-Adenosylmethionine class 2 (SAM-II)     RF00521          [18]

Riboswitches are named for the metabolite that they sense with standard abbreviations in parentheses. Rfam database numbers are provided for each riboswitch along with references to the seed alignments we used to train covariance models for database searches in this study, other published multiple sequence alignments, and three-dimensional molecular structures.
Results and discussion

Riboswitch identification overview

Metabolite-binding riboswitch aptamers are typical of complex functional RNAs that must adopt precise three-dimensional shapes to perform their molecular functions. A conserved scaffold of base-paired helices organizes the overall fold of each aptamer. The identities of bases within most helices vary during evolution, but changes usually preserve base pairing to maintain the same architecture. In contrast, the base identities of nucleotides that directly contact the target molecule or stabilize tertiary interactions necessary to assemble a precise binding pocket are highly conserved even in distantly related organisms. Additionally, many riboswitches tolerate long nonconserved insertions at specific sites within their structures. These 'variable insertions' typically adopt stable RNA stem-loops that do not interfere with folding of the aptamer core. Nearly all of the riboswitches discovered to date are cis-regulatory elements. For example, bacterial riboswitches are almost always located upstream of protein-coding genes related to the metabolism of their target molecules. Therefore, the genomic contexts of putative hits returned by an RNA homology search can be used to recognize legitimate riboswitches even when a search algorithm returns many false positives. Using this tactic, one can iteratively refine the description of a riboswitch aptamer by incorporating authentic low scoring hits into a new structure model and then re-searching the sequence database. Several riboswitches were first identified as widespread RNA elements based on the presence of a highly conserved 'box' sequence within their structures. BLAST searches for the B12 box [8], S box [9], and THI box [10] sequences are effective for discovering many examples of the adenosylcobalamin (AdoCbl), S-adenosylmethionine (SAM)-I, and thiamin pyrophosphate (TPP) riboswitches, respectively.
Other search techniques score how well a sequence matches a template of conserved bases and base-paired helices that the user manually devises from known examples of the riboswitch aptamer. The RNAmotif program performs this sort of generalized pattern matching [11]. A third strategy computationally defines and then searches for ungapped blocks of sequence conservation that are characteristic of a given riboswitch and spaced throughout its structure [12]. While these methods can be effective, they generally do not fully exploit the information contained in multiple sequence alignments of functional RNA families to efficiently identify highly diverged members. Covariance models (CMs) are generalized probabilistic descriptions of RNA structures that offer several advantages over other homology search methods [13]. CMs can be directly trained on an input sequence alignment without time-consuming manual intervention. They also provide a more complete model of the sequence and structure conservation observed in functional RNA families that incorporates: first-order sequence consensus information; second-order covariation, where the probability of observing a base in one alignment column depends on the identity of the base in another column; insert states that allow variable-length insertions; and deletion states that allow omission of consensus nucleotides. This complexity comes at a computational cost, but several filtering techniques have recently been developed that make CM searches of large databases practical [14-16]. For example, CMs have been used to find divergent homologs of Escherichia coli 6S RNA [17] and define a variety of regulatory RNA motifs in α-proteobacteria [18]. The Rfam database [19] maintains hundreds of covariance models for identifying a wide variety of functional RNAs, including riboswitches. 
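
The second-order covariation mentioned above, where the base in one alignment column depends on the base in another, is commonly quantified as the mutual information between the two columns. The sketch below shows only that basic quantity. It is not the authors' procedure, which additionally estimates statistical significance, and it is not how a covariance model is trained. The function name `mutual_information` and the toy columns are made up for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equally long alignment
    columns, each given as a string with one base per sequence."""
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    freq_a, freq_b = Counter(col_a), Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    return sum(
        (count / n) * log2((count / n) / ((freq_a[x] / n) * (freq_b[y] / n)))
        for (x, y), count in freq_ab.items()
    )


# Toy columns: a pair that co-varies to preserve Watson-Crick pairing,
# and a fully conserved column paired with a variable one.
print(mutual_information("GGCCAAUU", "CCGGUUAA"))
print(mutual_information("GGGGGGGG", "CCGGUUAA"))
```

A perfectly co-varying pair of columns with four equally frequent base pairs scores 2 bits, while a fully conserved column carries no mutual information with any other column. This is why conserved binding-pocket nucleotides and co-varying helix positions look so different under this measure.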
In the present study, we used covariance models to systematically search for ten classes of metabolite-binding riboswitches in microbial genomes, environmental sequences, and selected eukaryotic organisms. The riboswitch sequence alignments used to train these CMs were derived from a variety of published and unpublished sources (Table 1). The genomic contexts of prospective riboswitch hits were examined to confirm that each was appropriately positioned to function as a regulatory element. In general, CMs trained on the input alignments were able to discriminate valid riboswitch sequences from false positive hits on the basis of CM scores alone. The most common exceptions were spuriously high-scoring AU-rich matches to the smaller riboswitch models (for example, the purine riboswitch) and bona fide low-scoring hits with variable insertions at unusual positions in the more structurally complex riboswitch classes. Prospective riboswitch matches were also examined to ensure that they conformed to known aptamer structure constraints. In certain cases, it was necessary to manually correct portions of the automated sequence alignments defined by the maximally scoring path of each hit through the states of the CM. For example, CMs model only hierarchically nested base pairs for algorithmic speed [13]. Consequently, the pseudoknotted helices and pairings present in several riboswitches were aligned by hand to achieve the desired accuracy. The automated CM alignments also tend to incorrectly shift nucleotides when deletions of consensus positions result in ambiguity concerning the optimal placement of remaining sequences. The alignments of new RNA structure motifs and base-base interactions described later that were not present in the seed alignments used to train the covariance models were also manually adjusted. Multiple sequence alignments of the resulting curated riboswitch hits are available as Additional data files 1 and 2. 
Riboswitch distributions

The phylogenetic distributions of the ten riboswitch classes were mapped from these search results (Figure 1). Members of the TPP riboswitch class are the only metabolite-binding RNAs known to occur outside of eubacteria. TPP riboswitch representatives are found in euryarchaeal, fungal, and plant species. The AdoCbl riboswitch is the most widespread class in bacteria, but TPP, flavin mononucleotide (FMN), and SAM-I riboswitches are also common in many groups. Glycine and lysine riboswitches have more fragmented distributions. They are widespread in certain bacterial groups, but appear to be missing from others. Finally, the glucosamine-6-phosphate (GlcN6P), purine, 7-aminoethyl 7-deazaguanine (preQ1), and SAM-II riboswitches were identified in only a few groups of bacteria. Interestingly, the SAM-I and SAM-II aptamer distributions overlap slightly. Examples of both SAM-sensing riboswitch classes were found in α-Proteobacteria, γ-Proteobacteria, and Bacteroidetes, but no single bacterial species was found to carry both SAM-I and SAM-II riboswitch classes.

Figure 1. Riboswitch distributions. The dimensions of each square are proportional to the frequency with which a given riboswitch occurs in the corresponding taxonomic group. A phylogenetic tree with the standard accepted branching order for each group of organisms is shown on the left. For bacteria, this tree is adapted from [92] with the addition of Fusobacteria [93]. On the right is a graph depicting the total number of nucleotides from each taxonomic division in the sequence databases that were searched.

It is possible that many of the relatively isolated examples where riboswitches occur only sporadically in certain clades (for example, SAM-I, SAM-II, purine, and preQ1 in γ-Proteobacteria) may be examples of horizontal DNA transfer. There is some evidence that this process has been important for the dispersal of riboswitches into new bacterial genomes.
Entire transcriptional units containing AdoCbl riboswitches and their associated biosynthetic operons appear to have been transferred from Bacillus/Clostridium species to enterobacteria at some point [20]. In contrast, no evidence of recent horizontal transfer was observed in phylogenetic trees of lysine riboswitch aptamers, despite their disjointed distribution across different taxonomic groups [21]. Firmicutes (low G+C Gram-positive bacteria) appear to make the most extensive use of the riboswitch classes examined in this study. Every riboswitch except SAM-II is widespread in this clade, and most aptamer classes occur multiple times per genome. For example, Bacillus subtilis carries at least 29 riboswitches (5 TPP, 1 AdoCbl, 2 FMN, 1 glycine, 11 SAM-I, 2 lysine, 1 GlcN6P, 4 guanine, 1 adenine, and 1 preQ1) controlling approximately 73 genes. Experimental and computational efforts to identify riboswitches have been focused specifically on B. subtilis [22,23], so it is possible that the overrepresentation of these ten riboswitch classes in Firmicutes reflects a discovery bias. Indeed, new computational searches are beginning to identify riboswitch classes that are predominantly used by other groups of bacteria [18,24]. As a whole, γ-Proteobacteria employ a mixture of these ten riboswitch classes that is comparable to the diversity found in Firmicute species. However, individual species usually carry fewer riboswitch classes overall and fewer representatives of each class. For example, E. coli has six riboswitches (three TPP, one AdoCbl, one FMN, and one lysine) from the ten classes examined, which regulate a total of sixteen genes. Deeply branched bacteria such as Deinococcus/Thermus and Thermotoga species also appear to utilize a variety of riboswitches. However, no riboswitch sequences have yet been identified in Aquifex species, and riboswitches also seem to occur only rarely in Chlamydia species, Cyanobacteria, and Spirochetes. 
However, the sequence database sizes for many of these bacterial groups are relatively small so the observed frequencies will probably need to be revised as more genomic sequences become available. As expected, representatives of almost all ten riboswitch classes are found in sequences from shotgun cloning projects that target environments supporting diverse bacterial communities. These sources of additional sequences have been helpful in some cases for defining consensus structure models and adding statistical merit to mutual information calculations (see below). It is notable that glycine and SAM-II riboswitches are unusually common in Sargasso Sea metagenomic sequences [25]. This data set appears to be contaminated with some non-native Shewanella and Burkholderia sequences [26], but the large number of SAM-II matches probably accurately reflects the abundance of α-Proteobacteria in this environment.

Riboswitch mechanism overview

GlcN6P riboswitches are ribozymes that harness a self-cleavage event to repress expression of downstream glmS genes [5]. Members of this class are unique compared to other riboswitches because they adopt a preformed binding pocket for glucosamine-6-phosphate [27,28] and use the metabolite target as a cofactor to accelerate RNA cleavage [28-30]. The nine other riboswitch classes studied here utilize ligand-induced changes in 'expression platform' sequences to control a variety of gene expression processes [1]. The architectures of riboswitch expression platforms can be used to predict their gene control mechanisms on a genomic scale, as described below. Riboswitches typically contain disordered regions in their conserved aptamer cores that become structured upon metabolite binding. These changes may trigger rearrangements in additional expression platform structures located outside of the aptamer, such that two alternative conformations with mutually exclusive base-paired architectures exist for the entire riboswitch.
Some riboswitches operate at thermodynamic equilibrium [31]. They are able to interconvert between these ligand-bound and ligand-free structures in the context of the full-length RNA. Regulation by other riboswitches is kinetically controlled [32-35]. The relative speeds of transcription and co-transcriptional ligand binding dominate a one-time decision as to which folding pathway to follow. The active and inactive conformations of these riboswitches are trapped in the final RNA molecule and do not readily interconvert on a time scale that is relevant to the gene control system. In most riboswitches, bases from the aptamer's outermost P1 'switching' helix, which is enforced in the ligand-bound conformation, pair to expression platform sequences to form an alternative structure in the absence of ligand, for example, [36,37]. However, some riboswitches harness shape changes elsewhere in their aptamers to regulate gene expression. AdoCbl riboswitches usually rely on the ligand-dependent formation of a pseudoknot between a specific C-rich loop and sequences outside the aptamer core to exert gene control [20,38,39]. SAM-II aptamers enforce a distal pseudoknot to interface with their expression platforms [18], and preQ1 riboswitches sequester conserved 3' tail sequences upon metabolite binding [40]. Riboswitches can use ligand-induced structure changes to control gene expression in a variety of contexts. For example, the TPP riboswitches found in eukaryotes reside in introns located near the 5' ends of fungal pre-mRNAs [41-43] or in the 3' UTRs of plant pre-mRNAs [41]. Ligand binding modulates splicing of these introns, generating alternative-processed mRNAs that are expressed at different levels. In each example studied, a portion of the P4-P5 stem region pairs near a 5' splice-site, and this pairing is displaced when TPP is bound [43] (A Wachter, M Tunc-Ozdemir, BC Grove, PJ Green, DK Shintani, RRB, unpublished data). 
In contrast, almost all bacterial riboswitches occur in the 5' UTRs of mRNAs. Metabolite binding to these riboswitches generally regulates either transcription or translation of the encoded genes. Bacterial riboswitches that regulate transcription usually control the formation of intrinsic terminator stems located within the same 5' UTR. Intrinsic terminators are stable GC-rich stem-loops followed by polyuridine tracts that cause RNA polymerase to stall and release the nascent RNA with some probability [44,45]. Certain glycine [6], adenine [46], and lysine [21] riboswitches with ON genetic logic use structural rearrangements triggered by metabolite binding to bury pieces of terminator stems in alternative pairing interactions. However, most riboswitches controlling transcription are OFF switches that add an extra folding element to reverse this logic. Metabolite binding to these riboswitches disrupts an antiterminator, which normally sequesters bases required to form the terminator stem, allowing the terminator to form and repress gene expression. Similar antiterminator/terminator trade-offs occur in bacterial RNAs regulated by protein- or ribosome-mediated transcription attenuation mechanisms [47]. Bacterial riboswitches that regulate translation typically use ligand-induced structure changes to block translation initiation. Unlike riboswitches with transcription control mechanisms, which require very specific terminator structures in their expression platforms, the RNA structures that prevent translation initiation may be more varied. Sometimes, they rely on simple hairpins that sequester the ribosome binding site (RBS) of the downstream gene in a base-paired helix. In these cases, a riboswitch with OFF genetic logic can harness metabolite binding to disrupt a mutually exclusive antisequestor pairing, allowing the sequestor hairpin to form and attenuate translation. 
More convoluted base-pairing trade-offs and shape changes may operate in other expression platforms to alter the efficiency of translation initiation in response to ligand binding. Two variants of these mechanisms that dispense with or combine the elements of a typical bacterial riboswitch expression platform are worth noting. Some riboswitches bury the RBS of the downstream gene within their conserved aptamer cores [48,49]. Thus, ligand binding directly attenuates translation without the involvement of any additional expression platform sequences. Other riboswitches regulate the formation of a transcription terminator located so close to the adjacent open reading frame that its RBS resides within the 3' side of the terminator hairpin [48]. Riboswitches with these dual expression platforms could attenuate transcription and, if termination does not occur, could also inhibit translation. Metabolite-dependent inhibition of ribosome binding has been proven in vitro for the E. coli AdoCbl riboswitch located upstream of the btuB gene [50]. In addition, in vivo expression assays using translational fusions between AdoCbl riboswitches and reporter genes indicate that control of translation is occurring [38]. However, other co- or post-transcription mechanisms might also contribute to the observed gene expression changes. For example, AdoCbl riboswitches from E. coli and B. subtilis can be cleaved by RNase P [51]. Such findings raise the interesting possibility that differential RNA processing or degradation caused by ligand-induced conformational changes might be the primary mechanism by which some riboswitches regulate gene expression. There is one interesting instance where a Clostridium acetobutylicum SAM-I riboswitch appears to regulate protein expression through an antisense RNA intermediate [52]. This riboswitch is located immediately downstream, and in the opposite orientation from, an operon encoding a putative salvage pathway for converting methionine to cysteine. 
It has an expression platform, consisting of a typical terminator/antiterminator arrangement, with OFF genetic logic. Presumably, when SAM (and consequently methionine) pools are low, transcription of the full-length antisense RNA causes inhibition and degradation of the sense mRNA as is observed in some bacterial regulatory systems that employ small RNAs [53]. When SAM levels are high, the SAM-I riboswitch will prematurely terminate the antisense transcript, allowing expression of this operon to recycle excess methionine. In some instances, riboswitches or their components are found in tandem arrangements. Almost all glycine riboswitches consist of two aptamers that regulate a single downstream expression platform [6]. In the genomic sequences searched here, 88% of the mRNA leaders containing one glycine aptamer also carry a second aptamer. Cooperative binding of two ligand molecules by these glycine riboswitches yields a genetic switch that is more 'digital', that is, more responsive to smaller changes in ligand concentration, than a single aptamer. Far less common are tandem arrangements of other riboswitch classes such as TPP [7,54,55] or AdoCbl [55]. Fewer than 1% of the UTRs regulated by these riboswitch classes contain multiple aptamers. In these cases, each aptamer appears to function as an independent riboswitch that regulates its own expression platform to yield a more digital, compound genetic switch [7]. Also rare are tandem arrangements wherein representatives of two different riboswitches are in the same UTR. In the metE mRNA leader from Bacillus clausii, a SAM-I and an AdoCbl riboswitch independently control transcription termination to combinatorially regulate expression of this gene in response to two different metabolite inputs [55]. Riboswitch mechanisms A decision tree was established for computationally classifying the gene control mechanisms of microbial riboswitches (Figure 2). 
The five categories assigned are: transcription attenuation; dual transcription and translation attenuation; translation attenuation; direct translation attenuation; and antisense regulation. The same mechanisms have been predicted for TPP [48], AdoCbl [20], FMN [56], and lysine [21] riboswitches in previous comparative studies. The use of the term attenuation here does not imply that a switch operates with OFF genetic logic, that is, gene expression may be attenuated in the ligand-free state and relieved by metabolite binding. Overall, computational assignments by this procedure have an accuracy of 88% when compared to expert predictions of TPP riboswitch mechanisms [48]. Figure 2 Riboswitch mechanism prediction scheme. The decision tree used to classify riboswitch mechanisms into five categories is shown. Depicted are OFF switches in their ligand-bound state where a P1 switching helix has formed. See the main text and Materials and methods for additional details. It is important to note that the decision tree does not explicitly predict RBS-hiding structures in expression platforms. Rather, it assumes that control of translation initiation is the most likely mechanism for riboswitches not classified into the other categories. It is possible that these riboswitches could operate by mechanisms other than the five assigned by this procedure (as described above). Another caveat is that this prediction scheme considers only intrinsic terminator structures consisting of RNA stem-loops followed by polyuridine tails. These are currently the only structures that riboswitches with transcription attenuation mechanisms are known to regulate. However, some bacteria appear to be able to utilize other structures that may lack a canonical U-tail or consist of tandem hairpins to terminate transcription [57]. 
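The classification logic of the decision tree can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it applies the thresholds stated in Materials and methods (a start codon within 15 nt of the aptamer core, and a terminator within 10 nt of the start codon), and it assumes that terminator prediction has already been done by an external tool. The `ExpressionPlatform` record, its field names, and the use of simple coordinate differences as distances are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Distance thresholds quoted in Materials and methods.
DIRECT_MAX_GAP_NT = 15  # end of conserved aptamer core to start codon
DUAL_MAX_GAP_NT = 10    # terminator hairpin to start codon


@dataclass
class ExpressionPlatform:
    """Hypothetical record; field names are illustrative, not from the paper."""
    gene_on_opposite_strand: bool
    aptamer_end: int               # last nt of the conserved aptamer core
    start_codon: int               # first nt of the downstream start codon
    terminator_end: Optional[int]  # last nt of a predicted terminator, or None


def classify(p: ExpressionPlatform) -> str:
    """Assign one of the five mechanism categories, in decision-tree order."""
    if p.gene_on_opposite_strand:
        return "antisense regulation"
    if p.start_codon - p.aptamer_end <= DIRECT_MAX_GAP_NT:
        return "direct translation attenuation"
    if p.terminator_end is not None:
        if p.start_codon - p.terminator_end <= DUAL_MAX_GAP_NT:
            return "dual transcription and translation attenuation"
        return "transcription attenuation"
    # Default category: no RBS-hiding structure is explicitly predicted.
    return "translation attenuation"
```

Note that the final branch mirrors the caveat in the text: translation attenuation is assigned by elimination rather than by detecting a sequestering structure.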
Mapping riboswitch mechanism predictions onto a phylogenetic tree (Figure 3) reveals that transcription attenuation dominates in Firmicutes and that translation attenuation is most common in other bacterial groups. The phylogenetic distribution of SAM-II riboswitch mechanisms is an exception. It is the only riboswitch aptamer that appears to be most often associated with regulatory transcription terminators in α- and β-Proteobacteria, although the mechanisms by which SAM-II aptamers control gene expression have not yet been experimentally established [18]. Transcription attenuation mechanisms may also be generally overrepresented in Fusobacteria, δ/ε-Proteobacteria, Thermotogae, and Chloroflexi species, although smaller sample sizes make these conclusions less certain. Figure 3 Riboswitch mechanisms. The mechanisms that riboswitches from different taxonomic groups use to regulate gene expression were classified on the basis of expression platform features (Figure 2). The fractions of riboswitch expression platforms in each category are displayed visually as shaded bars with the actual numbers observed written above in the order given in the legend. The phylogenetic tree on the left is described in the legend to Figure 1. Mechanisms that rely on sequestering the RBS within the conserved aptamer core are most common for the TPP, preQ1, and SAM-I riboswitches. In the first two cases, purine-rich conserved regions near the 3' ends of the riboswitch substitute for RBS sequences. In SAM-I riboswitches, the RBS is incorporated into the 3' side of the P1 stem. Other riboswitch classes also have purine-rich conserved regions near their 3' ends with consensus sequences close to ribosome binding sites. It is not clear why direct regulation of translation attenuation is not more common in these other classes. Perhaps access to the RBS-like sequences in these aptamers is not modulated by ligand binding. 
Riboswitch regulation by direct translation attenuation appears to be most frequent in Actinobacteria and Cyanobacteria, except for the preQ1 riboswitch where this mechanism is unusually prevalent, even in Firmicutes and Proteobacteria. There do not appear to be any additional examples of riboswitches positioned for antisense regulation in this data set. An antisense arrangement may be rare because it inverts the gene control logic of the riboswitch and requires the evolutionary maintenance of a second promoter. A handful of high-scoring hits were found that appear to be functional aptamers even though they are not located upstream of genes related to the cognate metabolite. It is possible that these riboswitches affect their target genes by regulating the production or function of trans-acting antisense RNAs or that they have been recently orphaned by genomic rearrangements and are now pseudo-regulatory sequences. Evaluating structure models Constructing an RNA secondary structure model using phylogenetic sequence data requires identifying possible base-paired stems and adjusting a sequence alignment to determine whether each proposed stem appears reasonable for all representatives. This recursive refinement process has been used to create detailed comparative models of many functional RNA structures that accurately reflect later genetic, biochemical and biophysical data. However, the presence of stretches of unvarying nucleotides within an RNA structure, the tolerance of stems to some non-canonical base pairs or mismatches, and the non-negligible frequency of sequencing errors in biological databases can introduce enough uncertainty that multiple structures may seem to agree with a sequence alignment and incorrect base-paired elements may be proposed. This problem is compounded if the multiple sequence alignment is incomplete and does not yet capture all of the variation that truly exists at each nucleotide position. 
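A simple way to see what "appears reasonable for all representatives" means in practice is to tabulate, for one proposed base pair, how many sequences can form a Watson-Crick or G-U wobble pair at the two alignment columns and how many distinct pair types occur. The sketch below is only an illustration under stated assumptions: the alignment is a list of equal-length RNA strings with `-` for gaps, sequences gapped at either column are skipped, and the function name `pair_support` is hypothetical.

```python
# Watson-Crick pairs plus G-U wobble pairs.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def pair_support(alignment, i, j):
    """Return (fraction of ungapped sequences that can pair at columns i and j,
    number of distinct canonical pair types observed)."""
    seen = set()
    paired = 0
    counted = 0
    for seq in alignment:
        x, y = seq[i].upper(), seq[j].upper()
        if x == "-" or y == "-":
            continue  # skip sequences with a gap in either column
        counted += 1
        if (x, y) in CANONICAL:
            paired += 1
            seen.add((x, y))
    fraction = paired / counted if counted else 0.0
    return fraction, len(seen)


# Toy alignment (invented for illustration): columns 1 and 6 covary,
# columns 0 and 7 are invariant.
toy = ["GGCAUGCC", "GACAUGUC", "GCCAUGGC", "GUCAUGAC"]
print(pair_support(toy, 1, 6))  # pairing in every sequence, four pair types
print(pair_support(toy, 0, 7))  # pairing in every sequence, one pair type
```

Both column pairs are fully compatible with a helix, but only the first shows compensatory changes. The second illustrates the problem noted above: unvarying nucleotides are consistent with a stem without providing evidence for it.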
Inconsistencies and ambiguities in some riboswitch aptamer models motivated us to evaluate the statistical support for base pairs in their proposed structures. We chose to use mutual information (MI) scores [58] to mathematically formalize the interdependence between sequence alignment columns that is indicative of base interactions. MI is a normalized version of covariance that represents the amount of information (in bits) gained about what base occurs at a given position from knowing the identity of a base at another position. The prediction of RNA secondary structures and tertiary interactions from covariation in sequence alignments has a long history, and the nuances of calculating and interpreting MI scores have been comprehensively covered elsewhere [59,60]. Fundamentally, columns of interacting bases must be correctly aligned and there must be variation within each column (that is, it cannot be completely conserved) in order to detect mutual information. Even when these preconditions are met, there are two difficulties with directly comparing MI scores to determine which columns in a sequence alignment truly covary. First, sequence conservation derived from the shared evolutionary histories of sequence subsets in an alignment may result in a high residual background MI score between many columns whether or not they are functionally linked. Second, alignments with fewer sequences will have more column pairs with elevated MI scores simply by chance. Simulations addressing the expected magnitudes of these two sources of error in different data sets have been explored recently in the context of protein sequence alignments [61]. In order to better gauge whether MI scores support proposed base interactions in an RNA alignment, we developed a procedure for empirically estimating their statistical significance (Figure 4). 
First, a phylogenetic tree is inferred from the observed RNA sequence alignment according to a model that assumes independent evolution at each position and allows for varying per-column mutation rates. Then, resampled alignments with the same topology, branch lengths, and evolutionary rates are generated. MI scores between columns in these test alignments reflect the null hypothesis that there is no covariation between positions. They implicitly correct for the evolutionary history and sample size of the real sequence alignment. Therefore, the p value significance for an observed MI score in the real alignment is the fraction of test alignments with higher MI scores between these two columns. Figure 4 Procedure for estimating MI significance between alignment columns. See the main text and Materials and methods for a complete description of the procedure used to estimate the statistical significance of MI scores between columns in a multiple sequence alignment in order to evaluate riboswitch secondary structures and predict new base-base interactions. Riboswitch structures The consensus secondary structure models of the ten riboswitch classes (Figure 5) have been updated to reflect information from newly identified aptamer variants. The purine, TPP, SAM-I, and GlcN6P riboswitch consensus structures have been drawn in accordance with their molecular structures (references in Table 1). Other riboswitch structures have been revised to be consistent with the new predictions of structure motifs and base-base interactions explained below. In all cases, previous numbering schemes for the paired helical elements (designated P1, P2, P3, and so on, beginning at the 5' end of each aptamer) have been maintained, even when these stems do not occur in a majority of the sequences in the updated alignment. Newly discovered paired elements that do not appear in most examples of a riboswitch aptamer have not been assigned numbers. Figure 5 Riboswitch aptamer structures. 
The consensus secondary structure models based on expanded riboswitch sequence alignments are depicted according to the symbols defined in the inset. Each structure is further annotated with RNA structure motifs and the statistical significances (p values) of the mutual information scores between base-paired alignment columns. New predictions of interacting bases from the MI analysis are numbered and indicated by asterisks. More detailed descriptions of these predictions are provided in Figure 7. The results of the mutual information analysis are shown superimposed on the consensus riboswitch structures. Most base-paired helices are supported by at least one contiguous base pair with a highly significant MI (p […]
[…] 60 nucleotides (nt) of an open reading frame (ORF) on the same strand overlapped the aptamer or >700 nt separated the aptamer and the nearest downstream ORF were also screened out. Most of these cases appear to result from incorrect start codon choices, overpredictions of hypothetical ORFs, or missing annotation of real genes. The remaining sequences constituted the expression platform data set, and sequences beginning at the 5' end of each aptamer and continuing through the first 120 nt of the downstream ORF were extracted for further analysis. Riboswitches where the downstream gene was on the opposite strand were examined as candidates for antisense regulation. Other riboswitches were classified as directly regulating translation initiation when the downstream gene's start codon was within 15 nt of the end of the conserved aptamer core structure (usually the P1 paired element). The remaining expression platforms were scanned with the local RNA secondary structure prediction program Rnall (version 1.1) [89] for intrinsic transcription terminators with a scanning window of 50 nt, a U-tail weight threshold of 4.0, a U-tail pairing stability cutoff of -8.3 kcal/mol, and default settings for other parameters. 
Riboswitches with a terminator predicted in their expression platform sequence were assigned transcription attenuation mechanisms. These riboswitches were classified as also regulating translation if the distance between the terminator hairpin and the gene's start codon is no more than 10 nt. Expression platforms that did not match any of the above criteria are assumed to employ translation attenuation mechanisms. Rnall and distance parameters were calibrated by comparing expression platform predictions to expert predictions for a large and phylogenetically diverse collection of TPP riboswitches [48]. Rnall correctly predicts 46 out of 52 terminators in this data set with only 3 predictions of terminators in sequences not manually evaluated as containing a terminator (a sensitivity of 88% and an accuracy of 94%). The three false positives resemble terminators and may be functional, whereas the terminators that Rnall misses usually have large hairpins with poor thermodynamic stabilities. Overall, the decision tree classifies 159 out of 180 TPP riboswitch expression platforms (88%) correctly into the category assigned in the control set. Consensus secondary structures We manually adjusted the covariance model alignments of riboswitch aptamers while refining their consensus secondary structures. In particular, bases taking part in pseudoknotted pairings that cannot be represented by CMs were shifted to accurately represent these interactions. Bases flanking gapped consensus columns, which are sometimes ambiguously spread out across many possible positions by the alignment algorithm, were also systematically condensed into a minimum number of overall consensus columns. As new structure motifs and base-base interactions became evident, the alignments were adjusted to reflect these new constraints. 
Riboswitch sequences in the final alignments were weighted using Infernal's internal implementation of the GSC algorithm [90] to reduce biases from duplicate and similar sequences before calculating consensus structure statistics. Mutual information significance Duplicate sequences were purged and columns with >50% gaps were removed from riboswitch alignments prior to the MI analysis, and, if necessary, alignments were further pruned to the 300 most diverse sequences (as judged by pairwise base differences). A customized version of the program Rate4Site (version 2.01) [91] with modified output options was used to simultaneously estimate distances and per-column rates of evolution according to a gamma distributed model with at least 16 rate categories and a phylogenetic tree created with Jukes-Cantor distances that treated gaps as missing information. The resulting trees, rates, and distances were used to simulate 10,000 resampled alignments starting from an arbitrary ancestral sequence. Then, gaps and sequence weights were re-inserted into each of these derivative alignments at the same positions that they occupied in the original alignment. Mutual information was calculated between column pairs for all alignments according to standard formulas [60], taking into account sequence weights and treating gaps as a fifth character state. The resampled alignments were used to estimate what the MI score distribution would have been if the bases present in each column had evolved independently, without covariation constraints. The p value significance of the actual MI between two columns is the fraction of the resampled alignments that have a greater MI score than the value observed between those two columns in the real alignment. 
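The two calculations at the core of this procedure, the weighted MI between two columns and the empirical p value, can be sketched as follows. This is a simplified illustration rather than the published pipeline. The MI function uses the standard definition, treats gaps as a fifth character state, and accepts optional sequence weights. The null distribution, however, is generated here by shuffling one column, which is a stand-in that ignores phylogeny; the procedure described above instead simulates alignments along the inferred tree with per-column rates. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(col_x, col_y, weights=None):
    """MI in bits between two alignment columns; '-' is simply another state."""
    if weights is None:
        weights = [1.0] * len(col_x)
    total = sum(weights)
    pxy, px, py = Counter(), Counter(), Counter()
    for x, y, w in zip(col_x, col_y, weights):
        pxy[(x, y)] += w
        px[x] += w
        py[y] += w
    mi = 0.0
    for (x, y), w in pxy.items():
        p = w / total
        mi += p * math.log2(p / ((px[x] / total) * (py[y] / total)))
    return mi


def empirical_p_value(observed, null_scores):
    """Fraction of null scores strictly greater than the observed score."""
    return sum(s > observed for s in null_scores) / len(null_scores)


def shuffled_null(col_x, col_y, n_resamples=1000, seed=0):
    """Stand-in null (NOT the tree-based simulation): permute one column."""
    rng = random.Random(seed)
    y = list(col_y)
    scores = []
    for _ in range(n_resamples):
        rng.shuffle(y)
        scores.append(mutual_information(col_x, y))
    return scores


# Toy columns (invented): four equally frequent, perfectly covarying pairs.
x, y = "GGAACCUU", "CCUUGGAA"
observed = mutual_information(x, y)
p = empirical_p_value(observed, shuffled_null(x, y))
print(observed, p)
```

With the tree-based simulation substituted for `shuffled_null`, the same `empirical_p_value` call corresponds to the fraction of resampled alignments whose MI exceeds the observed value. With 10,000 resamples, the smallest nonzero p value that can be resolved is 1/10,000.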
Abbreviations AdoCbl, adenosylcobalamin; CM, covariance model; FMN, flavin mononucleotide; GlcN6P, glucosamine-6-phosphate; H, Hoogsteen face; MI, mutual information; nt, nucleotides; ORF, open reading frame; preQ1, 7-aminomethyl-7-deazaguanine; RBS, ribosome binding site; SAM, S-adenosylmethionine; SE, sugar edge; TPP, thiamin pyrophosphate; UTR, untranslated region; WC, Watson-Crick face. Authors' contributions JEB designed the computational analyses, carried out the comparative studies, and created the figures. JEB and RRB interpreted the results and wrote the manuscript. Additional data files The following additional data files are available with the online version of this article. Additional data file 1 contains sequence alignments of the riboswitch aptamer data sets annotated with new base-base interactions in Stockholm format. Additional data file 2 contains sequence alignments of the riboswitch aptamer data sets annotated with new base-base interactions in HTML format.
                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Genet
                PLoS Genetics
                Public Library of Science (San Francisco, USA )
                1553-7390
                1553-7404
                June 2012
                28 June 2012
                Volume: 8
                Issue: 6
                Article number: e1002782
                Affiliations
                [1 ]Architecture et Réactivité de l′ARN, Université de Strasbourg, CNRS, IBMC, Strasbourg, France
                [2 ]Zentrum für Infektionsforschung (ZINF), Würzburg, Germany
                [3 ]Inserm U851, Centre National de Référence des Staphylocoques, Université de Lyon, Lyon, France
                [4 ]Institut für Molekulare Infektionsbiologie, Würzburg, Germany
                Uppsala University, Sweden
                Author notes

                Conceived and designed the experiments: PR EL JV. Performed the experiments: EL CMS IC A-CH PF. Analyzed the data: PR EL IC CMS FV JV. Contributed reagents/materials/analysis tools: PR FV JV. Wrote the paper: PR EL CMS JV.

                Article
                PGENETICS-D-11-02374
                10.1371/journal.pgen.1002782
                3386247
                22761586
                c8200b85-49d5-4709-b2db-dc80d2d1bdc1
                Lioliou et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                History
                Received: 6 November 2011
                Accepted: 9 May 2012
                Page count
                Pages: 21
                Categories
                Research Article
                Biology
                Genetics
                Gene Expression
                RNA processing
                RNA stability
                Microbiology
                Microbial Growth and Development
                Model Organisms
                Prokaryotic Models
                Bacillus Subtilis

