
      Pseudouridine profiling reveals regulated mRNA pseudouridylation in yeast and human cells

      research-article


          Abstract

          Post-transcriptional modification of RNA nucleosides occurs in all living organisms. Pseudouridine, the most abundant modified nucleoside in non-coding RNAs [1], enhances the function of transfer RNA and ribosomal RNA by stabilizing RNA structure [2–8]. mRNAs were not known to contain pseudouridine, but artificial pseudouridylation dramatically affects mRNA function: it changes the genetic code by facilitating non-canonical base pairing in the ribosome decoding center [9, 10]. However, without evidence of naturally occurring mRNA pseudouridylation, its physiological relevance was unclear. Here we present a comprehensive analysis of pseudouridylation in yeast and human RNAs using Pseudo-seq, a genome-wide, single-nucleotide-resolution method for pseudouridine identification. Pseudo-seq accurately identifies known modification sites as well as 100 novel sites in non-coding RNAs, and reveals hundreds of pseudouridylated sites in mRNAs. Genetic analysis allowed us to assign most of the new modification sites to one of seven conserved pseudouridine synthases, Pus1–4, 6, 7 and 9. Notably, the majority of pseudouridines in mRNA are regulated in response to environmental signals, such as nutrient deprivation in yeast and serum starvation in human cells. These results suggest a mechanism for the rapid and regulated rewiring of the genetic code through inducible mRNA modifications. Our findings reveal unanticipated roles for pseudouridylation and provide a resource for identifying the targets of pseudouridine synthases implicated in human disease [11–13].


          Most cited references (28)


          Bidirectional promoters generate pervasive transcription in yeast.

          Genome-wide pervasive transcription has been reported in many eukaryotic organisms, revealing a highly interleaved transcriptome organization that involves hundreds of previously unknown non-coding RNAs. These recently identified transcripts either exist stably in cells (stable unannotated transcripts, SUTs) or are rapidly degraded by the RNA surveillance pathway (cryptic unstable transcripts, CUTs). One characteristic of pervasive transcription is the extensive overlap of SUTs and CUTs with previously annotated features, which prompts questions regarding how these transcripts are generated, and whether they exert function. Single-gene studies have shown that transcription of SUTs and CUTs can be functional, through mechanisms involving the generated RNAs or their generation itself. So far, a complete transcriptome architecture including SUTs and CUTs has not been described in any organism. Knowledge about the position and genome-wide arrangement of these transcripts will be instrumental in understanding their function. Here we provide a comprehensive analysis of these transcripts in the context of multiple conditions, a mutant of the exosome machinery and different strain backgrounds of Saccharomyces cerevisiae. We show that both SUTs and CUTs display distinct patterns of distribution at specific locations. Most of the newly identified transcripts initiate from nucleosome-free regions (NFRs) associated with the promoters of other transcripts (mostly protein-coding genes), or from NFRs at the 3' ends of protein-coding genes. Likewise, about half of all coding transcripts initiate from NFRs associated with promoters of other transcripts. These data change our view of how a genome is transcribed, indicating that bidirectionality is an inherent feature of promoters. 
Such an arrangement of divergent and overlapping transcripts may provide a mechanism for local spreading of regulatory signals, that is, coupling the transcriptional regulation of neighbouring genes by means of transcriptional interference or histone modification.
            Is Open Access

            tRNAdb 2009: compilation of tRNA sequences and tRNA genes

            One of the first specialized collections of nucleic acid sequences in life sciences was the ‘compilation of tRNA sequences and sequences of tRNA genes’ (http://www.trna.uni-bayreuth.de). Here, an updated and completely restructured version of this compilation is presented (http://trnadb.bioinf.uni-leipzig.de). The new database, tRNAdb, is hosted and maintained in cooperation between the universities of Leipzig, Marburg, and Strasbourg. Reimplemented as a relational database, tRNAdb will be updated periodically and is searchable in a highly flexible and user-friendly way. Currently, it contains more than 12 000 tRNA genes, classified into families according to amino acid specificity. Furthermore, the implementation of the NCBI taxonomy tree facilitates phylogeny-related queries. The database provides various services including graphical representations of tRNA secondary structures, a customizable output of aligned or un-aligned sequences with a variety of individual and combinable search criteria, as well as the construction of consensus sequences for any selected set of tRNAs.

              Rate-Limiting Steps in Yeast Protein Translation

              Introduction

              Protein translation is central to cellular life. Although individual steps in translation such as the formation of the 43S preinitiation complex are known in intricate molecular detail, a global understanding of how these steps combine to set the pace of protein production for individual genes remains elusive (Jackson et al., 2010; Plotkin and Kudla, 2011). Factors such as biased codon usage, gene length, transcript abundance, and initiation rate are all known to modulate protein synthesis (Bulmer, 1991; Chamary et al., 2006; Cannarozzi et al., 2010; Tuller et al., 2010a; Shah and Gilchrist, 2011; Plotkin and Kudla, 2011; Gingold and Pilpel, 2011; Chu et al., 2011; Chu and von der Haar, 2012), but how they interact with one another to collectively determine translation rates of all transcripts in a cell is poorly understood. Systematic measurements for some of the most critical rates, such as the gene-specific rates of 5′ UTR scanning and start codon recognition, are extremely difficult to perform. As a result, questions as fundamental as the relative role of initiation versus elongation in setting the pace of protein production are still actively debated (Kudla et al., 2009; Tuller et al., 2010a; Plotkin and Kudla, 2011; Gingold and Pilpel, 2011; Chu et al., 2011; Chu and von der Haar, 2012; Ding et al., 2012). Biotechnical applications that exploit these processes stand to gain from a quantitative understanding of the global principles governing protein production (Gustafsson et al., 2004; Salis et al., 2009; Welch et al., 2009). Recent advances in synthetic biology allow high-throughput studies on the determinants of protein production (Kudla et al., 2009; Welch et al., 2009; Salis et al., 2009). Sequencing techniques such as ribosomal profiling provide snapshots of the translational machinery in a cell (Ingolia et al., 2009; Reid and Nicchitta, 2012).
One way to leverage this new information is to develop a computationally tractable model of translation in a cell, to parameterize it from known measurements, and to use it to infer any unknown parameters of global translation dynamics. Here, we develop a whole-cell model of protein translation, and we apply it to study translation dynamics in yeast. Our model describes translation dynamics at single-nucleotide resolution for the entire transcriptome. In combination with ribosomal profiling data, we use our model to infer the initiation rates of all abundant yeast transcripts. We systematically explore how the codon usage, transcript abundance, and initiation rate of a transgene jointly determine protein yield and cellular growth rate. Applied to the endogenous genome, our model reproduces one of the defining features of ribosomal profiling measurements: a decrease in ribosome density with codon position. We evaluate both elongation- and initiation-driven hypotheses for the ramp of 5′ ribosome densities. We also describe the factors that influence ribosomal pausing along mRNA molecules, as well as the effects of stress on translation.

Results

Model

We developed a continuous-time, discrete-state Markov model of translation. The model tracks all ribosomes and transfer RNA (tRNA) molecules in a cell, each of which is either freely diffusing or bound to a specific messenger RNA (mRNA) molecule at a specific codon position at any time point (Extended Experimental Procedures). Rates of initiation and elongation are based on physical parameters that have been experimentally determined in yeast, including the cell volume, the abundances of ribosomes and tRNAs, and their diffusion constants (Tables 1 and S1 available online). Transition rates among states are parameterized in seconds so that the model describes the dynamics of translation in real time, as opposed to using arbitrary discrete time steps.
We provide a precise definition of the Markov state space, as well as pseudocode and complete source code in Data S1 and S2 and also Table S2. Unlike many other models of translation (Gilchrist and Wagner, 2006; Mitarai et al., 2008; Reuveni et al., 2011), which treat each mRNA molecule in isolation and assume an inexhaustible supply of free ribosomes that initiate the message at a constant rate, our model keeps track of every tRNA, mRNA, and ribosome molecule in the cell simultaneously, and so it captures the indirect effects of one gene’s translation on another’s (Figure 1). In particular, if many ribosomes are engaged in translating the mRNAs of one gene, this reduces the pool of free ribosomes and tRNAs available to translate other genes. Our model makes a number of simplifying assumptions. Most importantly, our model treats the total number of ribosomes, tRNA molecules, and mRNA molecules in the cell as fixed quantities because the dynamics of their production and decay are typically slower than those of protein translation (García-Martínez et al., 2004; Larson et al., 2011). We specify the total number of ribosomes and tRNA molecules to agree with their experimentally determined values in an exponential-phase yeast cell: $2 \times 10^{5}$ and $3.3 \times 10^{6}$, respectively (Waldron and Lacroute, 1975; Warner, 1999; von der Haar, 2008; Siwiak and Zielenkiewicz, 2010; Chu and von der Haar, 2012). We infer gene-specific initiation probabilities (Extended Experimental Procedures) so that 85% of ribosomes are bound to mRNAs in equilibrium in agreement with measurements in yeast (Arava et al., 2003; Zenklusen et al., 2008). We further assume that tRNA charging is fast, which is reasonable because 80% of all tRNAs are charged at any given time in exponential-phase cells (Varshney et al., 1991; Jakubowski and Goldman, 1992; Chu et al., 2011).
As a result of these parameters, the equilibrium number of free ribosomes available in the cell is typically smaller than the number of available charged tRNAs of each species. In this regime, we will show that protein production is generally limited by the rate of translation initiation in the sense that increasing the initiation probability of an mRNA molecule will typically increase the rate at which protein is produced, but increasing its codon elongation rates generally will not increase production. The initiation-limited regime agrees with the long-standing view of endogenous protein synthesis (Andersson and Kurland, 1990; Bulmer, 1991; Eyre-Walker and Bulmer, 1993; Lackner et al., 2007; Plotkin and Kudla, 2011), but it contrasts with other models of translation that assume an inexhaustible supply of ribosomes, which are always available for initiation of an mRNA regardless of how many ribosomes are bound to other mRNAs (Mitarai et al., 2008; Reuveni et al., 2011; Tuller et al., 2011). We implemented our Markov model of translation using the Gillespie algorithm. We simulated 1,500 s of translation and extracted the final 500 s to collect data on translation dynamics in equilibrium (Experimental Procedures). Our implementation requires about 1,300 s of computation time to simulate all initiation and elongation events in a wild-type cell for 1,500 s. In these simulations, at equilibrium, the mean elongation rate is 9.3 aa/s (median = 9.5 aa/s), and the mean distance between consecutive bound ribosomes is 60 codons (median = 34). Both of these quantities agree with empirical measurements in yeast (Arava et al., 2003).

Codon Bias and Transgene Expression

Optimizing a transgene’s codon usage to the tRNA content of a cell often improves protein yield (Gustafsson et al., 2004; Welch et al., 2009), but the underlying mechanisms have not been systematically explored.
To study this in a quantitative model, we simulated translation of a transgene within the context of a Saccharomyces cerevisiae cell containing 3,795 endogenous genes whose transcript levels and gene-specific initiation probabilities were estimated from ribosomal profiling data (Ingolia et al., 2009) (Experimental Procedures). By varying the codon adaptation index (CAI) (Sharp and Li, 1987) and transcript level of the transgene across many simulations, we delineated the regimes for which increasing codon bias is expected to increase protein yield and by what mechanisms. Using the green fluorescent protein (GFP) as an example transgene, we found that increasing the CAI of a transgene significantly improves the rate of proteins produced per mRNA molecule only when the transgene mRNA accounts for a substantial proportion of all the mRNA in the transcriptome (Figure 2 and Table S4). For a transgene whose messages account for 50% of the cell’s mRNA content, for example, increasing CAI from almost zero to one results in nearly 3.6-fold more proteins produced per transcript per second (Figure 2B, triangles), whereas optimizing CAI in a transgene expressed at only 1% of the transcriptome results in a more modest increase (∼50%) in its rate of protein production (Figure 2B, squares). These results help explain the divergent views of biotechnological studies, which often report large gains in protein production upon optimizing transgene CAI (Gustafsson et al., 2004), and evolutionary studies of endogenous translation, which typically report very small effects of CAI on protein production per message (Bulmer, 1991; Tuller et al., 2010b; Gingold and Pilpel, 2011; Plotkin and Kudla, 2011). The discrepancy arises because transgenes are usually overexpressed and comprise a substantial fraction of all cellular mRNA, whereas endogenous genes are expressed at 1% of the transcriptome or less. 
Why does codon bias strongly influence protein yield only when a gene has high mRNA abundance? The reason has to do with the effects of codon bias on the pool of free ribosomes, as seen in Figure 3. At equilibrium, neglecting rare abortion events, the rate of protein production from any given mRNA (i.e., the rate of polypeptide termination) must equal the rate of initiation on that mRNA, which, in turn, depends primarily on the abundance of free ribosomes in the cell. Increasing the CAI of a gene will increase its codon elongation rates and thus decrease the density of ribosomes on each of its mRNAs, but the overall effect on the pool of free ribosomes is small when the gene accounts for a small proportion ( 0.9; Figures S1A and S1B). We then inverted our equations to infer gene-specific initiation probabilities from observed densities of ribosomes on transcripts. An alternative method of estimating initiation probabilities from profiling data was independently developed by Siwiak and Zielenkiewicz (2010). We validated that our analytical method can indeed reliably infer initiation probabilities when we simulate ribosome profiling data for S. cerevisiae genes with known initiation probabilities (Figure S1B). Using this method, we inferred the initiation probabilities for the 3,795 S. cerevisiae genes whose ribosomal densities have been reliably measured (Ingolia et al., 2009). The initiation probabilities we inferred for yeast genes vary by many orders of magnitude. According to these estimates, the average time between initiation events on a given mRNA molecule ranges from 4 s (fifth percentile) to 233 s (95th percentile), with a median value of 40 s. This variation provides the cell considerable range for tuning protein levels by modulating initiation probabilities of genes. 
Experiments with individual genes (Hall et al., 1982; Duan et al., 2003) and with large sets of coding sequences (Kudla et al., 2009) suggest that strong 5′ mRNA structure reduces the rate of initiation, presumably by obstructing ribosomal-mRNA binding. Using a large set of synthetic GFP genes that vary synonymously, we confirmed experimentally that 5′ mRNA folding plays a predominant role in determining protein levels in S. cerevisiae (Figure S2), which is similar to the role it plays in Escherichia coli (Kudla et al., 2009). In light of these experiments, we compared the initiation probabilities we estimated for 3,795 endogenous yeast genes with their predicted 5′ mRNA folding energies (nucleotides −4 to +37, Experimental Procedures) and found a strong positive correlation (Pearson correlation R = 0.125 and p  0.9). Moreover, we validated that we can reliably infer initiation probabilities from simulated ribosomal profiling data even when gene length and initiation probabilities are positively correlated (Figures S1C and S1D and Extended Experimental Procedures), indicating that the negative correlation observed in the real yeast data is not an artifact of our inference procedure. Why should short genes experience selection for fast initiation? Short genes are enriched for constitutively expressed housekeeping and ribosomal genes (Hurowitz and Brown, 2003), which must produce protein as rapidly as possible. In addition, housekeeping genes tend to have shorter 5′ UTRs and are under weaker posttranscriptional regulation (Hurowitz and Brown, 2003; Lin and Li, 2012). The probability of successful ribosomal binding and scanning on an mRNA may depend on the length of its 5′ UTRs; indeed, we find that genes with shorter 5′ UTRs exhibit higher inferred initiation probabilities (p  $5 \times 10^{5}$, Extended Experimental Procedures) exceeds the empirical measurement of the total number of ribosomes in a yeast cell ($1.87 \times 10^{5} \pm 5.6 \times 10^{4}$; von der Haar, 2008) by a factor of 2.5.
When we artificially increase the number of ribosomes and tRNAs in our simulations beyond their empirically measured abundances, we can recapitulate the patterns produced by TASEP models of translation (Figure S5A). In this regime, which we argue is unrealistic, we still observe a decrease in the average ribosome density with codon position, but this ramp is caused by collisions along each mRNA, and it persists regardless of gene-specific initiation probabilities or codon ordering within genes (Figure S5B). Thus, models of translation in both initiation- and elongation-limited regimes produce similar global patterns of ribosomal densities with codon position but for entirely different and contradictory mechanisms. Only the initiation-limited regime is consistent with empirical measurements of ribosome abundances in the yeast cell.

Ribosomal Interference and Codon Usage

Our simulations allow us to estimate the amount of time a ribosome spends waiting for a tRNA at each codon position, called ribosomal pausing, and also the amount of time a ribosome wastes at any position due to interference by an adjacent downstream ribosome that prevents further elongation, called ribosomal stalling. We identified the sequence features of a gene that predispose it to ribosomal pausing or stalling (Experimental Procedures). Using GFP as an example transgene simulated at 50% mRNA transcriptome abundance, we found that increasing the transgene’s codon bias tends to decrease the overall density of ribosomes on its mRNAs, as well as the frequency of ribosomal stalling (Figure 6). For a transgene with high CAI, the probability of finding a ribosome bound at a given codon is negatively correlated with the abundance of corresponding iso-accepting tRNAs (Pearson correlation, R = −0.802), but this correlation is much weaker for a transgene with low CAI (R = 0.042 and p > 0.05).
In other words, the waiting time per codon is largely determined by the abundance of corresponding tRNAs for a gene with high CAI. But for a gene with low CAI, ribosome densities are higher overall and so the waiting time at each codon is also influenced by interference with downstream ribosomes and, therefore, is not easily predicted from tRNA abundances. In fact, regardless of CAI, there is a strong correlation between ribosomal stalling at a position and the probability of ribosomal pausing 10 codons downstream (R = 0.958 for high CAI and R = 0.644 for low CAI). Because the probability of pausing in a high-CAI transgene sequence is correlated with tRNA abundances, it is possible to predict the positions of ribosomal stalling from the transgene sequence alone. Understanding the effects of amino acid and codon usage on pausing and stalling may prove useful in designing transgene sequences to minimize ribosomal interference on their mRNAs.

Protein Translation under Stress

The simulations of translation described above were performed under parameters of optimal cell growth. Translation dynamics likely differ when a cell experiences stress. To investigate how protein production is affected by stress and how a cell might adapt in response, we simulated translation under conditions of amino acid starvation. We modeled starvation of a particular amino acid by reducing the abundance of its (charged) cognate tRNAs by either 2-, 5-, or 10-fold. As expected, we found that the rate of total protein production decreases under stress (Figures 7A and S6A). Furthermore, starvation of different amino acids can have radically different effects on protein production. For example, 10-fold starvation of amino acids Ala, Leu, Glu, Gln, or Ser decreases total protein production by at least 10-fold, whereas an equivalent starvation of Met, Trp, or His reduces protein production by less than 25% (Figure 7A).
As expected, the effect of starvation of a particular amino acid is significantly correlated with its abundance encoded in the transcriptome (p  3 ambiguous N symbols. We used RNAfold (Hofacker et al., 1994) to estimate the mRNA folding energy from base −4 to 37 for each gene, using default parameters.

Estimating Ribosomal Interference

To identify regions of ribosomal pausing and interference on a transgene sequence, we simulated translation in the cell with a transgene accounting for 50% of the (mRNA) transcriptome. We ran the simulation for 500 s in equilibrium and sampled the state of the system every second. We used the average number of ribosomes bound at each position to quantify the frequency of ribosomal pausing. To quantify the frequency of ribosomal stalling, we calculated the fraction of bound ribosomes at a position that also have another bound ribosome ten codons (positions) ahead on that mRNA in the same time sample.

Extended Experimental Procedures

Simulation Model

We describe protein translation using a discrete-state continuous-time Markov model of initiation, elongation, and termination events in a cell. The model assumes a fixed total number of ribosomes and tRNAs, and it describes how these entities initiate and elongate a fixed supply of mRNAs. Our model neglects the dynamics of transcription, mRNA decay, and co-transcriptional translation; it also neglects the production and decay of ribosomes and tRNAs themselves. These processes are typically slower than the dynamics of translation, and so our model nonetheless provides an accurate description of translation in a cell in most conditions. We assume a genome comprising $n$ genes, each with a prescribed coding sequence, and each with a fixed abundance, $A_i$, of mRNA copies in the cell. Gene $i$ encodes an mRNA of length $L_i$ codons; each such codon is assigned one of $k$ possible values ($k = 61$ in the standard genetic code).
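The pausing and stalling measures defined under Estimating Ribosomal Interference can be illustrated with a minimal Python sketch. It assumes each time sample of one mRNA is available as a set of occupied codon positions; the function name `pausing_and_stalling`, the `footprint` parameter, and this data layout are hypothetical choices for illustration, not taken from the published source code.

```python
def pausing_and_stalling(samples, length, footprint=10):
    """Per-position pausing and stalling frequencies for one mRNA.

    samples   -- list of sets; each set holds the codon positions occupied
                 by ribosomes in one time sample (assumed data layout)
    length    -- number of codons in the mRNA
    footprint -- spacing at which a downstream ribosome counts as blocking
                 (ten codons in the text)
    """
    if not samples:
        raise ValueError("need at least one time sample")
    bound = [0] * length    # samples in which a ribosome sits at each position
    blocked = [0] * length  # of those, samples with a ribosome `footprint` ahead
    for occupied in samples:
        for pos in occupied:
            bound[pos] += 1
            if pos + footprint in occupied:
                blocked[pos] += 1
    n = len(samples)
    # Pausing: average number of ribosomes bound at each position.
    pausing = [b / n for b in bound]
    # Stalling: fraction of bound ribosomes that have another ribosome
    # `footprint` codons ahead in the same sample.
    stalling = [blocked[i] / bound[i] if bound[i] else 0.0 for i in range(length)]
    return pausing, stalling


# Toy data: four time samples of a 20-codon mRNA.
toy_samples = [{0, 10}, {0}, {5, 15}, {0, 10}]
pausing, stalling = pausing_and_stalling(toy_samples, length=20)
```

In the toy data a ribosome at codon 0 is present in three of four samples and is blocked in two of them, so the stalling frequency there is two thirds.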
Each gene $i$ also has a corresponding probability of translation initiation, denoted $p_i$, which is described below. Corresponding to each type of codon $j$ is one of 41 iso-accepting tRNA species, denoted $\phi(j)$, which has a fixed total abundance $T^{t}_{\phi(j)}$ in the cell. At any time in our Markov model, each molecule of tRNA species $\phi(j)$ is either free in the cell, or bound, along with a ribosome, to some codon of type $j$ in some mRNA in the cell. Thus, at each time, the total number of tRNAs of type $\phi(j)$ can be decomposed into those that are currently bound and those that are currently free: $T^{t}_{\phi(j)} = T^{b}_{\phi(j)} + T^{f}_{\phi(j)}$. Likewise, the total number of ribosomes, $R^{t}$, can be decomposed into bound and free: $R^{t} = R^{b} + R^{f}$. Moreover, the number of bound ribosomes always equals the total number of bound tRNAs of all species: $R^{b} = \sum_{k=1}^{41} T^{b}_{k}$. In our continuous-time Markov model, initiation and elongation events occur at rates that are determined by the current state of the system (the number of free ribosomes, and the locations of all bound ribosomes) and by the underlying physical parameters of the cell. The underlying physical parameters are simply the volume of the cell, and the characteristic lengths and diffusion constants of ribosomes and tRNA molecules. The times between subsequent events are exponentially distributed, and Monte Carlo simulations proceed simply by incrementing time according to exponential deviates and re-computing rates of subsequent events (Gillespie, 1977). We provide the model source code, and associated datasets used in the current simulations as a supplement (Data S1). Additionally, the latest version of the code is also made freely available at http://mathbio.sas.upenn.edu/shah-cell-2013-code.tar.gz.

Diffusion of Ribosomes and tRNAs

We compute initiation and elongation rates by considering the diffusion of ribosome and tRNA molecules in the cell.
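The Monte Carlo update described for the Markov model (draw an exponential waiting time from the total rate, pick one event in proportion to its rate, then recompute the rates) can be sketched in Python as follows. This is a deliberately reduced toy system, assuming ribosomes that merely switch between a free and a bound pool with hypothetical rate constants `k_bind` and `k_release`; it is not the published whole-cell model, and all function names are illustrative.

```python
import random


def pick_event(rates, rng):
    """Choose an event index with probability proportional to its rate."""
    total = sum(rates)
    threshold = rng.random() * total
    running = 0.0
    chosen = None
    for index, rate in enumerate(rates):
        if rate <= 0.0:
            continue  # events with zero rate can never be chosen
        running += rate
        chosen = index
        if threshold < running:
            break
    return chosen


def simulate(total_ribosomes, k_bind, k_release, t_end, seed=0):
    """Gillespie simulation of a toy free/bound ribosome pool (hypothetical)."""
    rng = random.Random(seed)
    free, bound = total_ribosomes, 0
    t = 0.0
    while True:
        # Rates depend on the current state and are recomputed after each event.
        rates = [k_bind * free, k_release * bound]
        total = sum(rates)
        if total <= 0.0:
            break
        dt = rng.expovariate(total)  # exponentially distributed waiting time
        if t + dt > t_end:
            break
        t += dt
        if pick_event(rates, rng) == 0:
            free, bound = free - 1, bound + 1  # one ribosome binds
        else:
            free, bound = free + 1, bound - 1  # one ribosome is released
    return free, bound, t


free, bound, elapsed = simulate(100, 1.0, 1.0, 50.0, seed=1)
```

As in the model's decomposition of ribosomes into bound and free, the sketch conserves the total: `free + bound` stays equal to the starting number of ribosomes throughout.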
Assuming a spherical cell of volume $V = 4.2 \times 10^{-17}$ m$^3$ (Jorgensen et al., 2002), the number of different discrete positions that can be occupied by any molecule is $N = V / \lambda^{3}$, where $\lambda$ is the characteristic length of the molecule. The characteristic lengths of tRNA and ribosomes have been measured as $\lambda_t = 1.5 \times 10^{-8}$ m and $\lambda_r = 3 \times 10^{-8}$ m, respectively (Nissen et al., 1999; Politz et al., 2003). Thus, the numbers of available discrete positions for tRNA and ribosome molecules are $N_t = 1.24 \times 10^{7}$ and $N_r = 1.56 \times 10^{6}$, respectively. The average time required for any given molecule to move from one position in the cell to another, known as the characteristic time $\tau$, is given by

(1) $\tau = \lambda^{2} / (6D)$

where $D$ is the diffusion coefficient of the molecule. The diffusion coefficients of tRNAs and ribosomes are known, $D_t = 8.42 \times 10^{-11}$ m$^2$/s and $D_r = 3 \times 10^{-13}$ m$^2$/s (Politz et al., 2003; Werner, 2011), and hence their characteristic times are $\tau_t = 4.45 \times 10^{-7}$ s and $\tau_r = 5 \times 10^{-4}$ s, respectively. The characteristic times allow us to compute the rate at which a free ribosome or mRNA molecule reaches any particular position in the cell. In particular, if there are $N$ positions that can be occupied by a molecule, then a given molecule with characteristic time $\tau$ will reach a particular position in the cell at rate $1 / (\tau N)$. For example, if there are $R^{f}$ free ribosomes, then the rate at which any free ribosome reaches a given mRNA molecule is simply $R^{f} / (\tau_r N_r)$.

Translation Initiation Rates

Given the current state of the system (the number of free ribosomes, and the locations of all bound ribosomes), each mRNA of type $i$ will be initiated at rate $\rho_i$. The rate $\rho_i$ is set to zero if any of the first 10 codons of the mRNA is currently bound by a ribosome. Otherwise, the rate is $\rho_i = p_i R^{f} / (\tau_r N_r)$. The term $R^{f} / (\tau_r N_r)$ in this equation denotes the rate at which any free ribosome diffuses to a given mRNA molecule.
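The diffusion quantities quoted above can be reproduced from the stated cell volume, characteristic lengths, and diffusion coefficients. The short Python sketch below does this arithmetic and evaluates the initiation rate for a given number of free ribosomes; the constant and function names are illustrative, and the free-ribosome count in the usage line is a hypothetical input (15% of $2 \times 10^{5}$ ribosomes), not a value reported for any particular gene.

```python
# Physical parameters as quoted in the text (SI units).
CELL_VOLUME = 4.2e-17   # m^3
LAMBDA_TRNA = 1.5e-8    # m, characteristic length of a tRNA
LAMBDA_RIBO = 3.0e-8    # m, characteristic length of a ribosome
D_TRNA = 8.42e-11       # m^2/s, tRNA diffusion coefficient
D_RIBO = 3.0e-13        # m^2/s, ribosome diffusion coefficient


def n_positions(volume, length):
    """Number of discrete positions, N = V / lambda^3."""
    return volume / length ** 3


def characteristic_time(length, diffusion):
    """Characteristic time, tau = lambda^2 / (6 D)."""
    return length ** 2 / (6.0 * diffusion)


N_TRNA = n_positions(CELL_VOLUME, LAMBDA_TRNA)
N_RIBO = n_positions(CELL_VOLUME, LAMBDA_RIBO)
TAU_TRNA = characteristic_time(LAMBDA_TRNA, D_TRNA)
TAU_RIBO = characteristic_time(LAMBDA_RIBO, D_RIBO)


def initiation_rate(p_init, free_ribosomes, first_codons_occupied=False):
    """rho_i = p_i * R_f / (tau_r * N_r), or zero if the 5' end is blocked."""
    if first_codons_occupied:
        return 0.0
    return p_init * free_ribosomes / (TAU_RIBO * N_RIBO)


# Hypothetical usage: arrival rate of free ribosomes at one mRNA when
# 3e4 ribosomes are free and every arrival initiates (p_init = 1).
arrival_rate = initiation_rate(1.0, 3.0e4)
```

With these inputs the computed position counts and characteristic times agree with the rounded values given in the text.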
The term $p_i$ denotes the initiation probability of an mRNA of type $i$: the chance that a ribosome will actually initiate translation of such an mRNA molecule, once it has diffused to its 5′ end. The parameters $p_i$ allow us to account for sequence-specific variation in initiation probabilities among genes (Kudla et al., 2009).

Translation Elongation Rates

Any given ribosome currently bound to some mRNA will elongate at some rate. Consider a ribosome bound at codon position $k$ on an mRNA. Its rate of elongation is set to zero if any of the following 10 codons (through position $k + 10$) of the mRNA are currently occupied by another ribosome, because of interference. Otherwise, the rate at which the ribosome elongates the subsequent codon, of type $j$, depends on the number of free cognate tRNAs for that codon, $T^{f}_{\phi(j)}$, and the wobble parameter associated with the tRNA-codon pair, $w_j$. If there is a perfect match between the tRNA and the codon, then $w_j = 1$. Otherwise, $w_{ry/yr} = 0.64$ if the mismatch is due to a purine-pyrimidine wobble, or $w_{rr/yy} = 0.61$ if the mismatch is due to a purine-purine or pyrimidine-pyrimidine wobble (Curran and Yarus, 1989; Lim and Curran, 2001). The rate at which a cognate tRNA elongates to the codon at position $k + 1$ is thus given by $T^{f}_{\phi(j)} w_j / (\tau_t N_t)$. In addition, during elongation various tRNAs compete for the focal ribosome. The ribosome thus spends a considerable amount of time checking whether a given tRNA in its A-site is in fact a cognate tRNA for the codon it is about to elongate. The time spent by the ribosome in selecting the cognate tRNA depends on the relative abundances of various tRNAs as well as organism-specific kinetic rates associated with ribosomal proofreading. Because these kinetic rates are not available for yeast, we use the values obtained in Escherichia coli (Fluitt et al., 2007; Gromadski and Rodnina, 2004). Using these parameters and tRNA abundances in yeast, we used numerical simulations described in Fluitt et al.
(2007) to estimate the average time spent by the ribosome in kinetic proofreading to select the correct tRNA. As a result, accounting for the tRNA competition coefficient $s$, the actual elongation rate of a codon is $T^{f}_{\phi(j)} w_j s / (\tau_t N_t)$.

Translation Termination

We assume that translation termination is an instantaneous event that occurs immediately after elongation of the last codon at position $L$. Upon termination the pool of free ribosomes and free tRNAs corresponding to the codon $j'$ at position $L - 1$ each increases by 1 ($R^{f} \to R^{f} + 1$; $T^{f}_{\phi(j')} \to T^{f}_{\phi(j')} + 1$).

Analytic Approximation for Steady-State Behavior

Whereas we have used the complete stochastic model described above to produce all the simulation figures in the main text, it is convenient to approximate its steady-state behavior by analytical equations, especially for the purpose of inferring gene-specific initiation probabilities from ribosomal profiling data. To do so we derive here an analytic steady-state approximation, based on ordinary differential equations that treat all quantities as continuous variables and are therefore accurate when the molecular quantities are large. This approximation neglects the possibility of ribosomal interference during elongation, and so it is not expected to hold in regimes for which mRNAs are densely packed with ribosomes. We will derive analytic approximations for the steady-state elongation times of codons, the amount of free tRNAs, the initiation and total elongation times of all mRNAs, and the steady-state number of free ribosomes in the cell. Consider a cell with a total number of ribosomes $R^{t}$ and $n$ genes each with $A_i$ mRNA copies. Assuming no ribosomal interference during translation, the expected number of ribosomes bound to each mRNA can be approximated by solving the differential equation

(2) $\frac{dR^{b}_{i}}{dt} = \rho_i - R^{b}_{i} \epsilon_i$

where $\rho_i$ and $\epsilon_i$ are the rates of initiation and total elongation of the $i$th mRNA, respectively.
At steady state the total number of bound ribosomes is then given by

$$R^b = \sum_{i=1}^{n} A_i \frac{\rho_i}{\epsilon_i} \tag{3}$$

The rates of translation initiation and total elongation in turn depend on the amounts of free ribosomes $R^f$ and free tRNAs $T^f$, in addition to the characteristic times of these molecules. We assume that translation termination is instantaneous and does not contribute to the overall rate of translation. Thus the initiation rate on an mRNA can be given as

$$\rho_i = \frac{R^f p_i}{\tau_r N_r} \tag{4}$$

where $p_i$ is the probability of initiation given that the ribosome has reached the mRNA. $p_i$ is sequence-specific and accounts for the variation in initiation rates of various mRNAs. Similarly, when a ribosome is bound to the mRNA, the time taken to elongate codon $j$ depends on the number of free cognate tRNAs $T^f_{\phi(j)}$, the wobble parameter $w_j$, and the tRNA competition coefficient $s$:

$$c_j = \frac{\tau_t N_t}{T^f_{\phi(j)} \, w_j \, s} \tag{5}$$

Thus at equilibrium, assuming no ribosomal collisions or interference, the expected total elongation rate of a ribosome on an mRNA is

$$\epsilon_i = \frac{1}{\sum_{j=1}^{k} x_j c_j} \tag{6}$$

where $x_j$ is the number of codons of type $j$, and $k$ denotes the total number of codon types (typically $k = 61$).

Case 1: One Gene and One Amino Acid with Two Codons

Consider a simple case of one gene of length $L$ codons composed of a single amino acid with two types of codons, each translated by a single tRNA type ($T_1$ or $T_2$). Let the expression level of the gene be $A$, the relative frequency of codon 1 be $u$, and the total number of ribosomes in the cell be $R^t$. Based on Equation (6), the total elongation rate of that gene is given by

$$\epsilon = \frac{1}{L \left( u c_1 + (1 - u) c_2 \right)} \tag{7}$$

where $c_1$ and $c_2$ are given by Equation (5):

$$c_1 = \frac{\tau_t N_t}{T^f_1 \, w_1 \, s} \tag{8}$$

$$c_2 = \frac{\tau_t N_t}{T^f_2 \, w_2 \, s} \tag{9}$$

Note that whenever a ribosome is bound to an mRNA waiting for a tRNA corresponding to the codon at its A-site, a tRNA is bound at its P-site attached to the growing polypeptide chain.
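Equations (4) through (7) can be written as a few small functions, which may make the dependence on free ribosomes and free tRNAs easier to follow. This is only a sketch: the function and argument names are hypothetical, and every numerical value below is a placeholder chosen for easy arithmetic rather than a measured yeast parameter.

```python
def initiation_rate(free_ribosomes, p_init, tau_r, n_r):
    """Equation (4): rho_i = R^f * p_i / (tau_r * N_r)."""
    return free_ribosomes * p_init / (tau_r * n_r)


def codon_elongation_time(free_trna, wobble, s, tau_t, n_t):
    """Equation (5): c_j = tau_t * N_t / (T^f_phi(j) * w_j * s)."""
    return tau_t * n_t / (free_trna * wobble * s)


def total_elongation_rate(codon_counts, codon_times):
    """Equation (6): eps_i = 1 / sum_j (x_j * c_j)."""
    return 1.0 / sum(x * c for x, c in zip(codon_counts, codon_times))


# Placeholder values (not measured quantities).
tau_t, n_t, s = 1.0, 1.0, 0.5
c1 = codon_elongation_time(free_trna=2.0, wobble=1.0, s=s, tau_t=tau_t, n_t=n_t)
c2 = codon_elongation_time(free_trna=1.0, wobble=0.5, s=s, tau_t=tau_t, n_t=n_t)

# Case 1: a gene of L codons with a fraction u of codon type 1.
L, u = 100, 0.75
eps_general = total_elongation_rate([u * L, (1 - u) * L], [c1, c2])
eps_case1 = 1.0 / (L * (u * c1 + (1 - u) * c2))  # Equation (7)
print(c1, c2, eps_general, eps_case1)
```

With these placeholder inputs the general expression in Equation (6) and the two-codon special case in Equation (7) give the same elongation rate, as expected when $x_1 = uL$ and $x_2 = (1 - u)L$.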
Assuming that the codons in the gene are randomly distributed, the frequency of tRNA types at ribosomal P-sites is independent of the waiting time for codons in the A-sites. In addition, the total number of bound ribosomes should equal the number of bound tRNAs of all types, $R^b = T^b_1 + T^b_2$. As a result, the number of bound tRNAs of each type is simply proportional to its codon usage:

$$T^b_1 = R^b u \tag{10}$$

$$T^b_2 = R^b (1 - u) \tag{11}$$

Note that the above relationship works if the number of bound ribosomes $R^b$ is less than the ratio of total tRNAs of either type to their codon usage: ($R^b$ […] 0.9, Spearman correlation, R = 0.997). In addition, we validated that we can reliably infer initiation probabilities from simulated ribosomal profiling data even when gene length and initiation probabilities are positively correlated (Figures S1C and S1D). This result indicates that the negative correlation between gene length and inferred initiation probability observed in the real yeast data is not an artifact of our inference procedure.

Correlation between Gene Length and Ribosome Density in Ribosome Profiling Data

One of the hallmark features of ribosomal profiling data (Ingolia et al., 2009) is the decrease in ribosome density with increasing codon position. This has been argued to be driven by heterogeneity in ribosome density along each mRNA molecule, with higher densities in the 5′ region of genes due to less optimal codons (Tuller et al., 2010). In order to show that position-specific heterogeneity in ribosome density is not in fact the primary cause of these patterns, we used the average ribosomal density of each gene and assumed that this density is spread uniformly across the entire length of the sequence. We then recomputed the transcriptome-wide average ribosome density, by codon position, assuming a uniform density along each mRNA.
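The uniform-density recomputation described above can be illustrated with a toy calculation. The sketch below assigns each gene a single density spread evenly over its length and then averages, at each codon position, over the genes long enough to cover that position. The three genes, their densities, and the unweighted average (no weighting by mRNA abundance) are assumptions made purely for illustration and are not taken from the profiling data.

```python
def mean_density_by_position(genes, positions):
    """Average uniform per-gene density at the given codon positions.

    genes: list of (length_in_codons, mean_density) pairs.
    positions: iterable of 0-based codon positions.
    Returns a dict mapping position -> mean density over all genes that
    extend to that position (None if no gene is long enough).
    """
    profile = {}
    for pos in positions:
        covering = [density for length, density in genes if length > pos]
        profile[pos] = sum(covering) / len(covering) if covering else None
    return profile


# Hypothetical genes: shorter genes carry a higher uniform density.
toy_genes = [(100, 0.9), (300, 0.5), (1000, 0.2)]
profile = mean_density_by_position(toy_genes, [0, 150, 500])
print(profile)
```

In this toy example the position-averaged density falls with codon position even though every individual gene has a perfectly flat profile, because the short, dense genes stop contributing at later positions.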
In the resulting profile, even after removing position-specific heterogeneity for each individual mRNA, we still observed a sharp decrease in average ribosome density with codon position (Figure S3A). In addition, when inspecting the profiling data on a gene-by-gene basis, we find that just as many genes exhibit a trend of increasing ribosome density, from 5′ to 3′, as show evidence of decreasing ribosome density (Figure S3B). These analyses of the primary profiling data confirm the conclusions drawn from our simulations of translation: the apparent 5′ ribosome ramp does not actually require a higher density of ribosomes near the 5′ end of each message, but rather it can be explained simply by a greater density of ribosomes on shorter mRNA molecules.

Mapping Ribosome Profile Reads to Genes

The ribosome profiling reads of Ingolia et al. (2009) and their alignment files were downloaded from GEO under the accession number GSE13750. We compared the mapped positions of the sequencing reads to the S. cerevisiae genome annotation file downloaded from the UCSC genome browser ([Karolchik et al., 2003; Dreszer et al., 2012], genome version June 2008 [SGD/sacCer2]). For each coding sequence, we counted the number of reads that were mapped to each codon (we assigned each read to the codon to which its 17th base mapped), as well as the total number of reads mapped to the sequence. To avoid ambiguity we excluded the reads that were mapped to multiple positions across the genome.

Comparison with Ribosome Flow Model of Translation

The ribosome flow model (Reuveni et al., 2011) describes the translation of an individual mRNA molecule with a fixed rate of initiation and a fixed rate of elongation per codon. By assuming fixed rates of initiation and elongation, the model implicitly assumes a constant, inexhaustible supply of free ribosomes and free tRNAs in the cell.
The ribosome flow model describes the translation of each mRNA molecule independently of all other mRNAs, and so the model does not account for competition among mRNAs for free ribosomes or free tRNAs in the cell. In other words, if one mRNA species is highly abundant and densely packed with ribosomes, then this does not limit the pool of available ribosomes to initiate other mRNAs, according to the assumptions of the ribosome flow model. Furthermore, the model predicts that each mRNA is translated close to (93% of) its maximum translation rate (Reuveni et al., 2011; Tuller et al., 2011). As a result, protein translation is generally elongation-limited in the ribosome flow model. This model, which rests on the implicit assumption that free ribosomes are always available, is expected to provide an accurate description of translation in a cell only under conditions in which a very large number of ribosomes are, indeed, free.

To make this point explicit, we have calculated the predicted number of ribosomes bound to mRNAs in a yeast cell based on the estimates of average ribosome density obtained under the ribosome flow model (Reuveni et al., 2011). According to the ribosome flow model, the average ribosome density, per 15 codons, ranges from 0.36 to 0.42 for low-expression and high-expression genes, respectively (Reuveni et al., 2011). Therefore, assuming an average ribosome density of 0.4 and a total transcriptome size of $2 \times 10^7$ codons (Zenklusen et al., 2008; Ingolia et al., 2009), the number of bound ribosomes predicted by the ribosome flow model is $2 \times 10^7 \times 0.4 / 15 = 5.33 \times 10^5$. This number greatly exceeds the total number of ribosomes (free or bound) that have been measured in a real yeast cell ($2 \times 10^5$ [Warner, 1999; von der Haar, 2008]). As this calculation suggests, the assumptions of the ribosome flow model imply that an unrealistically large number of ribosomes are required to translate all the mRNAs in a yeast cell.
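The arithmetic behind this comparison is short enough to check directly. The snippet below simply restates the numbers quoted above; the variable names are illustrative, and the final ratio is a derived quantity that is not stated in the text.

```python
# Values quoted in the text above.
transcriptome_codons = 2e7        # total transcriptome size, in codons
density_per_15_codons = 0.4       # assumed average ribosome density
measured_total_ribosomes = 2e5    # measured ribosomes per yeast cell

# Bound ribosomes implied by the ribosome flow model densities.
predicted_bound = transcriptome_codons * density_per_15_codons / 15

# How many times larger the prediction is than the measured total.
excess_ratio = predicted_bound / measured_total_ribosomes
print(predicted_bound, excess_ratio)
```

The predicted number of bound ribosomes comes out at roughly $5.33 \times 10^5$, a little under three times the measured total of $2 \times 10^5$.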
In order to compare the ribosome flow model with our whole-cell simulation of translation, we artificially increased the number of ribosomes and tRNAs in our simulations beyond their empirically measured abundances, so that a large supply of them would be free in equilibrium, in accordance with the assumptions of the ribosome flow model. To do so, we increased the number of tRNA molecules 10-fold (chosen so that there would be a large supply of free tRNAs of all species, even in the extreme case of every ribosome bound to a codon in the transcriptome). We also increased the number of ribosomes in the cell, ranging from a 2-fold to a 35-fold increase. To find the regime that corresponds to the ribosome flow model, we identified the number of ribosomes required so that protein production in the cell is 93% of its maximal capacity (Figure S5A). Achieving this regime requires a 5-fold increase in the number of simulated ribosomes compared to the true, measured number of ribosomes in a yeast cell. In this regime our simulations recover the elongation-limited behavior of the ribosome flow model, but the total number of ribosomes bound to all mRNAs in the resulting simulated cell is about $8 \times 10^5$ (Figure S5A), which again exceeds the measured number of ribosomes in a real yeast cell ($2 \times 10^5$ [Warner, 1999; von der Haar, 2008]) by four-fold. Thus, it is possible for our model to recapitulate the behavior of the ribosome flow model, in which ribosomes are inexhaustibly abundant and the translation dynamics of each mRNA can be treated independently, but doing so requires assuming an unrealistic number of cellular ribosomes.

In summary, the number of ribosomes required to reconcile our cellular model of translation with the ribosome flow model vastly exceeds the number of ribosomes in a normal yeast cell.
Likewise, the number of bound ribosomes in the cell, according to direct estimates of ribosome densities inferred by the ribosome flow model (Reuveni et al., 2011), also exceeds the total number of ribosomes measured in a yeast cell. These calculations suggest that the elongation-limited regime described by the ribosome flow model is not realistic for most endogenous genes in a healthy yeast cell.
                Author and article information

                Journal
                0410462
                6011
                Nature
                0028-0836
                1476-4687
                13 September 2014
                05 September 2014
                6 November 2014
                06 May 2015
                Volume: 515
                Issue: 7525
                Pages: 143-146
                Affiliations
                Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, U.S.A
                Author notes
                Correspondence and requests for materials should be addressed to W.V.G. ( wgilbert@ 123456mit.edu )
                Article
                NIHMS624208
                10.1038/nature13802
                4224642
                25192136
                5147aec0-c3f1-433a-a418-be477f7549f7