57
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Are Current Atomistic Force Fields Accurate Enough to Study Proteins in Crowded Environments?

      , *

      PLoS Computational Biology

      Public Library of Science

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          The high concentration of macromolecules in the crowded cellular interior influences different thermodynamic and kinetic properties of proteins, including their structural stabilities, intermolecular binding affinities and enzymatic rates. Moreover, various structural biology methods, such as NMR or different spectroscopies, typically involve samples with relatively high protein concentration. Due to large sampling requirements, however, the accuracy of classical molecular dynamics (MD) simulations in capturing protein behavior at high concentration still remains largely untested. Here, we use explicit-solvent MD simulations and a total of 6.4 µs of simulated time to study wild-type (folded) and oxidatively damaged (unfolded) forms of villin headpiece at 6 mM and 9.2 mM protein concentration. We first perform an exhaustive set of simulations with multiple protein molecules in the simulation box using GROMOS 45a3 and 54a7 force fields together with different types of electrostatics treatment and solution ionic strengths. Surprisingly, the two villin headpiece variants exhibit similar aggregation behavior, despite the fact that their estimated aggregation propensities markedly differ. Importantly, regardless of the simulation protocol applied, wild-type villin headpiece consistently aggregates even under conditions at which it is experimentally known to be soluble. We demonstrate that aggregation is accompanied by a large decrease in the total potential energy, with not only hydrophobic, but also polar residues and backbone contributing substantially. The same effect is directly observed for two other major atomistic force fields (AMBER99SB-ILDN and CHARMM22-CMAP) as well as indirectly shown for additional two (AMBER94, OPLS-AAL), and is possibly due to a general overestimation of the potential energy of protein-protein interactions at the expense of water-water and water-protein interactions. Overall, our results suggest that current MD force fields may distort the picture of protein behavior in biologically relevant crowded environments.

          Author Summary

          Protein behavior is strongly affected by highly crowded and interaction-rich environments, i.e., typical conditions in both biologically relevant systems, such as the cellular interior, and solution-based structural experiments, including NMR and different spectroscopies. On the other hand, primarily because of limited computational power, molecular dynamics (MD) simulations, a premier high-resolution method for analyzing structure, dynamics and interactions of proteins, have been predominantly used to study individual proteins at infinite dilution. To fill this gap, we use MD simulations to study the behavior of wild-type (aggregation-resistant) and oxidatively damaged (aggregation-prone) forms of villin headpiece at high concentration, and reveal unexpected limitations and inaccuracies of modern-day MD force fields when it comes to modeling proteins at physiologically or experimentally relevant concentrations.

          Related collections

          Most cited references 37

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          Diffusion, Crowding & Protein Stability in a Dynamic Molecular Model of the Bacterial Cytoplasm

          Introduction While reductionist biophysical studies continue to contribute important insights into the properties and functions of biological macromolecules, research attention is increasingly being directed at uncovering the extent to which behavior observed in vitro is likely to reflect that occurring in vivo [1],[2]. In a physiological setting, all biomolecules must inevitably experience non-specific, unintended interactions with the intracellular milieu and there are good theoretical reasons to expect that, even if such interactions are only steric in nature, significant alterations in macromolecular folding and association equilibria may result [2],[3]. In order to allow macromolecules to be directly interrogated in vivo therefore, a number of important developments have been made in the experimental fields of hydrogen exchange [4], nuclear magnetic resonance [5],[6], and fluorescence spectroscopies [7]–[9]. An alternative to the use of experimental techniques is to assemble a molecular model of an intracellular environment in silico and to use molecular simulation techniques to explore its behavior; if such a model could be shown to be realistic – and that is a big ‘if’ – it would have the important advantage of allowing the simultaneous, direct observation of all molecules in the system. In fact, at least two simulation studies that attempt to model the bacterial cytoplasm have already been reported [10],[11], producing a number of intriguing results. Both of these previous studies, however, modeled all cytoplasmic molecules as spheres and it is perhaps to be anticipated therefore that simulations that include structurally detailed macromolecular models might lead to additional insights. In pursuit of this strategy, we have chosen the gram-negative prokaryote Escherichia coli as a test system, combining quantitative proteomic [12] and high-resolution structural data [13] to build a first structurally detailed molecular model of the bacterial cytoplasm. Results Full details of the construction of the model are provided in Methods. Briefly, however, it is to be noted that the model contains 50 different types of the most abundant macromolecules of the E. coli cytoplasm (accounting for ∼85% of the cytoplasm's characterized protein content by weight; [12]) and a total of 1008 individual molecules. Eight of these molecules are copies of the heterologous (non-E. coli) protein GFP (Green Fluorescent Protein), which has been included so that the diffusional characteristics of the model can be compared with in vivo experimental results (see below). A snapshot of the modeled system, together with a full listing of its constituents, is shown in Figure 1; the total combined macromolecular concentration in all of the simulations reported here is 275g/l. 10.1371/journal.pcbi.1000694.g001 Figure 1 The cytoplasm model. A. Schematic inventory of the contents of the cytoplasm model. B. Rendering of the cytoplasm model at the end of a Brownian dynamics simulation performed with the ‘full’ energy model (see text). RNA is shown as green and yellow. This figure was prepared with VMD [110]. Parameterization of the simulation model Starting from three different randomized initial configurations of the cytoplasm model (all shown in Figure S1), we performed independent Brownian dynamics (BD) simulations [14] to explore diffusive behavior. A variety of energetic descriptions of intermolecular interactions were explored, ranging from a simple steric-only model – which provides an opportunity to directly test the predictions of excluded-volume ‘crowding’ theories [2],[3] – to models that include both long-range electrostatic interactions and short-range potential functions that mimic hydrophobic interactions between exposed non-polar groups. In order to determine the most realistic energy model, the long-time translational diffusion coefficients, DL trans, of the ‘tracer’ GFP molecules were computed from the BD simulations and compared with previously reported experimental estimates obtained by fluorescence-recovery-after-photobleaching (FRAP) analysis of GFP in the E. coli cytoplasm [15]–[18]. A comparison of the computed GFP DL trans values obtained with the different energy models is shown in Figure 2A. For simulations in which only steric interactions operate between macromolecules the computed GFP DL trans value is 3–6 times higher than the experimental estimates, and although this value decreases somewhat when electrostatic interactions between macromolecules are added, it remains 2–5 times too high relative to experiment. A more realistic model of macromolecular interactions would allow favorable short-range attractions to occur between exposed hydrophobic atoms and one simple way of approximating such interactions is to use a Lennard-Jones potential, with the well-depth of the potential, ε, being treated as an adjustable parameter (see Methods). As shown in Figure 2A, the inclusion of such a term results in computed GFP DL trans values that decrease monotonically as the well-depth, ε, increases in magnitude. The best agreement with experiment is obtained with ε = 0.285 kcal/mol: at this value of ε the computed value of DL trans – which is ∼10% of its value at infinite dilution – is within the experimental error of all in vivo estimates [15]–[18] including a very recent report for diffusion in cells growing in minimal media [18]. As noted in the Discussion, this optimal value of ε is very similar to the values obtained in our previous efforts to model the interaction thermodynamics of single-component protein solutions [19]. 10.1371/journal.pcbi.1000694.g002 Figure 2 Parameterization and sampling in the cytoplasm model. A. Extrapolated long-time Dtrans values for GFP from BD simulations performed with different energy models; ‘ε’ refers to the well-depth (in kcal/mol) of the Lennard-Jones potential used to describe hydrophobic interactions (see Methods). Solid, long-dash, short-dash and dotted lines are the experimental Dtrans values from refs. 14, 15, 16 and 17 respectively. The vertical arrow indicates the energy model selected for further BD simulation. B. Average of the maximum distance moved during the 15µs of production for all molecule types plotted versus their molecular weights. Upper error bars indicate the largest value of the maximum distance moved found for any molecule of that type; lower error bars indicate the smallest value of the maximum distance moved. All distances expressed in terms of the molecular diameters (obtained by doubling the hydrodynamic radius calculated by HydroPro [88]. C. Average number of unique neighbors encountered by each molecule type as a function of simulation time; each line refers to a different molecule type. D. Average number of neighbors possessed by each molecule type at any instant, plotted versus molecular weight. E. Time constant for the slower of the two exponentials describing the rate at which neighbors are lost, plotted for each molecule type versus molecular weight. F. Average number of times that each molecule type's immediate neighbors exchange during 15µs simulation plotted versus molecular weight of each molecule type. Having determined that good agreement with experiment could be obtained using a so-called ‘full’ energy model that included steric, electrostatic and short-range attractive hydrophobic interactions, we extended each of three independent simulations performed with this energy model to 20µs (see Figure S2 for plots of the system's energy versus time). In order to provide a useful baseline for comparative purposes we also performed extended simulations with the purely ‘steric’ energy model (i.e. one that neglects the electrostatic and hydrophobic interactions); the latter simulations were performed for simulation times of 17.5µs. Each BD simulation using the ‘full’ energy model required more than a year (clock-time) to complete. For both energy models, snapshots taken from the last 15µs of each simulation were used for detailed analysis. Overall characteristics of the Brownian dynamics simulations An informative, albeit non-quantitative, impression of the simulation behavior can be obtained by viewing movies of the simulations (Supporting Information). In some respects, these movies can be considered a key result of this work: they represent, in effect, dynamic analogs of the highly influential pictorial representations pioneered by Goodsell [20]. Examination of a typical movie obtained from a simulation performed with the ‘steric’ energy model shows the simulated motion to be rapid, chaotic and obviously Brownian. For the more realistic ‘full’ model, on the other hand, motion is somewhat slower-paced, and molecules can be seen to fluctuate between engagement in short-lived associations and periods of relatively free diffusion. We can place these observations on a more quantitative footing, and obtain an indication of the extent of sampling achieved in 15µs of simulation, from the remaining panels of Figure 2. Figure 2B shows the maximum distances moved, on average, by each molecule type during simulations performed with the ‘full’ and ‘steric’ energy models; all distances are expressed relative to the diameter of the diffusing molecule. In the case of GFP with the ‘full’ energy model, for example, each molecule travels, on average, approximately 6 molecular diameters (i.e. 320Å) from its position at the beginning of the simulation. Since the data in Figure 2B are plotted versus molecular weight it is apparent that 15µs of simulation is sufficient for the smaller macromolecules to move very significant distances, while for the largest macromolecules (the 30S and 50S ribosomal subunits), little motion away from the initial position is achieved. On this basis alone, therefore, we expect the estimates of diffusional behavior for the smaller macromolecules to be somewhat more precise than those of the larger macromolecules. A second measure of the extent of sampling achieved during each simulation period is provided by plotting the number of unique interaction partners encountered by each type of macromolecule as a function of the simulation time (Figure 2C). Encouragingly, most molecule types encounter many unique neighbors over the course of 15µs: during a typical simulation with the ‘full’ model, for example, each GFP molecule encounters ∼80 different neighbors. Just as importantly, the total numbers of unique neighbors continues to increase even toward the end of the simulation period: this indicates that the cytoplasm model remains highly dynamic and does not tend to ‘freeze’ as the simulation progresses. As might be expected, the average numbers of neighbors that a macromolecule possesses at any instant scales essentially monotonically with its molecular weight: the average number of macromolecules in the immediate neighborhood of a GFP molecule, for example, is only ∼5 while for the 50S ribosomal subunit it is more than 25 (Figure 2D). The time constants for the dissociation of these neighboring interactions – which in all cases are in the microsecond range – also scale straightforwardly with the molecular weight (Figure 2E), indicating that molecules remain in the neighborhood of larger macromolecules for somewhat longer periods of time than they do with smaller macromolecules. The data shown in Figures 2C and 2D can be combined to provide an estimate of the number of times that each molecule's entire complement of neighbors is replaced during the simulations (Figure 2F). Interestingly, while the overall trend is such that smaller macromolecules encounter a more dynamic constellation of neighbors even the largest macromolecules experience a significant number of environmental changes during the 15µs simulation period. While each GFP molecule, for example, effectively ‘shed its skin’ of neighbors a total of ∼14 times, even the 50S ribosomal subunit undergoes ∼5 such transformations (Fig. 2F). This observation suggests that the limited diffusional exploration carried out by the largest macromolecules evident in Figure 2B may, in at least one important respect, give a misleadingly low indication of the extent of configurational sampling achieved in the simulations: it is in fact, possible for a completely static macromolecule to rapidly encounter widely different microenvironments simply by virtue of the dynamic exchange of its smaller, more mobile neighbors. Translational and rotational diffusion While it was noted above that the long-time DL trans value of GFP obtained with the ‘full’ energy model is in good agreement with in vivo measurements (Figure 2A), there are other aspects of diffusional behavior in the simulations that warrant examination. One question that is of interest is how the observed Dtrans values of macromolecules depend on the observation interval, δt, over which their diffusion is monitored (see Methods). The answer to this question is plotted in Figures 3A and 3B for the three most abundant members of the cytoplasm model (MetE, TufA and CspC); these proteins have been chosen for closer examination because their high abundance yields the most statistically robust numbers, but very similar results are obtained for the other constituents of the model. Figure 3A plots the computed Dtrans values of the three proteins versus δt for both the ‘full’ and ‘steric’ energy models. The clear variation of Dtrans with δt seen for all three proteins is indicative of ‘anomalous’ diffusion [21]–; the magnitude of the anomaly is conventionally expressed by the anomality exponent, α, (Methods) which is plotted for the same proteins, again versus δt, in Figure 3B. Examination of this figure shows that with the ‘steric’ energy model, the diffusion of all three proteins progresses from being normal (α∼1), to transiently subdiffusive (α 100 different types of proteins are represented, and thanks to the availability of the authors' own proteomic data [40], are present in copy numbers that are likely to much more closely reflect their relative abundances in vivo. On the other hand, all macromolecules are treated as spheres, and intermolecular interactions are assumed to be purely steric in nature. In addition, the actual modeling of motion is somewhat simplified: particles take steps of uniform length in randomly chosen directions, with the steps being accepted only if no collision – or reaction – with a neighboring molecule occurs. While somewhat approximate, this approach has the significant advantage of allowing reactive events to be rapidly modeled, making the simulation model applicable to a more general set of problems than that considered here. The resulting model of the cytoplasm was used to investigate the effects of crowding on the translational diffusion of macromolecules and on the rate of the diffusion-limited association of the barnase-barstar protein-protein complex. As noted by the authors, the diffusional simulations produced only a two-fold decrease in the translational diffusion coefficients of GFP-like molecules, suggesting, in common with the results reported here, that (steric) crowding effects alone are insufficient to explain the ∼10-fold slowed diffusion of GFP observed in vivo. Relative to these two previous cytoplasm models, therefore, the present approach offers a significant increase in both structural and energetic complexity: all macromolecules are modeled in atomic detail and interact with one another via an energetic model that accounts for the two major types of interaction that drive protein-protein associations (i.e. electrostatic and hydrophobic interactions). It does so, of course, at very significant computational expense: each of the simulations performed with our ‘full’ energy model required more than a year of clock-time to complete. But even with its associated expense it should not be thought that the present model represents the pinnacle of sophistication in terms of its description of reality. Leaving aside the fact that the model is incomplete in terms of the types of macromolecules (and small molecules) that it includes, there are several key assumptions of the modeling that are both important to stress and which represent obvious candidates to address further in future work. A first simplification of the present approach, and one shared by the previous models described above, is that all macromolecules have here been treated as rigid bodies. This simplification has two consequences. First, it immediately precludes us from making any meaningful attempt to simulate the (presumably very interesting) diffusive behavior of highly flexible macromolecules such as mRNAs and intrinsically unstructured proteins. While this is undoubtedly a limitation, it is to be noted that in terms of their contributions to the overall mass content of the cytoplasm, such molecules play a comparatively minor role relative to that played by the folded, globular macromolecules examined here [10]. It is also to be noted that there are currently very serious technical obstacles to be overcome if the diffusive behavior of flexible macromolecules is to be simulated with any degree of realism: we have shown recently, for example, that the inclusion of hydrodynamic interactions (HI), which are computationally very expensive to compute, is essential if flexible protein models are to adequately reproduce translational and rotational diffusion [41]. A second consequence of the rigidity of the present model is that it is not immediately suited to describing conformational changes that might potentially occur in highly crowded conditions, and for which interesting experimental and simulation results have recently been reported [42],[43]. As shown in the second part of this paper however, this limitation can be overcome, at least for the purposes of calculating thermodynamic effects, by the use of particle-insertion calculations. In fact, the use of such an approach has enabled us to explicitly evaluate the cytoplasm's thermodynamic consequences on both folding and association equilibria, something that would currently be prohibitively expensive to achieve through the direct dynamic simulation of flexible protein models. A second, but not unrelated simplification adopted in the present approach concerns the energy model used to describe intermolecular interactions. On the one hand, the model is comparatively sophisticated in that it includes descriptions of electrostatic and hydrophobic interactions, and models both at an atomic, or near-atomic level of resolution: in this respect it is a clear improvement over previous models used to simulate the cytoplasm. On the other hand, the model assumes that electrostatic desolvation effects can be neglected (which may lead to an overestimation of the strength of electrostatic interactions; [44]) and treats hydrophobic interactions as pairwise additive [45],[46] and of equal strength for aliphatic and aromatic groups. We assume that the effects of these missing features are at least partly subsumed, in an implicit fashion, within our single hydrophobic parameter, ε. For this reason, we should be careful not to attach too much importance to the absolute value of ε found here (0.285 kcal/mol): it is, nevertheless, encouraging that it is very similar to the range of values that we previously obtained [19] when modeling the thermodynamics of simple dilute protein solutions (0.22–0.28 kcal/mol). This is perhaps especially notable given the enormous difference between the protein concentration studied here (275mg/ml) and that studied in the previous work (10mg/ml). In future, it should be possible to increase the sophistication of the energy model without incurring an exorbitant additional computational cost: if one stays with a rigid-body approach, for example, a number of grid-based methods might be used that allow electrostatic desolvation [44] and/or hydrophobic interactions [47]–[50] to be rapidly calculated. It should be remembered, however, that a more complicated functional form will not necessarily lead to better results, and that, at least for now, it is highly likely that some degree of empirical adjustment of energy terms will be required in order to reproduce experimental behavior. This will be especially true if the intention is to use a similar model to explore, for example, macromolecular crowding effects on specific protein-protein interactions: despite significant advances, no current computational method is capable of accurately predicting the strength or geometry of specific protein-protein interactions with any generality [51]. To model such situations, therefore, it may be necessary to supplement the energy model with additional short-range forces to drive the formation of known intermolecular contacts, in the same way that such terms (commonly known as Gō-potentials; [52]–[54]) are often used in the modeling of protein folding processes; an alternative might simply be to use different ε values for different protein-protein interactions. A third limitation of the present model concerns its very simplified description of macromolecular hydrodynamics. In particular, while the basic hydrodynamic properties of all macromolecules (i.e. their translational and rotational diffusion coefficients at infinite dilution) are properly accounted for, the BD simulations reported here do not allow for the presence of hydrodynamic interactions (HI) between macromolecules; again this is true also of the two previously reported cytoplasm models [10],[11]. The immense expense associated with HI calculations remains a major stumbling block to their inclusion in large-scale simulations [55] and a number of attempts have therefore been made to accelerate their computation (see, e.g. [56],[57] for very recent and potentially important examples). This expense would be further increased in the present case if, as would in principle be necessary, an Ewald summation technique was used to properly account for HI in periodic boundary conditions [58]. While simply stating that HI are expensive to calculate does not constitute a compelling reason for leaving them out of the simulations, it is pertinent to note that the omission of HI seems unlikely to be the cause of the gross overestimation of the diffusion coefficient of GFP obtained with the ‘steric’ energy model (Figure 2A). It is certainly true, as noted elsewhere [18], that for hard-sphere-like colloidal particles – where the interactions between particles are extremely short-range – theoretical work has established that the inclusion of HI should cause decreases in Dtrans values over both short [59] and long timescales [60],[61]. Such decreases are, however, unlikely to bridge the ∼5-fold gap necessary to bring the ‘steric’ energy model behavior into quantitative agreement with experiment: in an interesting recent simulation study, for example, it was found that an approximate description of HI in crowded hard-sphere solutions resulted in only a ∼40% additional decrease in the diffusion coefficient relative to simulations without any description of HI [62]. In addition, it is also to be noted that for colloidal particles with long-range repulsive electrostatic interactions, theory indicates that the inclusion of HI causes modest increases in Dtrans values at both short [63],[64] and long timescales [64],[65]. Since the current model has macromolecules interacting with each other not only by short-range steric forces and long-range repulsive electrostatic forces, but also by short-range attractive interactions between exposed hydrophobic residues it is difficult to predict the effects that the inclusion of HI might ultimately cause, other than to say that we think they may be comparatively modest. In keeping with the caveat given above about our energy model, however, we clearly must leave open the possibility that the hydrophobic parameter, ε, is also, in part, serving as an implicit correction for the omission of HI from the simulations. Having produced in the preceding paragraphs a litany of shortcomings of the model one might be tempted to view it as so fundamentally limited that its practical utility is in doubt. Perhaps the strongest argument against such a view comes from the results of the particle-insertion calculations aimed at computing the thermodynamics of protein folding in vivo (Figure 4A). It is important to note that these thermodynamic calculations should be considered bona fide predictions of the simulation model since it was calibrated to reproduce a quite different experimental observable, i.e. the translational diffusion coefficient of GFP. Because of this, we can rule out the possibility that the calibration of the model predisposes it to trivially reproduce experimental protein stability effects. To our knowledge, the calculated results reported here with our ‘full’ energy model are the first to provide a quantitative rationalization of the experimental observation that CRABP is destabilized in vivo (relative to in vitro) and that λ6-85's relative stability is essentially unchanged. As noted earlier, the experimental CRABP result is inexplicable with conventional macromolecular crowding theory (as exemplified by the results obtained here when the ‘steric’ energy model is used in the particle-insertion calculations) since the dimensions of its unfolded state are greater than those of its native state. Use of the ‘full’ energy model, on the other hand, produces results in close agreement with experiment because it explicitly allows for the two states of the protein to engage in differential, favorable energetic interactions with the rest of the constituents of the cytoplasm. Interestingly, good results are obtained when the ‘full’ energy model is used in the particle-insertion calculations regardless of whether the cytoplasm snapshots were sampled from the ‘steric’ BD simulations or sampled from the ‘full’ BD simulations. Although the most internally consistent approach is obviously to use the same energy model in both the BD simulations and the particle-insertion calculations, the fact that good results can apparently also be obtained using snapshots from the ‘steric’ BD simulations is intriguing since such simulations are much faster to conduct than those using the ‘full’ energy model. Our model's predicted effects on the folding free energies of the six other proteins investigated (Figure 4D) await experimental testing of course, but regardless of how quantitatively accurate such predictions might eventually turn out to be we feel reasonably confident in suggesting that future attempts to understand a protein's folding thermodynamics in vivo will need to describe its interactions with the cytoplasm with more realism than is provided by simple steric interactions. Other findings from the simulations, while probably more difficult to directly test experimentally, provide examples of the kinds of new information that can be obtained from simulation approaches that attempt to model intracellular environments. Examples include the observation that the immediate neighbors of individual proteins exchange rapidly on a microsecond timescale – even for the very largest macromolecules – and that diffusion is transiently anomalous even on a sub-nanosecond timescale. The latter observation is especially interesting given the current interest in anomalous subdiffusion as an efficient mechanism of search and association in physiological situations [8],[66]. Finally, one might also point to the fact that the simulation model correctly reproduces the cytoplasm's relative translational and rotational viscosities as an important favorable result since differential effects on translational and rotational motion appear to have interesting effects on protein-protein association rates in crowded solutions [67]–[69]. It should be remembered, however, that a similarly good reproduction of the relative translational and rotational viscosities is also obtained with the otherwise poorly performing ‘steric’ energy model. An examination of all of the dynamic and thermodynamic results described above shows, we think, that it is possible to leverage the existing structural biology and quantitative proteomic data to produce a meaningful, working molecular model of the bacterial cytoplasm. The actual simulation model used here has a number of limitations, of course, but continuing increases in computer power and/or the development of faster simulation methodologies, will likely allow many of these drawbacks to be eliminated in the not too distant future. Given the continuing progress in the fields of structural biology and quantitative proteomics it is likely that the same basic approach might be used to model other intracellular environments. Methods Selection of the constituents for the cytoplasm model When this work was initiated, the only large-scale quantitative study of the E. coli proteome was that reported by Link et al. [12] who experimentally measured levels of >200 of the most abundant proteins present in E. coli. A number of other quantitative proteomic studies of E. coli have since been reported [40],[70],[71], and, since this work was completed, quantitative estimates of metabolite concentrations have also become available [72]. Restrictions on computer memory (4GB of RAM for all servers used) meant that the total number of different types of macromolecules that could be realistically modeled was limited to 51: these would be 50 types of E. coli macromolecule plus the Green Fluorescent Protein (GFP). Although including only 50 different types of macromolecules means that the model underestimates the structural diversity of the cytoplasm, it is important to note that the macromolecules selected for inclusion account for 85% (by number of protein chains) and 86% (by mass) of all the cytoplasmic proteins quantified and identified in Table 4 of Link et al. [12]. Of the 50 types of E. coli macromolecules to be included in the model, 45 would be proteins. These were selected by working down the list identified by Link et al. in order of decreasing abundance, selecting only those proteins (a) for which high-resolution structures were then available in the Protein Data Bank [13] (PDB) or for which reasonable homology models could be constructed (see below), and (b) for which the cytoplasm was unambiguously identified as the major cellular location in the EcoCyc [73] and/or CCDB [74] databases. A full list of all potentially cytoplasmic proteins identified and quantified in Table 4 of Link et al. (under minimal media conditions), arranged in decreasing order of chain-abundance, is shown in Table S1; asterisks in the columns headed ‘Mod.’ denote those proteins included in our cytoplasm model. It is an indication of the tremendous coverage of the structural proteome that has been achieved by the structural biology community that we were able to obtain, or build, reasonable structural models for all of the 30 most abundant cytoplasmic proteins identified by Link et al. [12]. In addition to the 45 different types of proteins, 5 types of macromolecule were RNAs or RNA-protein complexes: these were the two ribosomal subunits (50S and 30S), and three typical tRNAs for which structures were available: (tRNA-Gln, tRNA-Phe and tRNA-Cys). It is to be noted that we did not model complete (translating) 70S ribosomes owing (a) to the inherent difficulties in modeling the flexible mRNA, and (b) to the absence – at the time this work was begun – of a three-dimensional structure showing the arrangement of multiple 70S ribosomes in a polyribosome [75]. The total number of molecules in the simulations was set to 1008 (eight copies of GFP and 1000 E. coli macromolecules). This number was chosen so that the eventual assembled cytoplasm model would be large enough to provide a reasonable representation of the environment while still allowing simulations of up to 20µs to be performed (albeit over the course of more than a year clock-time). The linear dimensions of the final modeled system (808.4Å in each of the x, y and z directions) correspond to approximately one-twelfth of the diameter of a typical E. coli cell [76]. A summary of the macromolecules selected, their subunit compositions, the PDB codes of their originating structures, and the degree of sequence coverage achieved by the structural models, is presented in Table S2. Using composition estimates provided by Neidhardt et al. [76] as a guide, we set the total concentration of macromolecules in the model (excluding the ‘tracer’ GFP) to 275 g/l; this is slightly on the low side of the rough values of 300–340 g/l estimated independently by Zimmerman and Trach [77]. Of this, 55g/l (i.e. 20% of the total) is contributed by RNA, with 15% of the RNA dry weight contribution being made by tRNA and the remainder being made by ribosomal RNA [76]. mRNA, which accounts for only ∼4% of the total dry weight of RNA in the cell, is omitted from the present model. The remaining 219g/l (i.e. 80%) of the model is contributed by proteins; this percentage is deliberately set somewhat higher than the 55% contribution to the dry weight of the entire cell estimated by Neidhardt et al. [76] in order to compensate for the missing volume of components that are not explicitly represented in the model (DNA, mRNA, lipid, lipopolysaccharide, murein, and glycogen). If one takes the specific volumes of proteins and RNA to be 0.73ml/g and 0.58ml/g respectively [77], the total volume fraction occupied by macromolecules in the model is 0.19; if instead, an ‘effective’ specific volume of macromolecules suggested by Zimmerman and Trach is used [77] (1.0ml/g), the total volume fraction occupied by the macromolecules in the model amounts to 0.27. Preparation of the macromolecular structures for simulation Structures for all selected proteins were identified by performing a BLAST search [78] of the protein's FASTA sequence (as reported in the EcoCyc database) against the entire PDB and selecting the structure with the closest identity to the query sequence using the program BioEdit [79]. The quaternary structure of each selected structure was determined using the PQS web server [80] and was verified, where possible, with the EcoCyc database; it should be noted that correct identification of a protein's quaternary structure is a non-trivial undertaking, and the PQS predictions are unlikely to be 100% reliable [80],[81]. Homology modeling was used for all proteins for which either no E. coli structure was directly available in the PDB, or for which a significantly greater coverage of the sequence could be obtained through the use of a non-E. coli structure. All homology modeling was performed using the SWISS-MODEL web server [82] via the so-called “First Approach mode”; for oligomeric proteins each individual chain was homology-modeled independently. Any sidechains missing from a structure were built in using the molecular modeling program WHATIF [83]. Hydrogens were then added, and partial charges and radii were assigned to atoms using the program PDB2PQR [84]. For proteins, partial charges and atomic radii were taken directly from the PARSE parameter set [85]. For nucleic acids, which are not represented in the PARSE parameter set, partial charges were instead assigned from the CHARMM23 parameter set [86]; partial charges for the modified bases of tRNAs, such as pseudouridine, were assigned based on similarity to functional groups already represented in the parameter sets. The protonation states of all protein ionizable residues were assigned using the fast empirical algorithm PropKa [87]; for these calculations, the pH was set to 7.6, the estimated pH of the E. coli cytoplasm [76]. With each structure complete, infinite-dilution translational and rotational diffusion coefficients – which are necessary input parameters for BD simulations [14] – were calculated with the program HYDROPRO [88] using default parameters. For the latter calculations we assumed a solvent viscosity, η, of 0.89cP, which corresponds to the viscosity of pure water at 25°C; given that the most recent estimate of the total metabolite concentration in the E. coli cytoplasm is ∼300mM [72] we do not anticipate, based on what we currently know, that the viscosity of the solvent environment will be hugely altered from the pure water value. The final stage of preparation for each molecule involved the calculation of electrostatic potential grids; these were computed in all cases by using the APBS software [89] to solve the non-linear Poisson-Boltzmann (PB) equation [90]. As in our previous BD study of single-component protein solutions [19], two cubic electrostatic potential grids were computed for each type of macromolecule: (a) a comparatively fine grid, of spacing 2Å, with dimensions sufficient to encompass a 20Å shell around the macromolecular surface, and (b) a coarse, long-range grid, of spacing 4Å, that extends at least 50% further in each direction than the smaller grid. The use of a 2Å grid spacing for the higher resolution grids, rather than the 1Å grid spacing used in our previous simulations [19], was necessary in order to fit all potential grids into the available 4GB of RAM. This spacing is, however, sufficiently detailed that at least two grid points always intervene between interacting atoms even when they are at the closest possible separation distance (4.5Å); significant numerical instabilities in the calculated electrostatic forces do not, therefore, arise. In all PB calculations the solvent dielectric was set to 78.0 and the internal dielectric of the macromolecule was set to 12.0, with the boundary between the two being determined by the cubic-spline surface [91] implemented in APBS [89]. Use of an internal dielectric of 12.0 is intended to provide a simple, averaged description of the different dielectric responses of macromolecular interiors and exteriors [19],[92],[93]. The ionic strength in all PB calculations was set to 150mM. With the electrostatic potentials in hand, ‘effective charges’ were computed for each molecule type using the procedure established by Gabdoulline & Wade [94],[95]. Finally, as in our previous work [19], simulations were accelerated by retaining, in addition to the effective charges, only those non-hydrogen atoms that were solvent-exposed: these atoms were identified using the ACC tool within APBS [89], with a 4Å solvent probe. Brownian dynamics simulation protocol The BD software used for the simulations is an extension of the methodology developed and tested in our previous work on pure protein solutions [19]. Modifications were made to the software to improve memory usage so that 102 electrostatic potential grids could be simultaneously held in memory; in addition, toward the end of this study, loop-level parallelization of a number of key loops was implemented with OpenMP (http://www.openmp.org) to accelerate computations by a factor of ∼4. All simulations were performed under periodic boundary conditions [96] in a cubic cell with edges of 808.4Å. The initial configuration of each system had eight GFP molecules evenly positioned at the center of the eight octants of the cell; all other macromolecules were initially positioned by performing random translations and rotations within the cell subject to the requirement that there was at least a 10Å separation between the surfaces of all neighbors. Three independent configurations were generated in this way by use of different random seeds; views of each system before and after 15µs of simulation are shown in Fig. S1. As in our previous work, BD simulations were conducted using the Ermak-McCammon algorithm [97] with a time step of 2.5ps, with additional algorithmic measures being taken to ensure that no atom-atom distances at the completion of each timestep were less than 4.5Å. For subsequent analysis of the simulations, the 3D translational vector and the 3×3 rotational matrix necessary to specify the position of each macromolecule were recorded every 100ps. The form of the energy model used to describe intermolecular interactions was identical to that used in our previous work [19]: the effective charge method [94] was used to calculate electrostatic interactions, and a Lennard-Jones potential (comprising 1/r12 and 1/r6 terms) was used to provide a simple combined description of steric, van der Waals and hydrophobic interactions. To accelerate the simulations, the combined non-electrostatic interactions were computed only between atom pairs separated by less than 12Å; a list of all such pairs was continually updated every 40 timesteps (i.e. every 100ps). As in our previous work, we treated the strength of these non-electrostatic interactions, which are determined by the well-depth, εLJ, of the Lennard-Jones potential, as the only adjustable parameter of the model. In order to determine the best setting, three independent BD simulations of at least 6µs duration were performed with each of the following εLJ values: 0.190, 0.285, 0.3325 and 0.380 kcal/mol. Finally, for comparison purposes, two additional sets of three BD simulations were also performed: these were (a) simulations in which the only the repulsive (1/r12-dependent) steric interactions operated (these are the ‘steric’ simulations discussed in the main text) and (b) simulations in which only steric plus electrostatic interactions acted. Analysis – translational diffusion coefficients The effective translational diffusion coefficients, Dtrans, of molecules were calculated from the simulations using the Einstein equation: (1) where is the mean-squared distance traveled by the molecular center of mass in the observation interval, δt; all Dtrans values reported in Results are mean values for each molecule type averaged over the number of copies of each type. In cases of ‘normal’ diffusion, the computed Dtrans values are independent of δt; in certain cases of diffusion in vivo and in vitro however, anomalous sub-diffusion is observed [8], [21]–[23],[66]; in such cases, the apparent Dtrans value is dependent on δt, decreasing with increasing δt. A common way of describing anomalous diffusion involves writing it in the form: (2) where the apparent translational diffusion coefficient Dtrans is now written to indicate that it depends on the observation interval and α is the so-called anomalous diffusion (anomality) exponent; α = 1 corresponds to normal diffusion since it leads to Dtrans being independent of δt, and α<1 indicates anomalous (sub)diffusion. Taking logarithms and differentiating with respect to log (δt) allows us to write: (3) This enables us to obtain α by numerically differentiating Dtrans values computed over a range of δt values; in practice we computed Dtrans at δt values of 100, 200, 300, 600, 1000, … ps, and obtained α at the logarithmic mid-point, δtmid, of these time-intervals, δtmid = 141, 245, 424, … ps. Plots of α versus log (δtmid) for macromolecules simulated with both the ‘steric’ and ‘full’ energy models all indicated that α itself was dependent on δtmid, thus signifying that diffusion was transiently anomalous. To our knowledge, there is no explicitly derived functional form that describes the expected dependence of α on δt for transient anomalous diffusion. We found however that the data fit well to the following empirical functional form (see Fig. 3B): (4) where α0 is a constant, a and b are parameters that describe the amplitude of the δt-dependent changes to α, and τshort and τlong are, respectively, the timescales over which α first decreases, and then returns to one, with increasing δt. Plots of α versus δt for all molecule types were fit to the above functional form with SigmaPlot [98]: fits were performed using all datapoints from the shortest δtmid value up to the first datapoint that had a percent error exceeding ∼25% (obtained by comparing the α values computed from the three independent BD simulations), or that deviated qualitatively from the trend. To ensure that the latter criterion did not drastically affect the results, the fits were repeated retaining even those datapoints that qualitatively deviated; essentially the same behavior was obtained but with slightly greater values of τlong. Regressed values of τshort and τlong are plotted versus molecular weight for all molecule types in Figs. S4 and S5 respectively. Having fit a function to the observed dependence of α on δt, it was numerically integrated to obtain an extrapolated, asymptotic long-time Dtrans value using the Dtrans value at δt = 100ps as the starting point for the integration. The quality of fits of the integrated Dtrans values (for the most abundant proteins) is indicated by the solid lines in Fig. 3A. Analysis – rotational diffusion coefficients Effective rotational diffusion coefficients were computed from the time-dependent behavior of the 3×3 rotational matrix recorded every 100ps for every molecule during the simulations. For each of the three rotational axes, an autocorrelation function, θ (δt), was calculated as: (5) where e (0) and e (δt) are unit vectors pointing along one of the rotational axes at time t = 0 and t = δt respectively, and the brackets indicate an average over all possible initial timepoints; the three computed autocorrelation functions were averaged to give a single decay function consistent with the isotropic rotation that we assumed for all molecule types at infinite dilution. Since the resulting averaged autocorrelation function for the ‘full’ energy model did not fit well to a single-exponential decay, and given that translational diffusion was clearly transiently anomalous, we decided to use the following functional form proposed recently for transiently anomalous rotational diffusion [27]: (6) where θ0 is the value of the autocorrelation function at δt = 0 (always 1), a is a parameter, τrot is a long-time rotational correlation time (which dominates as δt→∞), and τrel is the timescale over which a faster, short-time rotational relaxation gives way to the slower rotation characterized by τrot. The above functional form was fit to computed values of θ for each molecule type over a range of δt values up to 1µs; the r2 values for these fits were all in excess of 0.999. An example of such fits for the most abundant proteins is shown in Fig. S6. The long-time rotational diffusion coefficient, DL rot, is then obtained using the relationship: (7) and the short-time rotational diffusion coefficient, DS rot, is obtained from [27]: (8) The computed ratios DL rot/D0 rot and DS rot/D0 rot obtained with the ‘full’ energy model are plotted for all molecule types versus their molecular weights in Fig. S7; a plot of the parameter a versus molecular weight shows no obvious relationship (not shown). Analysis – literature estimates of relative translational and rotational viscosities Comparison of the simulated translational and rotational diffusion coefficients with the infinite-dilution values that are input parameters for the simulations provides an indication of the relative viscosities experienced during the two types of motion. From studies of GFP diffusion in Chinese hamster ovary cells, the Verkman group reports [29] a relative viscosity experienced by translational motion, ηrel T = 3.2±0.2, and a relative viscosity experienced by rotational motion, ηrel R = 1.5±0.1. Combining these numbers gives a ratio, ηrel T/ηrel R of 2.1±0.3, indicating that the effective relative viscosity experienced by translational motion is roughly twice that experienced by rotational motion in mammalian cells. A second estimate of the ηrel T/ηrel R ratio can be obtained from the work of Zorrilla et al. [28],[99]: these authors have reported measurements of the translational diffusion coefficients of apomyoglobin (17kDa) using fluorescence correlation spectroscopy (FCS) measurements [28] and have compared them with rotational diffusion coefficients that they had previously measured [99] for the same system using time-resolved fluorescence depolarization experiments. They report measurements for two different background proteins, RNaseA and human serum albumin (HSA); we focus on the data reported for the latter since its molecular weight (67kDa) is much closer to the number-averaged molecular weight of the macromolecules in our cytoplasm model (87kDa), than is the molecular weight of RNaseA (14kDa). The data reported by Zorrilla et al. are expressed relative to the macroscopic viscosity, ηm, of the protein solution (measured with an Ostwald viscometer). They report that ηm fits to the following functional form, ηm = η0 exp (Ac/(1−Bc)), where η0 is the viscosity of pure water, c is the background protein's concentration in mg/ml, and A and B are background-dependent constants: A = 2.7×10−3 ml/mg and B = 1.3×10−3 ml/mg for HSA [99]. Using these values we obtain a macroscopic viscosity for a 275 mg/ml HSA solution of 3.155 η0. Using the data given in Table 2 of ref. 49, the effective viscosity experienced by the translational motion of apomyoglobin in HSA is expressed as ηrel T = (ηm/η0)1.28, which from above means that we can write ηrel T = 3.1551.28 = 4.35; following similar calculations the effective viscosity experienced by the rotational motion is ηrel R = (ηm/η0)0.44 = 3.1550.44 = 1.66. Together, these numbers translate into a value of ηrel T/ηrel R of 2.6±0.2. As noted in the main text, we find that both the translational and rotational diffusion coefficients of molecules vary with the time interval, δt, over which diffusion is observed. While the observation of this transient anomalous diffusion is significant in its own right it takes on added significance when comparing the relative viscosities experienced by translational and rotational motion. This is because the timescales over which the two types of experiments are conducted are quite different: translational diffusion coefficients are obtained from FCS experiments by fitting to an autocorrelation function over a timescale extending from microseconds to seconds [21],[22],[66] while rotational diffusion coefficients are obtained from fits to data obtained over a nanosecond timescale [28],[29]. We therefore compare the experimentally derived relative viscosities quoted above with diffusion coefficients computed from the BD simulations on the same timescales, i.e. we compare with the ratio of the long-time translational diffusion coefficient DL trans and the short-time rotational diffusion coefficient, DS rot (see Fig. 3F). Analysis – monitoring of intermolecular contacts The intermolecular contacts engaged in by each molecule were recorded every 100ps during the BD simulations and subsequently analyzed to determine: (a) the average number of neighbors of each molecule type at any given time, (b) the number of unique neighbors encountered by each molecule type during the course of the entire simulations, and (c) the rate of dissociation of intermolecular interactions. The definition of ‘neighbor’ was kept somewhat loose in order to detect all molecules in the immediate environment of the molecule being probed: molecules were assigned as neighbors if any of their atoms were within ∼12Å of each other. The rates at which the neighbors of a particular molecule dissociated were obtained from plots of the fraction of its neighbors, initially present at t = 0, that remained after some time t = δt, averaged over all possible initial timepoints. In order to obtain the characteristic neighbor-decay rate for each particular type of molecule, such plots were averaged over all molecules of that type. The resulting plots are found to follow biexponential kinetics: (a) a very fast decay process (τfast) that typically has an amplitude of ∼0.7 and is due to loss of neighbors that interact only peripherally with the molecule of interest, and (b) a slower decay process (τslow) that has an average amplitude of ∼0.3 and is due to loss of those neighbors that form bona fide intermolecular contacts. Typical fits for these data are shown in Fig. S8. Method for calculating thermodynamics in the cytoplasm The effects of immersion in the cytoplasm on the thermodynamics of protein folding and protein-protein association were computed using the particle insertion technique first outlined by Widom [30]. For small perturbations, the free energy change, ΔG, for transferring a molecule from an environment free of any interacting macromolecules to the cytoplasm environment can be rigorously expressed as: (9) where Eint is the interaction energy of the molecule with the constituents of the cytoplasm, R is the Gas constant, T is the temperature, and the brackets indicate an average over randomly selected insertion positions and configurations of the cytoplasm environment. In order to assess the likely effects of the cytoplasm on a thermodynamic process (such as protein folding) therefore, separate particle-insertion calculations are required for both the initial state (e.g. unfolded protein) and the final state (e.g. folded protein). Such calculations give the free energy changes for the vertical processes in the thermodynamic cycle shown below: (10) Since free energy is a state function, the difference between the free energy changes of the horizontal processes is equal to the difference between the free energy changes of the vertical processes. We can therefore write the difference between the free energy change for the process in vivo and in vitro, ΔΔG, as: (11) The effect of the cytoplasm on the free energy change for a process can therefore be calculated without needing to know the actual value of the free energy change for the process in vitro. A conceptually similar but different approach to computing thermodynamics in crowded solutions has recently been outlined by Zhou and co-workers [100]. Code for performing particle-insertion calculations was generated by modifying the existing BD simulation program; prior to performing large-scale explorations of protein folding and association thermodynamics, the code's correctness was first checked by comparing its predictions for the free energy cost of placing a sphere into a solution of spheres with the corresponding predictions of scaled particle theory [101],[102]. Cytoplasm effects on protein folding equilibria Calculations of the cytoplasm's thermodynamic effects initially focused on protein folding equilibria. In addition to calculating the folding thermodynamics of six proteins already present in the cytoplasm model (Adk, Bcp, CspC, Efp, GFP and PpiB), we examined two other proteins that have been subject to direct experimental study in vivo: these were the 80-residue λ6-85 construct studied experimentally by Ghaemmaghami and Oas [4] and the 136-residue cellular retinoic acid binding protein (CRABP) investigated by Ignatova, Gierasch and co-workers [7],[32]. The structure of the folded state of λ6-85 was taken from its crystal structure in complex with operator DNA (pdbcode: 1LMB [103]); the G46A & G48A mutations present in the experimental construct were made using the rotamer-sampling method SCWRL3 [104]. The structure of the folded state of CRABP (pdbcode: 1CBI [105]) was altered to include the R131Q mutation used in the experimental construct [7], but in the absence of direct structural information no attempt was made to model the experimentally-incorporated fluorophore. The unfolded states of all eight proteins were modeled as ensembles of 1000 unfolded conformations generated using the conformational sampling method developed by the Sosnick group [31]; the code was kindly made available by Dr. Abhishek Jha. This method has been shown to produce models with dimensions in good agreement with experimental estimates [31]. Prior to calculations, the structures of all conformations were completed by adding sidechains with SCWRL3 [104] and by adding hydrogens with the PDBTOPQR utility [84] of APBS [89]. In order to ensure consistency between the BD simulations and the Widom particle-insertion calculations, effective charges and electrostatic potential grids were calculated for all conformations (both folded and unfolded) using the exact same protocol employed with the rigid protein models of the cytoplasm model (see above). For each protein, a large number of random trial positions were attempted with both the single, folded state structure and the 1000 unfolded state conformations; each trial consisted of a different randomly selected translation and rotation. For the folded state structure, a total of 25 million trials were attempted; for the unfolded state, 250,000 trials were attempted for each of the 1000 conformations (to give a total of 250 million trials for each cytoplasm ‘snapshot’ studied). For each trial position, the interaction energy of the protein with the surrounding cytoplasm was calculated with (a) the ‘full’ energetic model, which includes electrostatic, steric and hydrophobic contributions, and (b) the ‘steric’ energetic model. To simplify the latter calculations, only two possible energies were allowed: the interaction energy, Eint, was set to +∞ if any of the protein's atoms came within 4.5Å of any of the cytoplasm atoms, and was set to zero if not; this binary scoring method is effectively identical to that used in most examinations of excluded-volume (crowding) effects. Due to the very significant computational expense associated with the particle-insertion calculations, they were applied only to the final ‘snapshot’ of the three independent BD simulations performed with the ‘full’ and ‘steric’ models. Error bars for all reported free energy changes were therefore calculated as the standard deviation of the computed values obtained from the three different system ‘snapshots’. The total number of unfolded and folded-state trial positions that were accepted and rejected for each protein, for each of the three ‘full’ model cytoplasm ‘snapshots’ are listed in Table S3. Cytoplasm effects on protein association equilibria A very similar protocol was used to calculate the effects of the cytoplasm on a variety of protein association reactions. Calculations on each assembled protein complex were performed exactly as described above. Calculations on each disassembled complex – e.g. two separated protein monomers in the case of a dimerization reaction – were carried out by performing insertions of all components simultaneously; importantly, each randomized placement was first screened to ensure that there were no steric clashes between any of the inserted components before their interactions with the cytoplasm were evaluated. As might be expected, the requirement of simultaneously placing multiple molecules into the cytoplasm meant that in some cases very large numbers of trial positions were required in order to obtain reasonably converged results. Owing to the significant computational expense, therefore, calculations were only performed on snapshots taken from BD simulations performed with the ‘full’ energy model. In addition, since the Boltzmann-weighting of the sampled interaction energies can contribute significant noise in cases where the number of accepted placements are comparatively low, the cytoplasm-interaction energy distributions were first smoothed by fitting to sums of three Gaussians using SigmaPlot [98] (see Fig. S9 for a typical fit). The total numbers of accepted and attempted insertions for the various association reactions studied are listed in Table S4. Dimerization equilibria were investigated by performing separate particle-insertion calculations on the dimeric forms and the monomeric forms; for such calculations it was assumed that no structural change (e.g. unfolding) occurs when the two monomers are separated. The trimerization equilibrium of ParM was investigated in analogous fashion, by performing calculations on a trimer extracted from the ParM filament model (pdbcode: 2QU4 [106]). The aggregation of a poly-Q-inserted RNaseA to form an amyloid fiber was studied using the theoretical model developed by Eisenberg and co-workers (pdbcode: 2APU; [38]). The model deposited in the PDB contains 56 aggregated monomeric units; the largest aggregate for which we could obtain reasonably precise free energy estimates however contained eight monomeric units (Fig. 4F). Since formation of the amyloid structure involves a significant change in conformation, the use of monomeric structures extracted without modification from the aggregate model would be inappropriate. Instead, the structure of the monomeric poly-Q-inserted RNaseA was taken from the crystal structure reported by the Eisenberg group (pdbcode: 2APQ [38]). In order to ensure sequence-consistency with the amyloid model, a A131H mutation was made with SCWRL3 [104]. In addition, since the monomeric structure has no resolved coordinates for the inserted GQQQQQQQQQQGNP stretch this region was model-built using the loop-building program Loopy [107]. The second aggregate structure studied was a theoretical model of SH3 domain aggregation proposed by the Shakhnovich group [39] and kindly made available to the authors by Dr. Feng Ding (UNC; personal communication). This structure contains only Cα atoms so complete backbone coordinates were first constructed using the SABBAC webserver [108] (http://bioserv.rpbs.jussieu.fr/cgi-bin/SABBAC) before sidechain positions were constructed using SCWRL3. Owing to the structure's origins being a Cα-only model we were unable to add sidechains in such a way that the assembled aggregate model was free of internal steric clashes; this, however, does not significantly affect our ability to estimate the model's interaction with the cytoplasm environment. As with the RNaseA amyloid model, it would be inappropriate to assume that the conformations of unaggregated monomeric units are identical to those found in the amyloid model; instead therefore the conformation of the monomeric SH3 domain was taken from the crystal structure (pdbcode: 1NLO [109]). Two movies, each showing 1.8µs of simulation, are provided as separate Quicktime .mov files. Video S1 shows a BD simulation performed with the ‘full’ energy model; Video S2 shows a BD simulation performed with the ‘steric’ energy model. File size restrictions at the PLoS website have limited the size and resolution of the uploaded movies to be used for review. Higher resolution movies are available to readers at the authors' website: http://dadiddly.biochem.uiowa.edu/Elcock_Lab/Movies.html. Supporting Information Figure S1 Views of the three independent system setups before and after 15µs of BD simulation with the ‘full’ energy model. 50S and 30S ribosomal subunits can be identified by the green/yellow of their RNA and the blue and red (respectively) of their proteins. This figure was prepared with VMD [110]. (3.10 MB TIF) Click here for additional data file. Figure S2 Total system energy and its electrostatic and hydrophobic components, plotted versus simulation time; the vertical dashed line indicates the beginning of the production simulation. (0.13 MB TIF) Click here for additional data file. Figure S3 Histogram of cytoplasm-interaction energies, Eint, obtained for all non-clashing insertions of the aggregated and non-aggregated states of the SH3 domain. (0.11 MB TIF) Click here for additional data file. Figure S4 Time constant for the exponential describing the descent to the minimal value of the anomality exponent, α, plotted for all molecule types versus molecular weight. (0.09 MB TIF) Click here for additional data file. Figure S5 Time constant for the exponential describing the return to normal rotational diffusion plotted for all molecule types versus molecular weight; note that for the ‘steric’ model rotational diffusion is essentially normal at almost all observation intervals examined. (0.09 MB TIF) Click here for additional data file. Figure S6 Plot showing the quality of fit of a two-exponential decay function to the autocorrelation function describing rotational motion for the three most abundant proteins in the model. Symbols indicate the simulation data; lines indicate the two-exponential fit. (0.10 MB TIF) Click here for additional data file. Figure S7 Ratio of the short-time and long-time rotational diffusion coefficients to the infinite-dilution value plotted for the ‘full’ model for all molecule types versus molecular weight. (0.09 MB TIF) Click here for additional data file. Figure S8 Plot showing the quality of fit of a two-exponential decay function to the function describing the loss of neighbors for five selected molecule types. Symbols indicate the simulation data; lines indicate the two-exponential fit (0.10 MB TIF) Click here for additional data file. Figure S9 Plot showing the quality of fit of a 3-Gaussian distribution to the cytoplasm-interaction energy distributions obtained for non-clashing insertions of the IcdA protein in dimeric and monomeric states; note that the y-axis is on a logarithmic scale. (0.11 MB TIF) Click here for additional data file. Table S1 Ordered list of all those proteins identified and quantified in Table 4 of Link et al. [12] under minimal medium conditions and for which the cellular location is either clearly cytoplasmic or undetermined. ‘N-abd’ is the cellular abundance of each chain of the protein determined by Link et al. ‘MW’ is the molecular weight of each chain of the protein as estimated from the amino acid sequence in the Ecocyc database [73]. Asterisks in the ‘Mod.’ column identify those proteins present in our cytoplasm model; note that the low-abundant proteins SucC and RplC are included in the model because they are components of more abundant protein complexes. (0.25 MB RTF) Click here for additional data file. Table S2 Alphabetically-ordered list of the macromolecules present in our cytoplasm model showing the pdbcode of their originating structures, the infinite-dilution translational and rotational diffusion coefficients [88], and the sequence coverage of each model. (1.25 MB PDF) Click here for additional data file. Table S3 Details of the particle-insertion calculations of the folding equilibria of 8 different proteins, listed in order of increasing protein chain length. Results are shown only for insertions into ‘snapshots’ (A, B, C) taken from BD simulations performed with the ‘full’ energy model. The total numbers of attempted insertions for the folded and unfolded states (for each ‘snapshot’) are 25 million and 250 million respectively. ΔGWidom and ΔΔG are insertion free energies obtained using the ‘steric’ energy model: these numbers can be obtained directly from knowledge of the number of attempted and successful insertions listed in this table. (0.10 MB RTF) Click here for additional data file. Table S4 Details of the particle-insertion calculations of the association equilibria of 14 different proteins. ‘Process’ refers to the stoichiometry of the association process examined: 1→2 denotes that the equilibrium is between two monomers and one dimer, 4→8 denotes that the equilibrium is between two tetramers and one octamer etc. As in Table S3, ΔΔG is the insertion free energy difference obtained using the ‘steric’ energy model: this number can be obtained directly from knowledge of the number of attempted and successful insertions listed in this table. (0.12 MB RTF) Click here for additional data file. Video S1 Cytoplasm Full Energy Model. 1.8 microseconds of simulation carried out with the ‘full’ energy model. (9.97 MB MOV) Click here for additional data file. Video S2 Cytoplasm Steric Energy Model. 1.8 microseconds of simulation carried out with the ‘steric’ energy model. (9.96 MB MOV) Click here for additional data file.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Rationalization of the effects of mutations on peptide and protein aggregation rates.

            In order for any biological system to function effectively, it is essential to avoid the inherent tendency of proteins to aggregate and form potentially harmful deposits. In each of the various pathological conditions associated with protein deposition, such as Alzheimer's and Parkinson's diseases, a specific peptide or protein that is normally soluble is deposited as insoluble aggregates generally referred to as amyloid. It is clear that the aggregation process is generally initiated from partially or completely unfolded forms of the peptides and proteins associated with each disease. Here we show that the intrinsic effects of specific mutations on the rates of aggregation of unfolded polypeptide chains can be correlated to a remarkable extent with changes in simple physicochemical properties such as hydrophobicity, secondary structure propensity and charge. This approach allows the pathogenic effects of mutations associated with known familial forms of protein deposition diseases to be rationalized, and more generally enables prediction of the effects of mutations on the aggregation propensity of any polypeptide chain.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              How robust are protein folding simulations with respect to force field parameterization?

              Molecular dynamics simulations hold the promise of providing an atomic-level description of protein folding that cannot easily be obtained from experiments. Here, we examine the extent to which the molecular mechanics force field used in such simulations might influence the observed folding pathways. To that end, we performed equilibrium simulations of a fast-folding variant of the villin headpiece using four different force fields. In each simulation, we observed a large number of transitions between the unfolded and folded states, and in all four cases, both the rate of folding and the structure of the native state were in good agreement with experiments. We found, however, that the folding mechanism and the properties of the unfolded state depend substantially on the choice of force field. We thus conclude that although it is important to match a single, experimentally determined structure and folding rate, this does not ensure that a given simulation will provide a unique and correct description of the full free-energy surface and the mechanism of folding. Copyright © 2011 Biophysical Society. Published by Elsevier Inc. All rights reserved.
                Bookmark

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Comput Biol
                PLoS Comput. Biol
                plos
                ploscomp
                PLoS Computational Biology
                Public Library of Science (San Francisco, USA )
                1553-734X
                1553-7358
                May 2014
                22 May 2014
                : 10
                : 5
                Affiliations
                Max F. Perutz Laboratories, University of Vienna, Vienna, Austria
                Fox Chase Cancer Center, United States of America
                Author notes

                The authors have declared that no competing interests exist.

                Conceived and designed the experiments: DP BZ. Performed the experiments: DP. Analyzed the data: DP BZ. Wrote the paper: DP BZ.

                Article
                PCOMPBIOL-D-13-01838
                10.1371/journal.pcbi.1003638
                4031056
                24854339

                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                Page count
                Pages: 11
                Funding
                Austrian Science Fund FWF (START grant Y 514-B11 to BZ, http://www.fwf.ac.at/) European Research Council (ERC Starting Independent grant 279408 to BZ, http://erc.europa.eu/), The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences
                Biochemistry
                Proteins
                Protein Structure
                Biophysics
                Biophysical Simulations
                Computational Biology
                Molecular Biology
                Macromolecular Structure Analysis
                Macromolecular Complex Analysis

                Quantitative & Systems biology

                Comments

                Comment on this article