      Is Open Access

      A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information

      research-article


          Abstract

          An updated genome-scale reconstruction of the metabolic network in Escherichia coli K-12 MG1655 is presented. This updated metabolic reconstruction includes: (1) an alignment with the latest genome annotation and the metabolic content of EcoCyc leading to the inclusion of the activities of 1260 ORFs, (2) characterization and quantification of the biomass components and maintenance requirements associated with growth of E. coli and (3) thermodynamic information for the included chemical reactions. The conversion of this metabolic network reconstruction into an in silico model is detailed. A new step in the metabolic reconstruction process, termed thermodynamic consistency analysis, is introduced, in which reactions were checked for consistency with thermodynamic reversibility estimates. Applications demonstrating the capabilities of the genome-scale metabolic model to predict high-throughput experimental growth and gene deletion phenotypic screens are presented. The increased scope and computational capability using this new reconstruction is expected to broaden the spectrum of both basic biology and applied systems biology studies of E. coli metabolism.


          Most cited references (58)


          The effects of alternate optimal solutions in constraint-based genome-scale metabolic models.

          Genome-scale constraint-based models of several organisms have now been constructed and are being used for model driven research. A key issue that may arise in the use of such models is the existence of alternate optimal solutions wherein the same maximal objective (e.g., growth rate) can be achieved through different flux distributions. Herein, we investigate the effects that alternate optimal solutions may have on the predicted range of flux values calculated using currently practiced linear (LP) and quadratic programming (QP) methods. An efficient LP-based strategy is described to calculate the range of flux variability that can be present in order to achieve optimal as well as suboptimal objective states. Sample results are provided for growth predictions of E. coli using glucose, acetate, and lactate as carbon substrates. These results demonstrate the extent of flux variability to be highly dependent on environmental conditions and network composition. In addition we examined the impact of alternate optima for growth under gene knockout conditions as calculated using QP-based methods. It was observed that calculations using QP-based methods can show significant variation in growth rate if the flux variability among alternate optima is high. The underlying biological significance and general source of such flux variability is further investigated through the identification of redundancies in the network (equivalent reaction sets) that lead to alternate solutions. Collectively, these results illustrate the variability inherent in metabolic flux distributions and the possible implications of this heterogeneity for constraint-based modeling approaches. These methods also provide an efficient and robust method to calculate the range of flux distributions that can be derived from quantitative fermentation data.

            Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox.

            The manner in which microorganisms utilize their metabolic processes can be predicted using constraint-based analysis of genome-scale metabolic networks. Herein, we present the constraint-based reconstruction and analysis toolbox, a software package running in the Matlab environment, which allows for quantitative prediction of cellular behavior using a constraint-based approach. Specifically, this software allows predictive computations of both steady-state and dynamic optimal growth behavior, the effects of gene deletions, comprehensive robustness analyses, sampling the range of possible cellular metabolic states and the determination of network modules. Functions enabling these calculations are included in the toolbox, allowing a user to input a genome-scale metabolic model distributed in Systems Biology Markup Language format and perform these calculations with just a few lines of code. The results are predictions of cellular behavior that have been verified as accurate in a growing body of research. After software installation, calculation time is minimal, allowing the user to focus on the interpretation of the computational results.

              An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR)

              Background Escherichia coli is perhaps the best characterized and studied bacterium and is of interest industrially, genetically and pathologically. For these reasons, in silico modeling efforts have been made to describe and predict its cellular behavior. With the vast amounts of '-omics' data that are being generated, there is a growing need for incorporating and reconciling heterogeneous datasets, including genomic, transcriptomic, proteomic and metabolomic data [1]. A constraint-based model of E. coli metabolism can accomplish this and serve as a model centric database. In addition to providing the context for various '-omics' data types, constraint-based models provide a framework to compute cellular functions [2]. This modeling method finds the limits of cellular, biochemical and systemic functions, thereby identifying all allowable solutions. Searches within the allowable solution space can identify solutions of interest, for example a solution that maximizes a particular objective. This approach to genome-scale model building has been reviewed in detail [3-6]. In general, the application of successive constraints (stoichiometric, thermodynamic and enzyme capacity constraints), with respect to the metabolic network, restricts the number of possible solutions. Linear optimization is often used to find a particular solution in the allowable solution space that maximizes a chosen objective function, such as cellular growth (Figure 1). A more detailed description of the constraint-based modeling approach can be found in Materials and methods. The constraint-based modeling approach has been used to study E. coli metabolism for over ten years; the history of such model building efforts has recently been reviewed [7]. The first genome-scale metabolic (GSM) model accounting for 660 gene products (iJE660 GSM) was reconstructed using genomic information, biochemical data and physiological data [8]. 
This genome-scale model has been used to perform in silico gene deletion studies [8] and to predict both optimal growth behavior [9] and the outcome of adaptive evolution [10]. This paper reports an expansion of iJE660a GSM, which itself is a slight modification of the original genome-scale metabolic model (iJE660 GSM) [8]. Gene to protein to reaction (GPR) associations are included directly in the new model (iJR904 GSM/GPR). These associations describe the dependence of reactions on proteins and proteins on genes (Figure 2). The metabolic network described by iJR904 has also changed; individual reactions are now elementally and charge balanced, and a significant number of new genes and novel reactions have been added to the model. iJR904 GSM/GPR accounts for over 904 genes and the 931 unique biochemical reactions the encoded proteins carry out. This paper discusses the effects that these additional reactions have on the predictive capabilities of the model and identifies putative ORFs in the genome which could resolve gaps in the metabolic network. Since computational models of E. coli will continue to grow in size and scope [7] it will become important to be able to distinguish between the different models - a naming convention will aid in this effort. The naming convention we chose to use mirrors the one already established for plasmids. The general form of the names of in silico strains used is iXXxxxa YYY. The 'i' in the name refers to an in silico model (that is, a computer model). This 'i' is followed by the initials (XX) of the person who developed the model and then the number of genes (xxx) included in the model. Any letters (a) after the number of genes indicates that slight modifications were made to the model, for instance iJE660a is derived from iJE660. Further designation of the content and scope of a model are found in YYY; here the acronyms GSM and GPR stand for genome-scale model and gene-protein-reaction associations, respectively. 
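The naming convention described above (iXXxxxa YYY) can be illustrated with a short parsing sketch. This is not part of the original paper: the function name `parse_model_name`, the regular expression, and the choice to split the scope field on "/" are assumptions made for illustration only.

```python
import re

# Pattern for names of the form iXXxxxa YYY, as described in the text.
# Assumptions: two or more uppercase initials; the scope part (YYY) is
# optional and may hold several acronyms separated by "/".
MODEL_NAME = re.compile(r"^i([A-Z]{2,})(\d+)([a-z]*)(?:\s+(\S+))?$")


def parse_model_name(name):
    """Split an in silico strain name into its parts (hypothetical helper)."""
    match = MODEL_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognised model name: {name!r}")
    initials, genes, revision, scope = match.groups()
    return {
        "initials": initials,            # developer initials (XX)
        "genes": int(genes),             # number of genes in the model (xxx)
        "revision": revision,            # trailing letters mark slight modifications
        "scope": scope.split("/") if scope else [],  # e.g. GSM, GPR
    }


print(parse_model_name("iJR904 GSM/GPR"))
print(parse_model_name("iJE660a GSM"))
```

Under these assumptions, iJE660a parses to the initials JE, 660 genes and the revision letter "a", which matches the description of iJE660a as a slight modification of iJE660.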
The contents of iJE660a and iJR904 can be found on our website [11], and iJR904 is also detailed in the additional data files. Results and discussion Properties of the iJR904 metabolic network An update on the annotation of the E. coli K-12 genome was published in 2001 [12], facilitating the process of updating the genome-scale in silico E. coli model (iJE660a GSM). Genes encoding known or putative enzymes and transporters not included in the iJE660a model were further examined. Literature and database searches (LIGAND [13], EcoCyc [14] and TC-DB [15]) on each of the genes provided the biochemical information needed to expand E. coli iJE660a. The iJR904 model was built using the software SimPheny™ (Genomatica, San Diego, CA) and accounts for 904 genes with a known locus in the genome, as compared to 660 genes in the previous model. The metabolic network described by E. coli iJE660a has expanded in size from 627 unique reactions and 438 metabolites to 931 unique reactions and 625 metabolites in iJR904. Complete maps containing all the reactions in the metabolic network are available in the additional data files and can also be downloaded from [11]. The molecular formulae and charges for the metabolites in the model were determined assuming a pH of 7.2. Fifty-eight of the reactions in iJR904 currently do not have associated genes. A complete list of the reactions can be found in the additional data files. Putative functional gene assignments account for 23 of the added reactions, with the majority of these being putative transporters. In addition to these new reactions, old reactions were updated to be both elementally and charge balanced by including water and protons as participants in the reactions. Six reactions in iJR904 are elementally balanced but not charge balanced (Table 1). 
Five out of the six reactions are imbalanced because they have not been fully characterized biochemically so assumptions had to be made about the participating metabolites; the remaining reaction (abbreviated ADOCBLS) is charge imbalanced since we were unable to determine the charge associated with the ion complex. The biomass reaction [8], representing the drain of biosynthetic constituents from the network and the growth-associated ATP requirement, was also changed to include internal protons and water. The amount of water needed in the biomass reaction is equal to the amount of ATP hydrolyzed to meet the growth-associated ATP requirement. The hydrolysis of ATP results in the production of a proton while the utilization of NADPH and NADH consumes a proton; this results in the net production of protons in the biomass reaction. Other updates to the iJE660a reaction network are also notable. A number of reactions in iJE660a could not previously be assigned to an ORF, but are now assigned to an ORF as a result of the updated genome annotation [12], such as mtn and uppS. Other ORF names have changed and these were also updated. Some of the original model reactions were modified in addition to including internal protons and water. These modifications mainly included changes in the stoichiometric coefficients, cofactor usage and reaction reversibility. In some cases, the metabolites that participate in the reaction were changed. Forty-two reactions were removed from iJE660a, and these are listed in the additional data files along with the reasons for their removal. iJR904 also accounts for the specificity of the quinones in the individual reactions involved in the electron transport chain (Figure 3). E. 
coli K-12 uses three quinones: ubiquinone (Q), menaquinone (MK) and demethylmenaquinone (DMK) to transfer electrons from the electron donor to the terminal acceptor; in iJE660a there was no distinction between the three quinones and they were all treated as ubiquinone, which led to inaccurate electron donor/electron acceptor pairs. GPR associations are for the first time directly included in the iJR904 model, and examples of some are shown in Figure 2. GPR associations have been constructed and their images can be found in the additional data files. These GPR associations can be used to evaluate the reactions remaining in the metabolic network after deletion of a specific gene. Including these associations directly into iJR904 will lead to a more accurate assessment of the effects of gene deletions. In addition, these GPR associations are necessary for analyzing diverse datasets by the model and using these datasets to further identify physiological states. Thus, iJR904 accounts more accurately for a number of the metabolic processes in E. coli K-12 MG1655, and expands in scope significantly through the addition of GPR associations. E. coli iJR904 GSM/GPR is no longer a purely metabolic model. Systemic properties A list of network components alone does not explain how the components work together to produce a biological function. These systemic properties can only be investigated when considering all the components simultaneously; it is the interaction of these components that provides the most information about cellular behavior. As with iJE660a, a myriad of other issues can be addressed with iJR904. We will address three types of issues in this paper using iJR904: gap analysis and putative ORF assignments, the importance of global proton balancing, and phase plane analysis [16]. Identification and resolution of dead ends A 'dead end' exists in a metabolic network if a metabolite is either only produced or only consumed in the network. 
If a metabolic network contains a gap, it is missing the biochemical reactions that can produce or consume the dead end metabolites. iJR904 has 70 dead end metabolites or gaps in the network; these are listed in the additional data files. These 70 metabolites participate in 89 reactions, indicating that at least 89 model reactions can never be used if the network is to operate at steady state. The reactions that lead up to the dead ends in iJR904 are included so that when the gaps are filled in at a future date, the network will be fully functional. Some of these network gaps could be reconciled by the addition of transporters, while others could be reconciled by modifying the growth function, that is, including dead end metabolites as requirements for biomass production. Neither of these steps was taken here since genomic or biochemical evidence could not be found to support their inclusion. Using these gaps, we attempted to identify new functional assignments based on sequence homology searches. A list of EC numbers corresponding to enzymes (not included in the model) that could resolve the gaps, was generated. This list was pooled together with the enzymes known to occur in E. coli which lack assigned loci (EcoCyc [14]). We subsequently collected amino acid sequences from other organisms (orthologous sequences) assigned to these enzymatic functions for a homology search study. A total of 83 training sets, each with an average of 11 orthologous sequences, was collected and compared against the E. coli genome. Each training set includes multiple orthologous sequences that correspond to an enzyme of interest. Of the 83 training sets, 61 are for enzymes which connect to gaps in the existing network and the remaining 22 are for some of the enzymes listed in EcoCyc. Each training set was processed using the alignment programs MEME [17] and ClustalW [18] to generate a profile for the corresponding enzyme. 
Using these profiles MAST [19] and HMMER [20] were used to identify similar ORFs in the E. coli genome. Of the 61 enzymes that could resolve network gaps, we assigned putative loci for 12 of them. In addition, putative loci could be found for 15 of the 22 enzymes listed in EcoCyc. Some of these enzymes have multiple matches within the E. coli genome, which together resulted in 55 putative assignments. These results were inspected manually and found to be relevant and consistent across the three search methods (MAST, with and without end-gap penalty, and HMMER). The results of this study largely coincided with the annotation performed by Serres et al. [12]; however, most annotation updates added more specificity to the type of reactions the enzymes were predicted to catalyze and, in some cases, suggested additional substrates that known enzymes might act upon (Table 2). Table 2 lists the top three matches for each enzyme (except for the case when the match is a known isozyme); a complete list of all results and expected values (e-values) can be found in the additional data files. The putative assignments presented in Table 2 should be used with care. For example, there are multiple putative assignments for the acyl-CoA dehydrogenase enzyme for which the gene has recently been found [21], but the actual gene locus for this enzyme (b0221) does not have the most significant e-value among the list of potential loci. Effects of constraining proton exchange flux The importance of balancing protons internally can now be investigated with iJR904 since the metabolic network accounts for all the protons being generated and consumed by the individual metabolic and transport reactions (only external protons associated with the proton motive force were accounted for in iJE660a). The medium can serve as a pool both supplying and dissipating external protons as needed by the cell. 
During growth on some carbon sources, the generation of internal protons by the metabolic reactions is relieved by secreting protons into the medium. This subsequently reduces the amount of ATP made by ATPase since these protons could be used to drive this reaction. Under other conditions, a shortage of internal protons is compensated for by taking up protons from the medium and transporting them across the cell membrane into the cytosol. The effects that the exchange of protons across the system boundary have on predicted growth rates were investigated. A robustness analysis [22], in which the flux through the proton exchange reaction was constrained from its optimal value down to zero, was performed under aerobic conditions for a variety of carbon sources (Figure 4). The predicted growth rates for different carbon sources respond differently as the exchange flux of protons (between the cell and the medium) is reduced to zero; glucose and glycerol were the most sensitive to proton exchange while D- and L-lactate were the least sensitive. When either glucose or glycerol was used as the carbon source, excess protons were generated intracellularly; this excess was relieved by secreting extracellular protons, thereby lowering the pH of the medium. For pyruvate, D- and L-lactate, acetate, α-ketoglutarate (αKG), succinate and malate there is a shortage of internal protons; as a result, cells would uptake protons from the medium thereby raising the pH. Since there can be no net accumulation of charge within the system, the total charge entering the system must equal the charge leaving the system. 
Pyruvate, lactate, acetate, α-ketoglutarate, succinate and malate, as used in these simulations, have a negative charge so H+ must be taken up; however, if the uncharged acidic form of these compounds was used instead (and the acid-base reaction was included in the network, generating proton(s) and the basic form of the compound) it would be predicted that H+ would be secreted by the cell thereby lowering the pH. It is generally thought that the pH of the medium becomes more acidic as E. coli grows. A lowering of pH has been observed for growth on unbuffered medium with glycerol; however, during growth on unbuffered medium with disodium succinate or sodium acetate the medium becomes more basic (J.L.R. and B.O.P., unpublished data). Clearly iJR904 with charge-balanced reactions highlights the challenge that cells have with globally balancing protons. As a part of the iterative model building process [2], iJR904 can be used to design informative experiments to systematically address this issue.

Phenotypic phase plane (PhPP) comparisons

The phenotypic phase planes [16] for growth on different carbon sources were calculated using both models iJR904 and iJE660a. A description of phenotypic phase planes can be found in Materials and methods. For these simulations the carbon uptake rate and oxygen uptake rate were varied. The carbon substrates tested included: glucose, pyruvate, acetate, glycerol, D-lactate, αKG, succinate and malate. The phase planes for pyruvate, D-lactate and succinate calculated using iJR904 were nearly identical to those calculated with model iJE660a (the lines demarcating the different phases only moved slightly). These results show that the modified and new reactions contained in iJR904 did not significantly affect the phase planes for these substrates, indicating that iJR904 makes similar predictions regarding optimal growth on these substrates. 
The line of optimality (LO) on a phenotypic phase plane corresponds to the conditions (oxygen and substrate uptake rates) which can maximize the biomass yield. The largest shifts in the line of optimality were observed for growth on pyruvate and αKG. However, these shifts are relatively small indicating that the optimal oxygen uptake and carbon source uptake rates needed to generate the maximal amount of biomass do not change significantly (less than 10%). The phenotypic phase planes for carbon sources other than pyruvate, D-lactate, and succinate have more significant changes when calculated with iJR904 as compared to iJE660a. These include growth on the following carbon sources: glycerol, glucose, acetate, malate and αKG. The resulting phase planes calculated using iJR904 and iJE660a are shown in red and blue, respectively, in Figure 5a-e. For the malate and glucose phase planes (Figure 5a,b) one of the lines only appears on the phase plane calculated using iJE660a; these changes are attributed to the effects of global proton balancing described in the previous section. This issue also accounts for changes in the acetate phase plane (Figure 5c) and is a contributing factor to the changes observed with the glycerol and αKG phase planes. However, other changes (discussed below) to the metabolic networks have more significant effects on these two phase planes.

Glycerol phase plane

The changes in the glycerol phenotypic phase plane (Figure 5d) were less drastic than those for the αKG phase plane (Figure 5e, see comments below). The only major change was that the lines in the microaerobic region shifted downward. This change is a result of the removal of two reactions involved in pyridoxal recycling which were previously assigned to pdxH. The two reactions are not included in the metabolic network defined by iJR904 for reasons explained in the additional data files. 
α-Ketoglutarate phase plane The two-dimensional phenotypic phase plane for αKG is noticeably different when calculated for iJR904 and iJE660a (Figure 5e). One notable feature is that iJR904 predicts completely anaerobic growth on αKG, while iJE660a predicts that oxygen is required for growth on αKG. Under oxygen limitations, corresponding to regions below the line of optimality, the expanded model is also more efficient at producing biomass from αKG than the previous model. As oxygen becomes more limiting, iJR904 becomes increasingly more efficient at generating biomass than iJE660a. Examination of the calculated optimal flux distributions provides insights into why iJR904 is more efficient. One of the reasons for this increased growth efficiency is that iJR904 includes the citrate lyase enzyme, which converts citrate (CIT) to acetate (AC) and oxaloacetate (OAA). During oxygen-limited growth, iJR904 predicts that some of the αKG is converted to OAA by first reversing some of the TCA cycle reactions to generate CIT and then splitting CIT into AC and OAA by citrate lyase. The rest of the αKG is consumed through the forward reactions of the TCA cycle to produce malate (Figure 6). Removal of the citrate lyase reaction from the network under anaerobic conditions shows two other, less-efficient routes that would still enable iJR904 to predict anaerobic growth. These new metabolic routes, which are dependent on at least one of the new additions, are depicted in Figure 5f. Conclusions This paper reports the curation and expansion of a previous genome-scale constraint-based model of E. coli metabolism (iJE660a GSM) that is now used in multiple laboratories (A.L. Barabasi, personal communication; Church and colleagues [23]; H. Greenberg, personal communication and C. Maranas, personal communication). This expanded model, iJR904 GSM/GPR, includes 37% more metabolic genes and 47% more metabolic reactions. 
Each reaction in the network is now both elementally and charge balanced with the exception of the six reactions listed in Table 1. While the new reactions added to the network do not change many of the predicted optimal phenotypes, there are instances in which the expanded model makes significantly different predictions, examples of which occur when glycerol, glucose, malate, acetate and αKG are used as the carbon sources under oxygen-limited conditions. The analysis of dead ends or gaps in the network has led to putative annotations of 55 ORFs. If these putative functional assignments are verified biochemically they can be included in future updates of iJR904. Ideally an iterative process will be developed, within which the model can help identify new targets, and, if verified, can lead to an updated model. This iterative process [2] would be likely to produce more useful results in less-characterized organisms and has already been successful in helping to identify malate dehydrogenase in Helicobacter pylori [24] and citrate synthase in Geobacter sulfurreducens (D. Lovley, unpublished results). The incorporation of GPR associations into iJR904 will allow for the analysis of transcriptomic and proteomic data directly; it also enables the incorporation of these datasets to further constrain the solution space leading to more accurate predictions of phenotypic data. E. coli iJR904 can now serve as a model centric database which could analyze and reconcile heterogeneous datasets as well as use these datasets to aid in model predictions. Materials and methods Constraint-based modeling A stoichiometric matrix, S (m × n), is constructed where m is the number of metabolites and n the number of reactions. Each column of S specifies the stoichiometry of the metabolites in a given reaction from the metabolic network. 
Mass balance equations can be written for each metabolite by taking the dot product of a row in S, corresponding to a particular metabolite, and a vector, v, containing the values of the fluxes through all reactions in the network. A system of mass balance equations for all the metabolites can be represented as follows:

$$\frac{dX}{dt} = S \cdot v \qquad (1)$$

where X is a concentration vector of length m, and v is a flux vector of length n. At steady-state, the time derivatives of metabolite concentrations are zero, and equation (1) can be simplified to:

$$S \cdot v = 0$$

It follows that in order for a flux vector v to satisfy this relationship, the rate of production must equal the rate of consumption for each metabolite. Application of additional constraints further reduces the number of allowable flux distributions, v. Limits on the range of individual flux values can further reduce the number of allowable solutions. These constraints have the form:

$$\alpha \le v_i \le \beta$$

where α and β are the lower and upper limits, respectively. Maximum flux values (β) can be estimated based on enzymatic capacity limitations or, for the case of exchange reactions, measured maximal uptake rates can be used. Thermodynamic constraints, regarding the reversibility or irreversibility of a reaction, can be applied by setting the α for the corresponding flux to zero if the reaction is irreversible. These constraints are not sufficient to shrink the original solution space to a single solution. Instead a number of solutions remain which make up the allowable solution space. Linear optimization can be used to find the solution that maximizes a particular objective function. Some examples of objective functions include the production of ATP, NADH, NADPH or a particular metabolite. An objective function with a combination of the metabolic precursors, energy and redox potential required for the production of biomass has proven useful in predicting in vivo cellular behavior [9,10,25,26]. 
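The steady-state and bound constraints described above can be made concrete with a small feasibility check. The sketch below uses a hypothetical three-reaction chain, not any part of iJR904, and the names `mass_balance`, `is_feasible` and `BOUNDS` are illustrative assumptions. It only tests whether a given flux vector lies in the allowable solution space; it does not perform the linear optimization itself, which would need an LP solver.

```python
# Toy network (hypothetical, not taken from iJR904):
#   R1: -> A        (uptake)
#   R2: A -> B
#   R3: B ->        (drain, standing in for a biomass reaction)
# Rows are metabolites (A, B); columns are reactions (R1, R2, R3).
S = [
    [1, -1, 0],   # A
    [0, 1, -1],   # B
]

# (alpha, beta) limits per reaction; irreversible reactions have alpha = 0.
# The uptake limit of 10 is an arbitrary illustrative value.
BOUNDS = [(0, 10), (0, 1e30), (0, 1e30)]


def mass_balance(S, v):
    """Return S . v, the net production rate of each metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_feasible(S, v, bounds, tol=1e-9):
    """True if v satisfies S . v = 0 and alpha <= v_i <= beta for every flux."""
    balanced = all(abs(x) <= tol for x in mass_balance(S, v))
    bounded = all(lo - tol <= v_i <= hi + tol for v_i, (lo, hi) in zip(v, bounds))
    return balanced and bounded


print(is_feasible(S, [10, 10, 10], BOUNDS))   # balanced and within bounds
print(is_feasible(S, [10, 8, 8], BOUNDS))     # metabolite A would accumulate
print(is_feasible(S, [12, 12, 12], BOUNDS))   # balanced, but uptake exceeds its limit
```

In this chain every steady-state flux vector has equal flux through all three reactions, so the largest feasible drain flux is simply the uptake limit. In a genome-scale network the optimum is not obvious by inspection, which is why linear optimization is used.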
Simulation conditions

Simulations with iJR904 were all done using the software package SimPheny™ (Genomatica, San Diego, CA); this software was also used to build iJR904. All calculations were made using the conditions outlined in this section. The biomass reaction was the same as that reported previously [8] with the addition of intracellular protons and water, and can be found in the additional data files. All flux values reported in this section are in units of mmol/g DW-hr. The flux through the non-growth associated ATP maintenance reaction (ATPM in the additional data files) was fixed to 7.6. Fluxes through all other internal reactions have an upper limit of 1 × 10^30; if the reaction is reversible the lower limit is -1 × 10^30 and if it is irreversible the lower limit is zero. In addition to the metabolic reactions listed in the additional data files, reversible exchange reactions for all external metabolites were also included in the simulations to allow external metabolites to cross the system boundaries. If these exchange reactions are used in the forward direction the external metabolites leave the system and if used in the reverse direction (that is, a negative flux value through the reaction) the external metabolites enter the system. The following external metabolites were allowed to freely enter and leave the system: ammonia, water, phosphate, sulfate, potassium, sodium, iron (II), carbon dioxide and protons (except during the robustness study where proton exchange flux was constrained down to zero). The corresponding exchange fluxes for these metabolites have a lower and upper flux limit of -1 × 10^30 and 1 × 10^30, respectively. Aerobic conditions were simulated with a maximum oxygen uptake rate of 20 mmol/g DW-hr, by setting the lower and upper limits for the oxygen exchange flux to -20 and 0 respectively, and anaerobic conditions were simulated by fixing the oxygen uptake rate to 0. 
All other external metabolites, except for the carbon source, were only allowed to leave the system. The lower and upper limits on their corresponding exchange fluxes were 0 and 1 × 10^30, respectively. Growth on different carbon sources was simulated by allowing those external metabolites to enter the system; the actual flux values for uptake rates used in the simulations are noted in the text and figures, where the upper limit is 0 and the lower limit is the negative of the uptake rate listed. These constraints are also summarized in the additional data files.

Phenotypic phase planes (PhPP)

Phenotypic phase plane analysis was developed to generate a global view of the optimality properties of a network [16]. The phenotypic phase plane is constructed from a large number of individual optimal solutions and gives an overall view of the optimality properties of the network. PhPPs are used to show all possible quantitative flux distributions through a network while varying two or three constrained fluxes. The different regions of the phase plane have qualitatively different flux distributions that translate to different metabolic phenotypes. One important feature of the PhPP is the line of optimality (LO). Points that lie on the LO optimize the objective function for a given substrate uptake rate. When the cell is operating along the LO, calculated when the growth rate is used as the objective function, the cell is growing with a maximal biomass yield.

Identification of dead ends

Dead end metabolites are classified as such if a metabolite can either be produced but not consumed or consumed but not produced. By examining the connectivity of the metabolites in the S matrix, a list of dead ends can be generated. Once these are identified, the number of reactions directly involved with these metabolites can be determined by enumerating the number of non-zero elements in the row corresponding to each dead end metabolite. 
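The dead-end search described above can be sketched as a scan over the rows of S. The toy network, the function name `find_dead_ends`, and the treatment of reversibility are assumptions for illustration: a reversible reaction is counted here as able to both produce and consume its metabolites, which follows the definition in the text but is not necessarily how the authors implemented the search.

```python
def find_dead_ends(S, reversible, names):
    """Map each dead-end metabolite to the number of reactions it appears in.

    S          -- stoichiometric matrix as a list of rows (one per metabolite)
    reversible -- one boolean per reaction (column of S)
    names      -- one metabolite name per row of S
    """
    dead_ends = {}
    for name, row in zip(names, S):
        produced = consumed = False
        n_reactions = 0
        for coeff, is_reversible in zip(row, reversible):
            if coeff == 0:
                continue
            n_reactions += 1                 # non-zero entry in this row
            if is_reversible:
                produced = consumed = True   # can run in either direction
            elif coeff > 0:
                produced = True
            else:
                consumed = True
        # Dead end: only produced or only consumed, never both.
        if n_reactions and produced != consumed:
            dead_ends[name] = n_reactions
    return dead_ends


# Hypothetical network: R1: -> A, R2: A -> B, R3: A -> C, R4: B ->
NAMES = ["A", "B", "C"]
S = [
    [1, -1, -1, 0],   # A
    [0, 1, 0, -1],    # B
    [0, 0, 1, 0],     # C is produced by R3 but never consumed
]
REVERSIBLE = [False, False, False, False]

print(find_dead_ends(S, REVERSIBLE, NAMES))
```

In this example metabolite C is reported as a dead end involved in one reaction (R3), so R3 could never carry flux at steady state. This sign-based test does not catch every blocked metabolite, for example one that appears only in a single reversible reaction.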
Sequence annotations
A list of reactions and associated enzymes that could connect the dead ends with the rest of the network was gathered from LIGAND (Database of Chemical Compounds and Reactions in Biological Pathways [13]). These are enzymes known to act on those metabolites in other organisms. Enzymes listed in EcoCyc that are known to be present in E. coli but lack assigned loci were also added to the search list. The Enzyme Commission numbers for this combined list were used in queries against the SwissProt, TrEMBL and TrEMBLnew databanks (using the Sequence Retrieval System [27]) to retrieve the enzymes' corresponding amino acid sequences. Known orthologous sequences of all the enzymes of interest were grouped together to construct multiple sequence alignments. Two separate programs, MEME (Multiple EM for Motif Elicitation [17]) and ClustalW [18], were used for this purpose. MEME was run assuming that each sequence may contain a variable number of non-overlapping occurrences of each motif and up to three distinct motifs. Each training set was processed twice with MEME, once with the program's default end-gap penalty and once without it. Default values were used for the gap-opening and gap-extension penalties. All sequences in each training set were weighted equally by MEME. ClustalW, however, down-weighted similar sequences in proportion to their degree of relatedness. Pairwise alignments in ClustalW were run with a dynamic programming algorithm (slow option) and with the Reset Gap option off. The MEME output files were submitted to MAST [19] (Motif Alignment and Search Tool, version 3.0 online) and searched for matching sequences against the E. coli genome. MAST returned a list of high-scoring sequences and their annotations. ClustalW output files were used by HMMER's hmmbuild and hmmcalibrate (version 2.2g [20]) with default parameters to train profile HMMs (hidden Markov models).
hmmsearch was subsequently applied to find sequences in the E. coli genome that matched each profile. Results from the MAST and HMMER searches were examined manually, and relevant matches were reported.

Additional data files
The additional data consist of three files. The Excel file (Additional data file 1) contains the following: a list of the reactions in iJR904; definitions of the metabolite abbreviations; a list of the exchange fluxes used in simulations and their constraints; a list of the dead end metabolites in the metabolic network; a list of the reactions from iJE660a that were not included; and a complete list of the sequence comparison results (including e-values). A zip file (Additional data file 2) contains JPEG images of all the GPR associations and a detailed document describing how to interpret these images. Each GPR image file is labeled to correspond to either an individual gene or an individual reaction. The third file (Additional data file 3) contains six maps of metabolism, which together include all the reactions in iJR904; the reaction and metabolite abbreviations are the same as those listed in the Excel file.

                Author and article information

                Journal
                Mol Syst Biol
                Molecular Systems Biology
                Nature Publishing Group
                1744-4292
                2007
                26 June 2007
                Volume: 3
                Article number: 121
                Affiliations
                [1] Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
                [2] Department of Chemical and Biological Engineering, McCormick School of Engineering and Applied Sciences, Northwestern University, Evanston, IL, USA
                [3] Bioinformatics Research Group, SRI International, Ravenswood, CA, USA
                [4] Laboratory of Computational Systems Biotechnology, Ecole polytechnique fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
                Author notes
                [a] Department of Bioengineering, University of California San Diego, 9500 Gilman Drive, Mail Code 0412, La Jolla, CA 92093, USA. Tel.: +1 858 534 5668; Fax: +1 858 822 3120; bpalsson@bioeng.ucsd.edu
                Article
                msb4100155
                10.1038/msb4100155
                1911197
                17593909
                ce0df4cc-047e-4397-8a8f-725fcb8137cb
                Copyright © 2007, EMBO and Nature Publishing Group

                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits distribution and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation or the creation of derivative works without specific permission.

                History
                : 20 December 2006
                : 12 April 2007
                Categories
                Article

                Quantitative & Systems biology
                computational biology, thermodynamics, systems biology, group contribution method
