3
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Microbial Musings – February 2020

      editorial
      1 , *
      Microbiology
      Microbiology Society

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Microbiology has been at the forefront of many people’s minds during February as the new coronavirus outbreak is tracked on a global scale with major efforts to control its spread. Daily updates on infection numbers, transmission rates, genome sequences and efforts to develop rapid diagnostics and a vaccine are keeping our field in the news. However, despite the unprecedented gathering and sharing of knowledge, stopping the spread of the virus still seems almost impossible in our global 21st-century world. You can keep up to date with news from the Microbiology Society (@MicrobioSoc) or follow legendary US microbiologist Richard Lenski’s (@RELenski) blog (https://telliamedrevisited.wordpress.com/). For this month’s musings we start with a couple of articles about bacterial metabolism. The first is an interesting and novel review of metabolic networks found in the human gut microbiome [1]. It is novel in that it was written as a collaborative effort by a consortium of undergraduate students studying microbiology at Concordia University in Montréal, Canada, led by Chiara Gamberi (@ChiaraGamberi), following a model that she used in 2019 [2]. Using such a large cohort, with students split into small teams working on particular subtopics, the review is both extensive and very broad ranging in its coverage of the metabolic interactions of the gut microbes with the host and their functions in health and disease. Professor Gamberi spoke to the Society about this paper and its pedagogical underpinnings, which you can read about on the Microbiology Society website. The second metabolic network-related paper is a primary article from the lab of Bernard Palsson at UCSD (@ucsd_sbrg). His group has led the field in creating and using whole-genome metabolic analysis for fundamental biological study and biotechnological applications [3]. In our lab, where Escherichia coli K-12 is a workhorse organism of study, the names of members of his group are legendary, as the models are named after their creators (author's initials and the number of genes in the mode). Hence, the models of Jennifer Reed (iJR904) (@UWMadCBE), Adam Feist (iAF1260) (@DTUBiosustain) and later Jeffrey Orths (iJO1366) [4–6] have been used as starting points for many of our own studies [7, 8]. More recently Feist and Palsson have been using experimental evolution or adaptive laboratory evolution (ALE) as an approach to push the boundaries of E. coli metabolism and generally showing its remarkable robustness and flexibility [9]. In this work they use ALE to understand more about the response of E. coli K-12 MG1655 to acid stress induced by growth at pH 5.5 in defined glucose minimal medium [10]. The ability of E. coli strains to be able to resist quite strong acid for short periods of time likely reflects the necessity for this capability to enable passage through the stomach (pH 2) and later survival in the more weakly acidic intestine of its host, and here a weaker acid stress (pH 5.5) was applied for 800 generations. Of the six evolved lines, five had mutations in rpoC, encoding the beta-subunit of RNA polymerase, which they conclude, after measuring changes in gene expression in the evolved lines by RNAseq, are due to generally increased fitness leading to faster growth, rather than any particular improvement in acid resistance. A related study from the lab of acid-stress aficionado Joan Slonczewski found a similar mutation in RNA polymerase subunits in a study at external pH 4.8 [11], but also additional mutations removing acid resistance systems such as the lysine decarboxylase (CadA) needed for extreme acid resistance, suggesting that adaptation to milder acid stress over long periods leads to a tempering of the extreme acute response. This reinforces the findings from ALE that you really get what you select for, obvious as this sounds, and so small differences in experimental design can lead to very different outputs. Interested readers in the European Union (EU) can engage with the new COST Action on EuroMicropH – ‘Understanding and exploiting the impacts of low pH on micro-organisms’ (@EuroMicropH) led by action Chair Pete Lund (@lundpa) from Birmingham, UK. Next we have a couple of papers continuing last month’s theme on the genus Pseudomonas . This month we start with a species not covered in January, Pseudomonas putida , an organism with great potential for use in biotechnology due to its robust constitution and interesting biochemistry [12]. Within the EU this bug is being developed through large projects such as EmPowerPutida (@EmPowerPutida), with strong advocates like Victor de Lorenzo in Madrid, Spain (@vdlorenzo_CNB). The work is this issue is from Cecilia Arraiano’s group (@ArraianoLab) in Lisbon, Portugal (@itqbunl) and Ralf Takors in Stuttgart, Germany, who investigated patterns of gene expression when growing P. putida KT2440 in a chemostat with a modification called a plug flow reactor [13]. Here, after reaching steady state in the chemostat, portions of the culture are flowed around a loop system where the lack of new media addition causes rapid depletion of glucose, leading to transient changes in gene expression before the cells flow back into the chemostat. In this plug flow design samples are taken at different distances around the loop and RNA profiles are measured by RNAseq, with the glucose-replete chemostat functioning as the comparison. Using their experimental data as the starting point, they combined this with in silico prediction tools for small non-coding RNAs (ncRNAs) and discovered a set of 725 ncRNAs. This new set of detectable ncRNAs is not only very large, but remarkably only a small number match known ncRNAs, and a related study from Katherine Long’s group at the Technical University of Denmark (DTU) near Copenhagen, Denmark, examining different stresses found a quite different set of transcripts [14], suggesting that there is much to learn about the diversity and functioning of these short transcripts in cell physiology. Our second Pseudomonas paper continues a theme from last month on the role of soluble secreted metabolites acting as signalling molecules [15]. In the new study, from Jerry Reen and Fergal O’Gara’s labs in Cork, Ireland, they examine how secreted alkyl-quinolones of related structures have significantly different effects on other bacterial pathogens [16]. Using members of the Vibrionaceae , they show activity against Vibrio cholerae and Vibrio vulnificus , both human pathogens, but that Vibrio parahaemolyticus is totally resistant. The authors remind us that the Pseudomonas – Vibrio interaction has attracted a lot of attention through the discovery of ‘tit-for-tat’ killing via type VI secretions systems (T6s) [17], but here secreted small molecules are also clearly important for this inter-species competition. The group of Jake McKinlay (@JakeMcKinlay and @mckinlab) at Indiana University, Bloomington, USA have published an interesting paper, also highlighted by Senior Editor Professor Gail Preston as Editor’s Choice for the February issue, demonstrating robustness in how bacterial cells manage their redox state [18]. The photosynthetic bacterium Rhodospirillum rubrum , when growing in the light with an organic source of carbon, which we know as photoheterotrophic growth, generates an excess of reducing potential that it must dissipate. Classically, this bug uses the Calvin cycle to reoxidize its reduced electron carriers, but in this paper the group remove this system to see how the bugs cope. By using labelled carbon (13C), they show that the cells run part of their tricarboxylic acid (TCA) cycle backwards to provide precursors for alternative pathways to synthesize some amino acids. When the authors added these amino acids to the media (so the bugs did not need to make them), the Calvin cycle mutant couldn’t grow. Hence, they concluded that the physiology of this microbe had built-in metabolic robustness to deal with varying redox stresses through multiple routes. For the last paper in this month’s musings we move to a very cool eukaryotic microbe, the soil amoeba Dictyostelium discoideum. As a bacteriologist it is easy to forget how incredibly interesting the cell biology of these microbes is. While some bacteria, such as Myxococcus , can swarm together, differentiate and form fruiting bodies, this pales into insignificance compared to the accomplishments of ‘Dicty’, as it is known to its researchers. Initiation of this process involves the secretion of cyclic AMP (cAMP), which is used as a signal for chemotaxis of cells to form the initial multicellular aggregate. An important regulator of cAMP levels is a phosphodiesterase called RegA and in this study from the group of Jeff Hadwiger at Oklahoma State University, USA, including Nick Kuburich (@Kubu59), the authors examine how the MAP kinase Erk2 interacts with RegA, as it is known that the phosphorylation of a site on RegA regulates its function negatively [19]. Using genetic and biochemical experiments, they find evidence for a clear interaction between the proteins. The study supports the idea that MAP kinase regulation of cAMP signalling is conserved across diverse eukaryotes. The Microbiology Society will be running a symposium on protists, co-sponsored by Protistology UK, called ‘Exploring the eukaryotic tree of life’ at Annual Conference this year, so come and learn more about these fascinating microbes. Early bird discount ends on 3 March, so sign up now if have not not already for this great event. Gavin Thomas, Deputy Editor-in-Chief.

          Related collections

          Most cited references16

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information

          An updated genome-scale reconstruction of the metabolic network in Escherichia coli K-12 MG1655 is presented. This updated metabolic reconstruction includes: (1) an alignment with the latest genome annotation and the metabolic content of EcoCyc leading to the inclusion of the activities of 1260 ORFs, (2) characterization and quantification of the biomass components and maintenance requirements associated with growth of E. coli and (3) thermodynamic information for the included chemical reactions. The conversion of this metabolic network reconstruction into an in silico model is detailed. A new step in the metabolic reconstruction process, termed thermodynamic consistency analysis, is introduced, in which reactions were checked for consistency with thermodynamic reversibility estimates. Applications demonstrating the capabilities of the genome-scale metabolic model to predict high-throughput experimental growth and gene deletion phenotypic screens are presented. The increased scope and computational capability using this new reconstruction is expected to broaden the spectrum of both basic biology and applied systems biology studies of E. coli metabolism.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            A comprehensive genome-scale reconstruction of Escherichia coli metabolism—2011

            Introduction A common denominator for systems biology studies of a target organism is a high-quality genome-scale metabolic network reconstruction. A network reconstruction represents a biochemically, genetically, and genomically structured knowledgebase that contains detailed information about an organism in a structured format (Thiele and Palsson, 2010). Metabolic network reconstructions contain information such as exact stoichiometry of metabolic reactions, chemical formulas and charges of metabolites, and the associations between genes, proteins, and reactions. These reconstructions form a basis for the formulation of mechanistic, and thus computable, genome-scale genotype–phenotype relationships (Palsson, 2009). The most detailed and complete metabolic reconstruction of any organism to date is for the common laboratory strain Escherichia coli K-12 MG1655. The first genome-scale reconstruction of E. coli was iJE660 (Edwards and Palsson, 2000). This network was constructed through extensive searches of literature and databases to ensure correct stoichiometry and cofactor usage, and was the most extensive metabolic network reconstruction in existence at that time. An updated version of this reconstruction, iJR904 (Reed et al, 2003), had an expanded scope, including pathways for the consumption of alternate carbon sources and more specific quinone usage in the electron transport system. Hundreds of new genes and reactions were added, gene–protein–reaction associations (GPRs) were included for the first time to connect reactions with genes, and all reactions were elementally and charged balanced through the inclusion of protons. In the next update, iAF1260 (Feist et al, 2007), the scope of the network was expanded again, now including many reactions for the synthesis of cell wall components, and all metabolites were assigned to the cytoplasm, periplasm, or extracellular space. The thermodynamic properties of each reaction were calculated, and this was used to set lower bounds on predicted irreversible reactions. iAF1260 contained 2077 reactions, 1039 metabolites, and 1260 genes. A core model version of iAF1260, useful for testing and debugging new constraint-based algorithms and for educational use, has also been published (Orth et al, 2010a). Here, we present an updated version of the E. coli metabolic network reconstruction. This new version, titled iJO1366, includes many newly characterized genes and reactions. Since the iAF1260 model was a very complete representation of the known metabolism of E. coli, only minor expansions in the scope of the network were made. Still, new discoveries since 2007 have made this model update necessary. Several genes were added based on the results of an experimental screen of E. coli knockout strains in four different media conditions. The gaps in the iAF1260 network were identified and characterized, and new reactions and genes were added to reduce the total number of gaps. The iJO1366 reconstruction can serve as a basis for metabolic network reconstructions of other E. coli strains and closely related organisms. We assembled preliminary reconstructions for many other E. coli strains, and analysis of their completeness provides insights into the metabolic networks of these organisms. iJO1366 is the most complete E. coli metabolic reconstruction to date, and like its predecessors, it will likely aid in many new discoveries (Feist and Palsson, 2008). Results and discussion Process for updating the reconstruction and its content The updated network reconstruction of E. coli K-12 MG1655 began with the iAF1260b network (Feist et al, 2010), a slightly updated version of the iAF1260 network. In order to identify incorrect model predictions in order to improve the E. coli reconstruction, we experimentally determined conditional essentiality for most of the genes in the iAF1260b model. By comparing model predicted growth phenotypes to the measurements, errors in the reconstruction were found and several updates were made (Supplementary Table 1). For a discussion of the updates made to iAF1260b based on this screen, see Experimental phenotypic screens in the Supplementary Information. Next, literature and database searches were used to add newly characterized genes and reactions since 2007. The EcoCyc (Keseler et al, 2009) and KEGG (Kanehisa et al, 2010) databases were used extensively for this purpose. Results from the experimental screen described above also led to several model updates. After this first round of updates, the reconstruction contained 1274 genes. The network gaps (Orth and Palsson, 2010) in this version of the reconstruction were then investigated using a modified version of the GapFind algorithm (Satish Kumar et al, 2007). All orphan reactions (reactions without known associated genes) in the reconstruction were also identified from the model GPRs. Gaps were manually sorted into scope and knowledge gaps. Scope gaps are metabolites that are blocked in a model due to the limited scope of the network reconstruction, but have actual known producing and consuming reactions. Knowledge gaps exist because our knowledge of any metabolic network is incomplete. Targeted literature and database searches were performed for each knowledge gap to try to identify any known metabolic reactions missing from the reconstruction. We continued to add newly published metabolic information to the reconstruction during this gap-filling process, and the reconstruction was ultimately updated to iJO1366. All manual curation followed an established protocol (Thiele and Palsson, 2010). iJO1366 represents a significant expansion of the E. coli reconstruction, as it contains 1366 genes, 2251 metabolic reactions, and 1136 unique metabolites. A comparison of the content of iJO1366 and its predecessor, iAF1260, is presented in Table I. Like iAF1260, iJO1366 contains a wide range of metabolic functions (Figure 1). The complete lists of reactions and metabolites in iJO1366 can be found in Supplementary Tables 2 and 3, with a list of all references used in Supplementary Table 4. iJO1366 accounts for three cellular compartments: the cytoplasm, periplasm, and extracellular space. In total, 107 new genes were added to the reconstruction, while one gene, prpE (b0335), was removed. Most new genes added have been characterized since iAF1260 was published in 2007 (Figure 1D). The fact that some references predate previous versions of the E. coli reconstruction does not necessarily mean that they were previously missed. Rather, as genes and reactions are often added on a pathway basis, complete functional pathways are typically fully elucidated over time from multiple sources. The new genes mostly add new pathways and systems to the network, but a significant number of them fill gaps and orphan reactions in existing systems (Figure 1E). A complete list of all new and removed genes, reactions, and metabolites can be found in Supplementary Table 5. The ‘core' and ‘wild-type' biomass reactions of iAF1260 have also been updated in iJO1366. These are reactions that drain biomass precursor compounds in experimentally determined ratios to simulate growth (Feist and Palsson, 2010). For the complete core and wild-type biomass reactions see Updating the biomass composition and growth requirements in the Supplementary Information and Supplementary Table 6. The knowledge index (number of abstracts in Medline) of the 1366 genes in the network was computed, and indicates that iJO1366 contains most of the best-characterized genes in E. coli (Knowledge index of iJO1366 genes in the Supplementary Information and Supplementary Table 7). iJO1366 was also compared with an automatically generated E. coli reconstruction from the Model SEED (Henry et al, 2010) (Comparison of iJO1366 to the Model SEED E. coli reconstruction in the Supplementary Information and Supplementary Table 8) and to the protein localization database EchoLocation (Horler et al, 2009) (Comparison of iJO1366 to the EchoLocation database in the Supplementary Information and Supplementary Table 9). A significant number of the gaps in the iAF1260 network were filled during the update to iJO1366, and several blocked pathways were unblocked. Several different types of gaps in metabolic networks are possible. Root no-production gaps are metabolites with consuming reactions but no producing reactions. Root no-consumption gaps are metabolites with producing reactions but no consuming reactions. Downstream gaps are metabolites with producing and consuming reactions but which are unable to be produced at steady state because they are downstream of a root no-production gap. Similarly, upstream gaps are upstream of root no-consumption gaps. The final reconstruction contains 48 root no-production gaps, 63 root no-consumption gaps, 52 downstream gaps, and 69 upstream gaps. In total, 11.5% of the metabolites in iJO1366 are blocked under all conditions due to gaps (Gaps and orphan reactions in the iJO1366 reconstruction in the Supplementary Information and Supplementary Table 10). The orphan reactions in models such as iJO1366 can also help to identify the functions of metabolic genes. In the original iJR904 study (Reed et al, 2003), gene homology was used to predict the likely E. coli genes that encode the enzymes for 56 orphan reactions. Since then, 14 of these predictions have been independently confirmed to be correct (Supplementary Table 11), and these genes are now included in iJO1366. Prediction of metabolic phenotypes Flux balance analysis (FBA) (Orth et al, 2010b) can be used with a constraint-based model to predict metabolic flux distributions, growth rates, substrate uptake rates, and product secretion rates. The iAF1260 model and its predecessors were already very accurate at making phenotypic predictions such as growth rates and central metabolic flux distributions, so improved predictive capabilities in these areas were not expected with iJO1366. Instead, the value of the updated model is in its ability to predict phenotypes under a wider range of conditions than its predecessors. To demonstrate the utility of the iJO1366 model in making these phenotypic predictions, we generated two large-scale sets of model phenotype predictions. First, the growth phenotypes of E. coli on all possible carbon, nitrogen, phosphorus, and sulfur sources were predicted. The numbers of growth-supporting substrates are summarized in Table II, and the full results of this screen given in Supplementary Table 12 and discussed in Prediction of all growth-supporting carbon, nitrogen, phosphorus, and sulfur sources in the Supplementary Information. We also performed a screen of model predicted growth phenotypes for all possible single gene knockout strains. Growth phenotypes were predicted on both glucose and glycerol minimal media, and the results were compared with experimental data sets (Table III; Supplementary Table 13). Not unexpectedly, iJO1366 is slightly less accurate at predicting overall gene essentiality than iAF1260. This difference is due to the fact that the 107 new genes added to this model version are from less well-studied systems and pathways than the existing genes in iAF1260, as discussed more thoroughly in Prediction of gene essentiality in the Supplementary Information. Mapping iJO1366 to closely related strains Although iJO1366 is a model of E. coli K-12 MG1655, gene homology mapping can be used to create models of other E. coli and Shigella strains. To date, the metabolic reconstruction of E. coli W (Archer et al, 2011) is the only published reconstruction for E. coli strains other than K-12. The iJO1366 reconstruction should prove useful for the study of other recently sequenced E. coli strains. A previous analysis of multiple E. coli genomes showed that there is a moderate level of variability with respect to metabolic gene content within the species (Vieira et al, 2011). This analysis went one step beyond genetic conservation, also investigating the conservation of network topology, but it stopped short of computing the capacity to carry flux through specific growth-supporting pathways. While it is known that equivalent function is not guaranteed by gene homology, it is still one of the most commonly used and effective methods of genome annotation (Frazer et al, 2003). In addition, the combination of sequence homology with constraint-based analysis of metabolic networks can be used to determine the most likely metabolic gene content of an organism by generating models that match known biology. Using this approach, we predicted with FBA whether metabolic models based on iJO1366 for 38 E. coli and Shigella strains are capable of producing biomass on glucose minimal media at various conservation thresholds (Figure 2A). At each percent identity (PID) cutoff, the set of genes that are absent from iJO1366 was determined using Smith–Waterman alignments. The set of missing genes was then used to impose constraints on metabolic reactions utilizing the iJO1366 GPRs. As expected, the results show that all strains were capable of producing biomass with no conservation requirement, but many strains lost this ability as the requirement was made more stringent. At a PID of 40%, only four strains were incapable of producing biomass. This PID was used for further analysis, and was justified through network-based analysis of auxotrophies (Figure 2B and Mapping iJO1366 to closely related strains in the Supplementary Information). Hundreds of unannotated genes in various E. coli and Shigella strains were identified through comparisons to iJO1366 genes, and are listed in Supplementary Table 14. By analyzing the genetic content of 38 E. coli and Shigella genomes, 1006 metabolic genes in iJO1366 were found to be common to all genomes. The average genome in this analysis contains ∼97% of the genes in iJO1366 (Figure 2C). The mapping procedure described here led to the creation of 38 strain-specific models. It is important to note that these models are not yet on the same level as typical ‘draft' metabolic models, as new metabolic functions (functions beyond those occurring in E. coli K-12 MG1655) have not yet been considered. Because only the 1366 metabolic genes of iJO1366 were considered for this mapping, the common set of genes (74%) appears larger than in the previous study by Vieira et al, which considered a set of 1545 reactions. This difference is also partly due to the fact that Vieira et al added orphan reactions to their E. coli networks and compared metabolic content on a reaction basis, rather than on a gene basis. Conclusions Although the metabolism of E. coli has been the subject of active research for decades, the new content added to the iJO1366 reconstruction is a result of the new discoveries that continue to be made. In fact, most of the newly characterized genes in iJO1366 were actually characterized in 2010 (Figure 1D). Although this is a small sample size, it appears that the pace of new discoveries is not slowing as more of the metabolic network is uncovered. There are still numerous uncharacterized E. coli K-12 MG1655 genes, many of which are predicted to have metabolic functions (Riley et al, 2006). As more discoveries are made in the future, additional updates to the E. coli network reconstruction will need to be made. Future increases in the scope of the E. coli metabolic network reconstruction will likely include the integration of this network with reconstructions of other cellular systems such as transcription and translation (Thiele et al, 2009) or transcriptional regulation (Covert et al, 2004). iJO1366 is the most advanced and comprehensive metabolic reconstruction of any microorganism to date, and can thus continue to serve as a basis for the metabolic reconstruction of other bacteria. Based on the success of its predecessors, we expect that iJO1366 will be an important tool in microbial systems biology for years to come. Materials and methods Experimental phenotypic screens The iAF1260 E. coli K-12 MG1655 metabolic reconstruction was used to make computational predictions. The parent strain of the Keio Collection, BW25113, is derived from K-12 MG1655 and is missing several genes that are present in K-12 MG1655: araBAD, rhaBAD, and lacZ. Therefore, flux through the associated reactions without isozymes (ARAI, RBK_L1, RMPA, LYXI, RMI, RMK, and LACZ) was constrained by setting the upper and lower flux bounds of these reactions to zero. For aerobic growth, oxygen uptake was allowed by setting the lower bound of the oxygen exchange reaction to −18.5 mmol gDW−1 h−1. Anaerobic growth was modeled by setting the lower bound of this reaction to zero. For growth on glucose, the lower bound of the glucose exchange reaction was set to −8 mmol gDW−1 h−1. For growth on L-lactate and succinate, the lower bounds were set to −16 mmol gDW−1 h−1. These are arbitrary bounds based on typical substrate uptake rates. After setting the bounds for each condition, the predicted effect of the deletion of each gene in iAF1260 for each condition was computed using the singleGeneDeletion COBRA Toolbox function (Becker et al, 2007), which uses GPRs to constrain the appropriate reactions to carry zero flux and then predicts maximum growth using FBA. A gene was considered essential for the simulated condition if deletion of the gene reduced optimal growth rate to <5% of the wild-type strain as computed by FBA. When a strict threshold of zero growth was used to determine essentiality instead, only one additional essential gene was predicted: lldD (b3605) on lactate minimal medium. Knockout strains were taken from the Keio Collection (Baba et al, 2006; Yamamoto et al, 2009) (supplied by Open Biosystems), a genome-scale collection of E. coli K-12 gene knockouts. Stocks of knockout mutants were streaked onto LB agar with kanamycin (50 μg ml–1) from odd-numbered Keio Collection plates only. Two single colonies of the Keio knockout strain were inoculated into 96-well plates containing 200 μl of LB media with kanamycin (50 μg ml–1) and incubated overnight at 37 °C without shaking. Plates were then centrifuged and pelleted cells were washed twice with 200 μl of 1 × M9 salts per well. Disposable replicator pins were used to transfer cells from the preculture plate to four new plates, two containing glucose M9 minimal medium, one containing lactate M9 minimal medium, and one containing succinate M9 minimal medium. There was 200 μl of minimal medium per well, and the minimal media also contained 50 μg ml–1 kanamycin. The 96-well plates were covered with Aeraseal breathable film (Sigma) to minimize cross-well contamination. For aerobic conditions, the plates were incubated at 37 °C without shaking in a sterile cabinet. For anaerobic conditions, the plates were incubated at 37 °C in an anaerobic chamber ([O2] <50 p.p.m.). After 48 h, absorbance at 600 nm of each well was determined using a VERSAmax microplate reader (Molecular Devices, Sunnyvale, CA). A well was considered to have no growth or slow growth if its absorbance was less than a cutoff value specific to each of the four conditions. The cutoff value was determined by visual inspection of a histogram of absorbance values for all wells measured from a given condition. ‘Normal' growers are supposed to result in measurements lying in the roughly Gaussian-shaped distribution that makes up most of the data, while slow- and non-growers are supposed to result in measurements falling outside the upper 0.95 area of the Gaussian distribution. The cutoffs were OD600=0.26 for glucose aerobic plates, OD600=0.21 for glucose anaerobic plates, OD600=0.21 for lactate aerobic plates, and OD600=0.20 for succinate aerobic plates. If both colonies of a gene knockout were determined to have normal growth, then the gene was considered a true positive (TP) if the model had predicted growth for the knockout, or a tentative false negative (TFN) if the model indicated the knockout should not have grown or previous experiments in glucose MOPS minimal medium had indicated the gene was essential (Baba et al, 2006). Similarly, if both colonies of a gene knockout were determined to have slow/no growth, then the gene was considered a true negative (TN) if the model predicted the same outcome, or a tentative false positive (TFP) otherwise. A gene was considered inconclusive (INC) if for any condition only one colony showed slow/no growth. A second round of screening was performed for TFN, TFP, and INC gene knockouts. TFP and INC gene knockouts were rescreened with two colonies each; TFN gene knockouts were rescreened with four colonies and their pellets were washed four times before transfer to minimal media instead of twice to ensure that growth was not due to contamination from trace amounts of rich media from the precultures. For TFP and INC gene knockouts, a gene was considered essential for a condition if at least one colony showed slow/no growth in the condition in both the first and second round screens. If an essential gene was predicted by the model to be non-essential in the experimental condition, then the gene was concluded to be a genuine false positive (FP) for that condition. For TFN gene knockouts, a gene was considered a genuine false negative (FN) if two or more colonies demonstrated normal growth in the secondary screen. Metabolic network reconstruction procedure The iJO1366 reconstruction was assembled by updating the iAF1260b E. coli metabolic reconstruction (Feist et al, 2010), an updated version of the iAF1260 reconstruction (Feist et al, 2007). iAF1260b contains six additional reactions that were not in iAF1260 (ALAt2rpp, ASPt2rpp, CITt3pp, DHORDfum, GLYt2rpp, and MALt3pp). A 96-step procedure for metabolic network reconstruction was recently published (Thiele and Palsson, 2010), and the appropriate steps were followed when adding new genes, reactions, and metabolites to form iJO1366. The reconstruction was assembled using the SimPheny (Genomatica Inc., San Diego, CA) software platform. All new metabolites were checked against public databases (KEGG, PubChem) for correct structure and charge at a pH of 7.2. New reactions were mass and charge balanced and reversibility was assigned based on experimental studies, thermodynamic information, or the heuristic rules in the standard reconstruction protocol (Thiele and Palsson, 2010). Reactions were associated with genes and functional proteins to form GPRs. The iJO1366 model was exported from SimPheny as an SBML file and the COBRA Toolbox (Becker et al, 2007), a Matlab (The MathWorks Inc., Natick, MA) Toolbox, was used for additional model testing. The Tomlab (Tomlab Optimization Inc., Seattle, WA) CPLEX linear programming solver was used for all optimization procedures. The GapFind MILP algorithm (Satish Kumar et al, 2007) was encoded in the COBRA Toolbox and used to identify all blocked metabolites in the iAF1260 and iJO1366 models. This algorithm was modified from the published version. Specifically, an option was included to change the mass balance constraint to , allowing for metabolites without consuming reactions to be identified as gaps. For each GapFind run, the lower bounds of all exchange reactions were set to −1000 mmol gDW−1 h−1 and the upper bounds of all model reactions were set to 109 mmol gDW−1 h−1. The GapFind algorithm was then run twice, once with each mass balance constraint option. Root no-production and no-consumption metabolites were identified from the model stoichiometric matrix (S) by searching for rows containing only negative or positive coefficients, respectively. Downstream no-production and upstream no-consumption gaps were identified by removing the root gaps from the GapFind outputs. The root gaps of each downstream gap were identified through targeted computational experiments in which metabolite source reactions were added to the network to restore connectivity. Orphan reactions were identified as all reactions without associated GPRs. The core and wild-type biomass reactions were modified from the iAF1260 biomass reactions. Biotin was added with a coefficient based on a published biotin concentration of 250 molecules per cell (Delli-Bovi et al, 2010), while the related cofactor lipoate was added with a similar number of molecules per cell assumed. Iron–sulfur clusters were added with coefficients based on the predictions that 5% of all E. coli proteins contain these clusters (Fontecave, 2006), and that the majority (90%) of these clusters are of the [4Fe–4S] type. The coefficient of Fe2+ was decreased to account for Fe used in iron–sulfur clusters. Molybdenum cofactors were added with coefficients based on the measurement that the inorganic ion content of E. coli is 0.80% Mo (Cvetkovic et al, 2010), and the fact that the majority of this Mo is in the bis-molybdopterin guanine dinucleotide form (85%). The coefficients of Cu, Mn, Zn, Ni, and Co were also adjusted based on recent in vivo measurements (Cvetkovic et al, 2010). All other biomass components remain the same as in iAF1260. Growth-associated and non-growth-associated maintenances values were recalculated based on recent E. coli K-12 MG1655 chemostat data for growth on glucose minimal media (Taymaz-Nikerel et al, 2010). The slope and intercept of this experimental data were identified by linear regression. For these calculations, the electron transport system NADH dehydrogenase reactions NADH16pp (nuo) and NADH5 (ndh) were constrained to carry identical fluxes by replacing these reactions with an equivalent ‘flux split' reaction, in order to constrain the model to a realistic P/O ratio of 1.375 (Noguchi et al, 2004). The NGAM of 3.15 mmol ATP gDW−1 h−1 was identified by FBA as the maximum amount of ATP produced at a glucose uptake rate of 0.17 mmol gDW−1 h−1, the intercept of the experimental data. The GAM of 53.95 mmol ATP gDW−1 was identified by FBA as the value that would give the correct experimentally determined slope of 10.83 mmol glucose gDW−1 h−1/(μ) h−1 when using the core biomass reaction. The iJO1366 model is available in SBML format at BioModels (accession: MODEL1108160000) and as Supplementary Model 1. Comparison of iJO1366 to the Model SEED E. coli reconstruction The E. coli K-12 MG1655 model Seed83333.1 V20.21 was downloaded from the Model SEED database (Henry et al, 2010) in SBML format. The set of 1139 genes in this model was compared by gene ID (b-number) to the 1366 genes in iJO1366 to identify common genes and the unique genes in each model. The genes in the Model SEED model that were not in iJO1366 were then investigated one at a time. EcoCyc was used to identify gene functions, and several genes with verified metabolic functions were then added to iJO1366. Comparison of iJO1366 to the EchoLocation database Predicted and experimentally determined protein location data were obtained for all E. coli K-12 genes from the EchoLocation database (Horler et al, 2009). The cellular locations of the proteins associated with the 1366 model genes from this database were then compared, one at a time, to the compartments of the metabolites in the reactions associated with each gene in iJO1366. For each location in the EchoLocation database, a Boolean rule was written to determine if the location is consistent with the associated model metabolites. For example, ‘Cytoplasmic' proteins are consistent with genes only associated with cytoplasmic metabolites, and are inconsistent with genes associated with any periplasmic or extracellular metabolites. See Supplementary Table 9 for the full list of Boolean rules. The genes whose locations were inconsistent with model metabolites were then investigated individually, except for ‘Cytoplasmic' and ‘Periplasmic' genes with both cytoplasmic and periplasmic metabolites in the model, because there were too many of these inconsistencies to investigate manually. Literature and database information was used to determine whether the EchoLocation database, the iJO1366 reconstruction, or both were correct. Constraint-based modeling The iJO1366 model, constructed in SimPheny, was exported as an SBML file and used to perform simulations and constraint-based analyses using the COBRA Toolbox and Tomlab CPLEX linear programming solver. The constraint-based model consists of an S matrix with 1805 rows and 2583 columns, where 1805 is the number of distinct metabolites (in all three compartments) and 2583 is the number of reactions including exchange and biomass reactions. Each of the reactions has an upper and lower bound on the flux it can carry. Reversible reactions have an upper bound of 1000 mmol gDW−1 h−1 and a lower bound of −1000 mmol gDW−1 h−1, making them practically unconstrained, while irreversible reactions have a lower bound of zero. By default, the core biomass reaction is set as the objective to be maximized. Certain reactions are by default constrained to carry zero flux to avoid unrealistic behaviors. These reactions are CAT, DHPTDNR, DHPTDNRN, FHL (formate hydrogen lyase), SPODM, SPODMpp, SUCASPtpp, SUCFUMtpp, SUCMALtpp, and SUCTARTtpp. CAT, SPODM, and SPODMpp are hydrogen peroxide producing and consuming reactions that can carry flux in unrealistic energy generating loops. DHPTDNR and DHPTDNRN form a closed loop that can carry an arbitrarily high flux. The succinate antiporters SUCASPtpp, SUCFUMtpp, SUCMALtpp, and SUCTARTtpp can form unrealistic flux loops with other transporters for aspartate, fumarate, malate, and tartrate. The genes encoding FHL are known to be active under anaerobic conditions, but this reaction is constrained to zero to avoid unrealistic aerobic hydrogen production. The NGAM constraint is imposed by a lower bound of 3.15 mmol gDW−1 h−1 on the reaction ATPM. The exchange reactions that allow for extracellular metabolites to pass in and out of the system are defined such that a positive flux indicates flow out. All exchange reactions have a lower bound of zero except for glucose (−10 mmol gDW−1 h−1), the vitamin B12 precursor cob(I)alamin (−0.01 mmol gDW−1 h−1), and oxygen and all inorganic ions required by the biomass reaction (−1000 mmol gDW−1 h−1). The default lower bound on glucose uptake is based on typical glucose uptake rates. Because only a very small amount of B12 is required for growth, the lower bound on cob(I)alamin uptake is arbitrary and never actually constraining in practice. The iJO1366 computational model also includes drain reactions for six cytoplasmic metabolites without known consuming reactions that must be drained from the system to allow simulation of steady-state cell growth. These metabolites are p-cresol, 5′-deoxyribose, aminoacetaldehyde, S-adenosyl-4-methylthio-2-oxobutanoate, (2R,4S)-2-methyl-2,3,3,4-tetrahydroxytetrahydrofuran, and oxamate. Prediction of all growth-supporting carbon, nitrogen, phosphorus, and sulfur sources The possible growth-supporting carbon, nitrogen, phosphorus, and sulfur sources of E. coli were identified using FBA. First, all exchange reactions for extracellular metabolites containing the four elements were identified from the metabolite formulas. Every extracellular compound containing carbon was considered a potential carbon source, for example. Next, to determine possible growth-supporting carbon sources, the lower bound of the glucose exchange reaction was constrained to zero. Then the lower bound of each carbon exchange reaction was set, one at a time, to −10 mmol gDW−1 h−1 (a typical uptake rate for growth-supporting substrates), and growth was maximized by FBA using the core biomass reaction. The target substrate was considered growth supporting if the predicted growth rate was above zero. While identifying carbon sources, the default nitrogen, phosphorus, and sulfur sources were ammonium (nh4), inorganic phosphate (pi), and inorganic sulfate (so4), respectively. Prediction of growth-supporting sources of these other three elements was performed in the same manner as growth on carbon, with glucose as the default carbon source. Prediction of gene essentiality To simulate the effects of gene knockouts, the iJO1366 model with its default constraints and core biomass reaction objective was modified to match the genotype of E. coli BW25113 (see Experimental phenotypic screens). For growth on glucose, the lower bound of the glucose exchange reaction was set to −10 mmol gDW−1 h−1. For growth on glycerol, the lower bound of the glucose exchange reaction was set to zero while the lower bound of the glycerol exchange reaction was set to −10 mmol gDW−1 h−1. All 1366 genes in the model were knocked out one a time and growth was simulated by FBA using the singleGeneDeletion COBRA Toolbox function. Gene knockout strains with a growth rate above zero were considered non-essential. Experimental gene essentiality data for growth on glucose (Baba et al, 2006) and glycerol (Joyce et al, 2006) was then obtained and adjusted based on an updated analysis of the Keio Collection strains (Yamamoto et al, 2009). The newly identified essential genes were added to the lists of essential genes under both conditions, while the genes whose essentiality was identified as uncertain were not changed from their original designations. Mapping iJO1366 to closely related strains The protein sequences of all available E. coli and Shigella strains (38 strains total) were downloaded from KEGG (http://www.genome.jp/kegg/download), and SSEARCH35 of the FASTA suite (Pearson and Lipman, 1988), an open-source implementation of the Smith–Waterman algorithm, was used to determine a PID conservation for each iJO1366 gene. The flags used in SSEARCH35 were ‘–m9 –E 1 –q –H'. Next, the deleteModelGenes COBRA Toolbox function was used to delete all genes in a strain that failed to be conserved at a particular protein sequence identity threshold. The entire cutoff range was scanned (from 0 to 100% identity in increments of 1%) and FBA was performed with an objective function of maximum growth rate using the core biomass reaction, to produce Figure 2A. Subsequently, the situation occurring at 40% identity was investigated to determine why four of the strains were unable to produce biomass at this conservation threshold. A demand reaction for each biomass component was added (excluding the components involved in the ATP hydrolysis reaction) and flux was separately maximized through each of these reactions to determine the biomass components that the strain was unable to produce. To determine the subset of genes responsible for a particular loss of function, the effects of single knockouts were computed and literature was consulted. The preliminary network-level analysis of conservation revealed that many ORFs are not properly automatically annotated in non-model organism genomes. For this reason, this analysis was supplemented with a nucleotide-level search for conservation of each model gene in each of the 38 strains. The analysis described above was repeated, this time with the additional conservation qualification of a genomic hit displaying at least 70% identity and aligned length to the model gene. These cutoffs were determined by manual inspection of a scatter plot of all nucleotide-level results. There was a clear region of conservation in the upper right quadrant defined by these cutoffs. Genes found to be conserved only at the nucleotide level are considered to have 100% protein sequence identity (see Supplementary Table 14 for a complete list of such cases). Supplementary Material Supplementary Information Supplementary Text and Figures Supplementary Tables Supplementary Tables 1–14 Supplementary Model 1 iJO1366 model in SBML format Review Process File
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR)

              Background Escherichia coli is perhaps the best characterized and studied bacterium and is of interest industrially, genetically and pathologically. For these reasons, in silico modeling efforts have been made to describe and predict its cellular behavior. With the vast amounts of '-omics' data that are being generated, there is a growing need for incorporating and reconciling heterogeneous datasets, including genomic, transcriptomic, proteomic and metabolomic data [1]. A constraint-based model of E. coli metabolism can accomplish this and serve as a model centric database. In addition to providing the context for various '-omics' data types, constraint-based models provide a framework to compute cellular functions [2]. This modeling method finds the limits of cellular, biochemical and systemic functions, thereby identifying all allowable solutions. Searches within the allowable solution space can identify solutions of interest, for example a solution that maximizes a particular objective. This approach to genome-scale model building has been reviewed in detail [3-6]. In general, the application of successive constraints (stoichiometric, thermodynamic and enzyme capacity constraints), with respect to the metabolic network, restricts the number of possible solutions. Linear optimization is often used to find a particular solution in the allowable solution space that maximizes a chosen objective function, such as cellular growth (Figure 1). A more detailed description of the constraint-based modeling approach can be found in Materials and methods. The constraint-based modeling approach has been used to study E. coli metabolism for over ten years; the history of such model building efforts has recently been reviewed [7]. The first genome-scale metabolic (GSM) model accounting for 660 gene products (iJE660 GSM) was reconstructed using genomic information, biochemical data and physiological data [8]. This genome-scale model has been used to perform in silico gene deletion studies [8] and to predict both optimal growth behavior [9] and the outcome of adaptive evolution [10]. This paper reports an expansion of iJE660a GSM, which itself is a slight modification of the original genome-scale metabolic model (iJE660 GSM) [8]. Gene to protein to reaction (GPR) associations are included directly in the new model (iJR904 GSM/GPR). These associations describe the dependence of reactions on proteins and proteins on genes (Figure 2). The metabolic network described by iJR904 has also changed; individual reactions are now elementally and charge balanced, and a significant number of new genes and novel reactions have been added to the model. iJR904 GSM/GPR accounts for over 904 genes and the 931 unique biochemical reactions the encoded proteins carry out. This paper discusses the effects that these additional reactions have on the predictive capabilities of the model and identifies putative ORFs in the genome which could resolve gaps in the metabolic network. Since computational models of E. coli will continue to grow in size and scope [7] it will become important to be able to distinguish between the different models - a naming convention will aid in this effort. The naming convention we chose to use mirrors the one already established for plasmids. The general form of the names of in silico strains used is iXXxxxa YYY. The 'i' in the name refers to an in silico model (that is, a computer model). This 'i' is followed by the initials (XX) of the person who developed the model and then the number of genes (xxx) included in the model. Any letters (a) after the number of genes indicates that slight modifications were made to the model, for instance iJE660a is derived from iJE660. Further designation of the content and scope of a model are found in YYY; here the acronyms GSM and GPR stand for genome-scale model and gene-protein-reaction associations, respectively. The contents of iJE660a and iJR904 can be found on our website [11], and iJR904 is also detailed in the additional data files. Results and discussion Properties of the iJR904 metabolic network An update on the annotation of the E. coli K-12 genome was published in 2001 [12], facilitating the process of updating the genome-scale in silico E. coli model (iJE660a GSM). Genes encoding known or putative enzymes and transporters not included in the iJE660a model were further examined. Literature and database searches (LIGAND [13], EcoCyc [14] and TC-DB [15]) on each of the genes provided the biochemical information needed to expand E. coli iJE660a. The iJR904 model was built using the software SimPheny™ (Genomatica, San Diego, CA) and accounts for 904 genes with a known locus in the genome, as compared to 660 genes in the previous model. The metabolic network described by E. coli iJE660a has expanded in size from 627 unique reactions and 438 metabolites to 931 unique reactions and 625 metabolites in iJR904. Complete maps containing all the reactions in the metabolic network are available in the additional data files and can also be downloaded from [11]. The molecular formulae and charges for the metabolites in the model were determined assuming a pH of 7.2. Fifty-eight of the reactions in iJR904 currently do not have associated genes. A complete list of the reactions can be found in the additional data files. Putative functional gene assignments account for 23 of the added reactions, with the majority of these being putative transporters. In addition to these new reactions, old reactions were updated to be both elementally and charge balanced by including water and protons as participants in the reactions. Six reactions in iJR904 are elementally balanced but not charge balanced (Table 1). Five out of the six reactions are imbalanced because they have not been fully characterized biochemically so assumptions had to be made about the participating metabolites; the remaining reaction (abbreviated ADOCBLS) is charge imbalanced since we were unable to determine the charge associated with the ion complex. The biomass reaction [8], representing the drain of biosynthetic constituents from the network and the growth-associated ATP requirement, was also changed to include internal protons and water. The amount of water needed in the biomass reaction is equal to the amount of ATP hydrolyzed to meet the growth-associated ATP requirement. The hydrolysis of ATP results in the production of a proton while the utilization of NADPH and NADH consumes a proton; this results in the net production of protons in the biomass reaction. Other updates to the iJE660a reaction network are also notable. A number of reactions in iJE660a could not previously be assigned to an ORF, but are now assigned to an ORF as a result of the updated genome annotation [12], such as mtn and uppS. Other ORF names have changed and these were also updated. Some of the original model reactions were modified in addition to including internal protons and water. These modifications mainly included changes in the stoichiometric coefficients, cofactor usage and reaction reversibility. In some cases, the metabolites that participate in the reaction were changed. Forty-two reactions were removed from iJE660a, and these are listed in the additional data files along with the reasons for their removal. iJR904 also accounts for the specificity of the quinones in the individual reactions involved in the electron transport chain (Figure 3). E. coli K-12 uses three quinones: ubiquinone (Q), menaquinone (MK) and demethylmenaquinone (DMK) to transfer electrons from the electron donor to the terminal acceptor; in iJE660a there was no distinction between the three quinones and they were all treated as ubiquinone, which led to inaccurate electron donor/electron acceptor pairs. GPR associations are for the first time directly included in the iJR904 model, and examples of some are shown in Figure 2. GPR associations have been constructed and their images can be found in the additional data files. These GPR associations can be used to evaluate the reactions remaining in the metabolic network after deletion of a specific gene. Including these associations directly into iJR904 will lead to a more accurate assessment of the effects of gene deletions. In addition, these GPR associations are necessary for analyzing diverse datasets by the model and using these datasets to further identify physiological states. Thus, iJR904 accounts more accurately for a number of the metabolic processes in E. coli K-12 MG1655, and expands in scope significantly through the addition of GPR associations. E. coli iJR904 GSM/GPR is no longer a purely metabolic model. Systemic properties A list of network components alone does not explain how the components work together to produce a biological function. These systemic properties can only be investigated when considering all the components simultaneously; it is the interaction of these components that provides the most information about cellular behavior. As with iJE660a, a myriad of other issues can be addressed with iJR904. We will address three types of issues in this paper using iJR904: gap analysis and putative ORF assignments, the importance of global proton balancing, and phase plane analysis [16]. Identification and resolution of dead ends A 'dead end' exists in a metabolic network if a metabolite is either only produced or only consumed in the network. If a metabolic network contains a gap, it is missing the biochemical reactions that can produce or consume the dead end metabolites. iJR904 has 70 dead end metabolites or gaps in the network; these are listed in the additional data files. These 70 metabolites participate in 89 reactions, indicating that at least 89 model reactions can never be used if the network is to operate at steady state. The reactions that lead up to the dead ends in iJR904 are included so that when the gaps are filled in at a future date, the network will be fully functional. Some of these network gaps could be reconciled by the addition of transporters, while others could be reconciled by modifying the growth function, that is, including dead end metabolites as requirements for biomass production. Neither of these steps was taken here since genomic or biochemical evidence could not be found to support their inclusion. Using these gaps, we attempted to identify new functional assignments based on sequence homology searches. A list of EC numbers corresponding to enzymes (not included in the model) that could resolve the gaps, was generated. This list was pooled together with the enzymes known to occur in E. coli which lack assigned loci (EcoCyc [14]). We subsequently collected amino acid sequences from other organisms (orthologous sequences) assigned to these enzymatic functions for a homology search study. A total of 83 training sets, each with an average of 11 orthologous sequences, was collected and compared against the E. coli genome. Each training set includes multiple orthologous sequences that correspond to an enzyme of interest. Of the 83 training sets, 61 are for enzymes which connect to gaps in the existing network and the remaining 22 are for some of the enzymes listed in EcoCyc. Each training set was processed using the alignment programs MEME [17] and ClustalW [18] to generate a profile for the corresponding enzyme. Using these profiles MAST [19] and HMMER [20] were used to identify similar ORFs in the E. coli genome. Of the 61 enzymes that could resolve network gaps, we assigned putative loci for 12 of them. In addition, putative loci could be found for 15 of the 22 enzymes listed in EcoCyc. Some of these enzymes have multiple matches within the E. coli genome, which together resulted in 55 putative assignments. These results were inspected manually and found to be relevant and consistent across the three search methods (MAST, with and without end-gap penalty, and HMMER). The results of this study largely coincided with the annotation performed by Serres et al. [12]; however, most annotation updates added more specificity to the type of reactions the enzymes were predicted to catalyze and, in some cases, suggested additional substrates that known enzymes might act upon (Table 2). Table 2 lists the top three matches for each enzyme (except for the case when the match is a known isozyme); a complete list of all results and expected values (e-values) can be found in the additional data files. The putative assignments presented in Table 2 should be used with care. For example, there are multiple putative assignments for the acyl-CoA dehydrogenase enzyme for which the gene has recently been found [21], but the actual gene locus for this enzyme (b0221) does not have the most significant e-value among the list of potential loci. Effects of constraining proton exchange flux The importance of balancing protons internally can now be investigated with iJR904 since the metabolic network accounts for all the protons being generated and consumed by the individual metabolic and transport reactions (only external protons associated with the proton motive force were accounted for in iJE660a). The medium can serve as a pool both supplying and dissipating external protons as needed by the cell. During growth on some carbon sources, the generation of internal protons by the metabolic reactions is relieved by secreting protons into the medium. This subsequently reduces the amount of ATP made by ATPase since these protons could be used to drive this reaction. Under other conditions, a shortage of internal protons is compensated for by taking up protons from the medium and transporting them across the cell membrane into the cytosol. The effects that the exchange of protons across the system boundary have on predicted growth rates were investigated. A robustness analysis [22], in which the flux through the proton exchange reaction was constrained from its optimal value down to zero, was performed under aerobic conditions for a variety of carbon sources (Figure 4). The predicted growth rates for different carbon sources respond differently as the exchange flux of protons (between the cell and the medium) is reduced to zero; glucose and glycerol were the most sensitive to proton exchange while D- and L-lactate were the least sensitive. When either glucose or glycerol was used as the carbon source, excess protons were generated intracellularly; this excess was relieved by secreting extracellular protons, thereby lowering the pH of the medium. For pyruvate, D- and L-lactate, acetate, α-ketoglutarate (αKG), succinate and malate there is a shortage of internal protons; as a result, cells would uptake protons from the medium thereby raising the pH. Since there can be no net accumulation of charge within the system, the total charge entering the system must equal the charge leaving the system. Pyruvate, lactate, acetate, α-ketoglutarate, succinate and malate, as used in these simulations, have a negative charge so H+ must be taken up; however, if the uncharged acidic form of these compounds was used instead (and the acid-base reaction was included in the network - generating proton(s) and the basic form of the compound) it would be predicted that H+ would be secreted by the cell thereby lowering the pH. It is generally thought that the pH of the medium becomes more acidic as E. coli grows. A lowering of pH has been observed for growth on unbuffered medium with glycerol; however, during growth on unbuffered medium with disodium succinate or sodium acetate the medium becomes more basic (J.L.R. and B.O.P., unpublished data). Clearly iJR904 with charge-balanced reactions highlights the challenge that cells have with globally balancing protons. As a part of the iterative model building process [2]iJR904 can be used to design informative experiments to systematically address this issue. Phenotypic phase plane (PhPP) comparisons The phenotypic phase planes [16] for growth on different carbon sources were calculated using both models iJR904 and iJE660a. A description of phenotypic phase planes can be found in Materials and methods. For these simulations the carbon uptake rate and oxygen uptake rate were varied. The carbon substrates tested included: glucose, pyruvate, acetate, glycerol, D-lactate, αKG, succinate and malate. The phase planes for pyruvate, D-lactate and succinate calculated using iJR904 were nearly identical to those calculated with model iJE660a (the lines demarcating the different phases only moved slightly). These results show that the modified and new reactions contained in iJR904 did not significantly affect the phase planes for these substrates, indicating that iJR904 makes similar predictions regarding optimal growth on these substrates. The line of optimality (LO) on a phenotypic phase plane corresponds to the conditions (oxygen and substrate uptake rates) which can maximize the biomass yield. The largest shifts in the line of optimality were observed for growth on pyruvate and αKG. However, these shifts are relatively small indicating that the optimal oxygen uptake and carbon source uptake rates needed to generate the maximal amount of biomass do not change significantly (less than 10%). The phenotypic phase planes for carbon sources other than pyruvate, D-lactate, and succinate have more significant changes when calculated with iJR904 as compared to iJE660a. These include growth on the following carbon sources: glycerol, glucose, acetate, malate and αKG. The resulting phase planes calculated using iJR904 and iJE660a are shown in red and blue, respectively, in Figure 5a-e. For the malate and glucose phase planes (Figure 5a,b) one of the lines only appears on the phase plane calculated using iJE660a; these changes are attributed to the effects of global proton balancing described in the previous section. This issue also accounts for changes in the acetate phase plane (Figure 5c) and is a contributing factor to the changes observed with the glycerol and αKG phase planes. However, other changes (discussed below) to the metabolic networks have more significant effects on these two phase planes. Glycerol phase plane The changes in the glycerol phenotypic phase plane (Figure 5d) were less drastic than those for the αKG phase plane (Figure 5e, see comments below). The only major change was that the lines in the microaerobic region shifted downward. This change is a result of the removal of two reactions involved in pyrridoxal recycling which were previously assigned to pdxH. The two reactions are not included in the metabolic network defined by iJR904 for reasons explained in the additional data files. α-Ketoglutarate phase plane The two-dimensional phenotypic phase plane for αKG is noticeably different when calculated for iJR904 and iJE660a (Figure 5e). One notable feature is that iJR904 predicts completely anaerobic growth on αKG, while iJE660a predicts that oxygen is required for growth on αKG. Under oxygen limitations, corresponding to regions below the line of optimality, the expanded model is also more efficient at producing biomass from αKG than the previous model. As oxygen becomes more limiting, iJR904 becomes increasingly more efficient at generating biomass than iJE660a. Examination of the calculated optimal flux distributions provides insights into why iJR904 is more efficient. One of the reasons for this increased growth efficiency is that iJR904 includes the citrate lyase enzyme, which converts citrate (CIT) to acetate (AC) and oxaloacetate (OAA). During oxygen-limited growth, iJR904 predicts that some of the αKG is converted to OAA by first reversing some of the TCA cycle reactions to generate CIT and then splitting CIT into AC and OAA by citrate lyase. The rest of the αKG is consumed through the forward reactions of the TCA cycle to produce malate (Figure 6). Removal of the citrate lyase reaction from the network under anaerobic conditions shows two other, less-efficient routes that would still enable iJR904 to predict anaerobic growth. These new metabolic routes, which are dependent on at least one of the new additions, are depicted in Figure 5f. Conclusions This paper reports the curation and expansion of a previous genome-scale constraint-based model of E. coli metabolism (iJE660a GSM) that is now used in multiple laboratories (A.L. Barabasi, personal communication; Church and colleagues [23]; H. Greenberg, personal communication and C. Maranas, personal communication). This expanded model, iJR904 GSM/GPR, includes 37% more metabolic genes and 47% more metabolic reactions. Each reaction in the network is now both elementally and charge balanced with the exception of the six reactions listed in Table 1. While the new reactions added to the network do not change many of the predicted optimal phenotypes, there are instances in which the expanded model makes significantly different predictions, examples of which occur when glycerol, glucose, malate, acetate and αKG are used as the carbon sources under oxygen-limited conditions. The analysis of dead ends or gaps in the network has led to putative annotations of 55 ORFs. If these putative functional assignments are verified biochemically they can be included in future updates of iJR904. Ideally an iterative process will be developed, within which the model can help identify new targets, and, if verified, can lead to an updated model. This iterative process [2] would be likely to produce more useful results in less-characterized organisms and has already been successful in helping to identify malate dehydrogenase in Helicobacter pylori [24] and citrate synthase in Geobacter sulfurreducens (D. Lovley, unpublished results). The incorporation of GPR associations into iJR904 will allow for the analysis of transcriptomic and proteomic data directly; it also enables the incorporation of these datasets to further constrain the solution space leading to more accurate predictions of phenotypic data. E. coli iJR904 can now serve as a model centric database which could analyze and reconcile heterogeneous datasets as well as use these datasets to aid in model predictions. Materials and methods Constraint-based modeling A stoichiometric matrix, S (m × n), is constructed where m is the number of metabolites and n the number of reactions. Each column of S specifies the stoichiometry of the metabolites in a given reaction from the metabolic network. Mass balance equations can be written for each metabolite by taking the dot product of a row in S, corresponding to a particular metabolite, and a vector, v , containing the values of the fluxes through all reactions in the network. A system of mass balance equations for all the metabolites can be represented as follows: where X is a concentration vector of length m, and v is a flux vector of length n. At steady-state, the time derivatives of metabolite concentrations are zero, and equation (1) can be simplified to: S • v = 0 It follows that in order for a flux vector v to satisfy this relationship, the rate of production must equal the rate of consumption for each metabolite. Application of additional constraints further reduces the number of allowable flux distributions, v . Limits on the range of individual flux values can further reduce the number of allowable solutions. These constraints have the form: α ≤ v i≤ β where α and β are the lower and upper limits, respectively. Maximum flux values (β) can be estimated based on enzymatic capacity limitations or, for the case of exchange reactions, measured maximal uptake rates can be used. Thermodynamic constraints, regarding the reversibility or irreversibility of a reaction, can be applied by setting the α for the corresponding flux to zero if the reaction is irreversible. These constraints are not sufficient to shrink the original solution space to a single solution. Instead a number of solutions remain which make up the allowable solution space. Linear optimization can be used to find the solution that maximizes a particular objective function. Some examples of objective functions include the production of ATP, NADH, NADPH or a particular metabolite. An objective function with a combination of the metabolic precursors, energy and redox potential required for the production of biomass has proven useful in predicting in vivo cellular behavior [9,10,25,26]. Simulation conditions Simulations with iJR904 were all done using the software package SimPheny™ (Genomatica, San Diego, CA); this software was also used to build iJR904. All calculations were made using the conditions outlined in this section. The biomass reaction was the same as that reported previously [8] with the addition of intracellular protons and water, and can be found in the additional data files. All flux values reported in this section are in units of mmol/g DW-hr. The flux through the non-growth associated ATP maintenance reaction (ATPM in the additional data files) was fixed to 7.6. Fluxes through all other internal reactions have an upper limit of 1 × 1030; if the reaction is reversible the lower limit is -1 × 1030 and if it is irreversible the lower limit is zero. In addition to the metabolic reactions listed in the additional data files, reversible exchange reactions for all external metabolites were also included in the simulations to allow external metabolites to cross the system boundaries. If these exchange reactions are used in the forward direction the external metabolites leave the system and if used in the reverse direction (that is, a negative flux value through the reaction) the external metabolites enter the system. The following external metabolites were allowed to freely enter and leave the system: ammonia, water, phosphate, sulfate, potassium, sodium, iron (II), carbon dioxide and protons (except during the robustness study where proton exchange flux was constrained down to zero). The corresponding exchange fluxes for these metabolites have a lower and upper flux limit of -1 × 1030 and 1 × 1030, respectively. Aerobic conditions were simulated with a maximum oxygen uptake rate of 20 mmol/g DW-hr, by setting the lower and upper limits for the oxygen exchange flux to -20 and 0 respectively, and anaerobic conditions were simulated by fixing the oxygen uptake rate to 0. All other external metabolites, except for the carbon source, were only allowed to leave the system. The lower and upper limits on their corresponding exchange fluxes were 0 and 1 × 1030, respectively. Growth on different carbon sources was simulated by allowing those external metabolites to enter the system; the actual flux values for uptake rates used in the simulations are noted in the text and figures, where the upper limit is 0 and the lower limit is the negative of the uptake rate listed. These constraints are also summarized in the additional data files. Phenotypic phase planes (PhPP) Phenotypic phase plane analysis was developed to generate a global view of the optimality properties of a network [16]. The phenotypic phase plane is constructed from a large number of individual optimal solutions and gives an overall view of the optimality properties of the network. PhPPs are used to show all possible quantitative flux distributions through a network while varying two or three constrained fluxes. The different regions of the phase plane have qualitatively different flux distributions that translate to different metabolic phenotypes. One important feature of the PhPP is the line of optimality (LO). Points that lie on the LO optimize the objective function for a given substrate uptake rate. When the cell is operating along the LO, calculated when the growth rate is used as the objective function, the cell is growing with a maximal biomass yield. Identification of dead ends Dead end metabolites are classified as such if a metabolite can either be produced but not consumed or consumed but not produced. By examining the connectivity of the metabolites in the S matrix, a list of dead ends can be generated. Once these are identified, the number of reactions directly involved with these metabolites can be determined by enumerating the number of non-zero elements in the row corresponding to each dead end metabolite. Sequence annotations A list of reactions and associated enzymes that could connect the dead ends with the rest of the network was gathered from LIGAND (Database of Chemical Compounds and Reactions in Biological Pathways [13]). These are enzymes known to act on those metabolites in other organisms. The enzymes listed in EcoCyc which are known to be in E. coli but lack assigned loci, were also added to the search list. The enzyme commission numbers for this combined list of enzymes were used in queries against the SwissProt, TremBl and TremBlnew databanks (using Sequence Retrieval System [27]) to retrieve the enzymes' corresponding amino acid sequences. Known orthologous sequences of all of the enzymes of interest were grouped together to construct multiple sequence alignments. Two separate programs, MEME (Multiple Expectation maximization for Motifs Elicitation [17]) and ClustalW [18] were used for this purpose. MEME was run assuming that each sequence may contain a variable number of non-overlapping occurrences of each motif and up to three distinct motifs. Each training set was processed twice with MEME, once with the program's default end-gap penalty and once without it. Default values were used for gap-opening penalty and gap-extension penalty. All sequences in each training set were weighed equally by MEME. ClustalW, however, down-weighed similar sequences in proportion to their degree of relatedness. Pair-wise alignments in ClustalW were run with a dynamic programming algorithm (slow option) and with the Reset Gap option off. The MEME output files were submitted to MAST [19] (Motif Alignment and Search Tool, version 3.0 online) and searched for matching sequences against the E. coli genome. MAST returned a list of high-scoring sequences and their annotations. ClustalW output files were used by HMMER's hmmbuild and hmmcalibrate (version 2.2g [20]) with default parameters to train profile HMMs (Hidden Markov Models). hmmsearch was subsequently applied to find sequences in the E. coli genome that matched each profile. Results from MAST's and HMMER's searches were manually examined, and relevant matches were reported. Additional data files The additional data consists of three files. The Excel file (Additional data file 1) contains the following: a list of the reactions in iJR904; definitions of the metabolite abbreviations; a list of the exchange fluxes used in simulations and their constraints; a list of the dead end metabolites in the metabolic network; a list of the reactions that were not included from iJE660a and a complete list of the sequence comparison results (including e-values). A zip file (Additional data file 2) is also provided, including the JPEG images of all the GPR associations and a detailed document describing how to interpret these images. The GPR image files are each labeled to correspond to either an individual gene or reaction. The third file (Additional data file 3) contains six maps of metabolism which together include all the reactions in iJR904; the reaction and metabolite abbreviations are the same as those listed in the Excel file. Supplementary Material Additional data file 1 A list of the reactions in iJR904; definitions of the metabolite abbreviations; a list of the exchange fluxes used in simulations and their constraints; a list of the dead end metabolites in the metabolic network; a list of the reactions that were not included from iJE660a and a complete list of the sequence comparison results (including e-values) Click here for additional data file Additional data file 2 The JPEG images of all the GPR associations and a detailed document describing how to interpret these images Click here for additional data file Additional data file 3 Six maps of metabolism which together include all the reactions in iJR904 Click here for additional data file
                Bookmark

                Author and article information

                Journal
                Microbiology
                Microbiology (Reading, Engl.)
                micro
                micro
                Microbiology
                Microbiology Society
                1350-0872
                1465-2080
                February 2020
                28 February 2020
                28 February 2020
                : 166
                : 2
                : 93-95
                Affiliations
                [ 1] departmentDepartment of Biology , University of York , YO10 5YW, UK
                Author notes
                *Correspondence: Gavin H. Thomas, gavin.thomas@ 123456york.ac.uk
                Author information
                https://orcid.org/0000-0002-9763-1313
                Article
                000901
                10.1099/mic.0.000901
                7398560
                32122459
                1e4ef450-b4f6-4e18-aeb7-3737ee6a3072
                © 2020 The Authors

                This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.

                History
                : 11 February 2020
                Categories
                Editorial

                Microbiology & Virology
                Microbiology & Virology

                Comments

                Comment on this article