      Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles


          Machine learning approaches offer the potential to systematically identify transcriptional regulatory interactions from a compendium of microarray expression profiles. However, experimental validation of the performance of these methods at the genome scale has remained elusive. Here we assess the global performance of four existing classes of inference algorithms using 445 Escherichia coli Affymetrix arrays and 3,216 known E. coli regulatory interactions from RegulonDB. We also developed and applied the context likelihood of relatedness (CLR) algorithm, a novel extension of the relevance networks class of algorithms. CLR demonstrates an average precision gain of 36% relative to the next-best performing algorithm. At a 60% true positive rate, CLR identifies 1,079 regulatory interactions, of which 338 were in the previously known network and 741 were novel predictions. We tested the predicted interactions for three transcription factors with chromatin immunoprecipitation, confirming 21 novel interactions and verifying our RegulonDB-based performance estimates. CLR also identified a regulatory link providing central metabolic control of iron transport, which we confirmed with real-time quantitative PCR. The compendium of expression data compiled in this study, coupled with RegulonDB, provides a valuable model system for further improvement of network inference algorithms using experimental data.
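The abstract does not spell out how CLR scores an interaction. As a rough sketch, assuming the commonly described formulation (mutual information between every pair of expression profiles, each value converted to a z-score against the background of MI values for both genes, and the two z-scores combined), the idea can be illustrated as follows. The histogram MI estimator, the bin count, and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of mutual information (in nats) between two profiles.

    Assumption: simple equal-width binning stands in for whatever estimator
    the original study used.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def mi_matrix(expr, bins=8):
    """Pairwise MI for a genes x conditions matrix; the diagonal is left at zero."""
    n = expr.shape[0]
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mi[j, i] = mutual_information(expr[i], expr[j], bins)
    return mi

def clr_scores(mi):
    """Score each pair by how unusual its MI is for both genes involved."""
    mean = mi.mean(axis=1, keepdims=True)
    std = mi.std(axis=1, keepdims=True)
    std[std == 0] = 1.0                      # guard against constant rows
    z = np.maximum(0.0, (mi - mean) / std)   # z[i, j]: MI_ij against gene i's background
    scores = np.sqrt(z ** 2 + z.T ** 2)      # combine the views from gene i and gene j
    np.fill_diagonal(scores, 0.0)
    return scores
```

In this sketch a pair scores highly only when its mutual information stands out from the background of both genes, which is the sense in which the method extends plain relevance networks that threshold raw MI values.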

          Author Summary

          Organisms can adapt to changing environments—becoming more virulent, for example, or activating stress responses—thanks to a flexible gene expression program controlled by the dynamic interactions of hundreds of transcriptional regulators. To unravel this regulatory complexity, multiple computational algorithms have been developed to analyze gene expression profiles and detect dependencies among genes over different conditions. It has been difficult to judge whether these algorithms can generate accurate global maps of regulatory interactions, however, because of the absence of a model organism with both a compendium of gene expression data and a corresponding network of experimentally determined regulatory interactions. To address this issue, we assembled 445 Escherichia coli microarrays, applied four classes of inference algorithms to the dataset, and validated the predictions against 3,216 experimentally determined E. coli interactions. The top-performing algorithm identifies 1,079 regulatory interactions at a confidence level of 60% or higher. Of these predicted interactions, 741 are novel and illuminate the regulation of amino acid biosynthesis, flagella biosynthesis, osmotic stress response, antibiotic resistance, and iron regulation. By defining the capabilities and limitations of network inference algorithms for large-scale mapping of prokaryotic regulatory networks, our work should facilitate their application to the mapping of novel microbes.
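Validating predictions against a reference network of known interactions comes down to counting how many predicted regulator-target pairs also appear in the reference set. The sketch below is a minimal illustration with hypothetical toy edges and an arbitrary score threshold; the gene names and scores are invented, and how the study restricts the set of evaluated predictions is not described here.

```python
def precision_recall(predicted, known):
    """Precision and recall of predicted regulator-target pairs against a reference set."""
    predicted, known = set(predicted), set(known)
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall

# Hypothetical toy edges (regulator, target); not taken from RegulonDB.
known = {("tfA", "g1"), ("tfA", "g2"), ("tfB", "g3"), ("tfB", "g4")}
scored = {
    ("tfA", "g1"): 4.2,
    ("tfA", "g2"): 3.1,
    ("tfB", "g5"): 2.9,
    ("tfB", "g3"): 1.5,
    ("tfA", "g4"): 0.7,
}

# Keep only edges at or above an arbitrary illustrative score threshold.
threshold = 2.0
predicted = [edge for edge, score in scored.items() if score >= threshold]
precision, recall = precision_recall(predicted, known)
```

Raising the threshold generally trades recall for precision, so reporting predictions at a fixed confidence level amounts to picking one point on that trade-off curve.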


          A novel machine-learning method is developed to predict transcriptional regulatory interactions from microarray data. One of the identified interactions appears to be important for the control of iron transport.


                Author and article information

                [1] Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
                [2] Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
                [3] Boston University School of Medicine, Boston, Massachusetts, United States of America
                [4] Department of Computer and Systems Science A. Ruberti, University of Rome, La Sapienza, Rome, Italy
                [5] Cellicon Biotechnologies, Boston, Massachusetts, United States of America
                Academic Editor affiliation: Johns Hopkins University, United States of America
                Author notes
                * To whom correspondence should be addressed. E-mail: tgardner@
                Journal: PLoS Biology (PLoS Biol)
                Publisher: Public Library of Science (San Francisco, USA)
                Published: 9 January 2007
                Volume: 5
                Issue: 1
                Manuscript identifiers: 06-PLBI-RA-0740R3, plbi-05-01-08
                Copyright: © 2007 Faith et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                Pages: 13
                Article type: Research Article
                Subjects: Computational Biology, Genetics and Genomics, E. coli
                Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, et al. (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5(1): e8. doi: 10.1371/journal.pbio.0050008

                Life sciences

