      Defining the Plasticity of Transcription Factor Binding Sites by Deconstructing DNA Consensus Sequences: The PhoP-Binding Sites among Gamma/Enterobacteria


          Abstract

          Transcriptional regulators recognize specific DNA sequences. Because these sequences are embedded in the background of genomic DNA, it is hard to identify the key cis-regulatory elements that determine disparate patterns of gene expression. Detecting the intra- and inter-species differences among these sequences is crucial for understanding the molecular basis of both differential gene expression and evolution. Here, we address this problem by investigating the target promoters controlled by the DNA-binding PhoP protein, which governs virulence and Mg2+ homeostasis in several bacterial species. PhoP is particularly interesting: it is highly conserved in different gamma/enterobacteria, and it not only regulates ancestral genes but also governs the expression of dozens of horizontally acquired genes that differ from species to species. Our approach consists of decomposing the DNA binding site sequences for a given regulator into families of motifs (termed submotifs) using a machine learning method inspired by the “Divide & Conquer” strategy. Partitioning a motif into sub-patterns produced computational advantages for classification, resulting in the discovery of new members of a regulon and alleviating the problem of distinguishing functional sites in chromatin immunoprecipitation and DNA microarray genome-wide analysis. Moreover, we found that certain partitions were useful in revealing biological properties of binding site sequences, including modular gains and losses of PhoP binding sites through evolutionary turnover events, as well as conservation in distant species. The high conservation of PhoP submotifs within gamma/enterobacteria, as well as of the regulatory protein that recognizes them, suggests that the major cause of divergence between related species is not the binding sites, as was previously suggested for other regulators. Instead, the divergence may be attributed to the fast evolution of orthologous target genes and/or the promoter architectures resulting from the interaction of those binding sites with the RNA polymerase.
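
          The idea of splitting one motif into submotifs can be illustrated with a minimal sketch. This is not the authors' machine learning method, which the abstract does not detail. It assumes pre-aligned, equal-length sites and swaps in a simple greedy grouping by Hamming distance. The function names, the `max_dist` threshold, and the toy sequences are all hypothetical, and the sequences are not real PhoP binding sites.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sites."""
    return sum(x != y for x, y in zip(a, b))


def partition_sites(sites, max_dist):
    """Greedy partition (illustrative stand-in for a learned partition).

    Each site joins the first group whose seed (its first member) is
    within max_dist mismatches; otherwise it starts a new group.
    """
    groups = []
    for site in sites:
        for group in groups:
            if hamming(site, group[0]) <= max_dist:
                group.append(site)
                break
        else:
            groups.append([site])
    return groups


def consensus(group):
    """Most frequent base per column; ties are broken alphabetically."""
    columns = zip(*group)
    return "".join(
        min(Counter(col).items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for col in columns
    )


# Toy, made-up aligned sites (NOT real PhoP binding sites).
sites = [
    "TGTTTAAC",
    "TGTTTAAT",
    "GGTTTAAC",
    "TATTGACA",
    "TATTGACG",
    "TCTTGACA",
]

groups = partition_sites(sites, max_dist=2)

# One consensus over all sites versus one consensus per submotif group.
print("single consensus:", consensus(sites))
for i, group in enumerate(groups, start=1):
    print(f"submotif {i}: {consensus(group)} from {len(group)} sites")
```

          In this toy case a single consensus blends two distinct patterns, while the per-group consensus sequences keep them apart, which is the intuition behind describing a regulator by a family of submotifs rather than one consensus.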

          Author Summary

          The diversity of life forms frequently results from small changes in the regulatory systems that control gene expression. These changes often occur in cis-elements relevant to transcriptional regulation that are difficult to discern, as they are short and are embedded in a genomic background that does not play a direct role in gene expression, or that consists of disparate sequences such as those from horizontally acquired genes. We devised a machine-learning method that significantly improves the identification of these elements, uncovering families of binding site motifs (i.e., “submotifs”) instead of a single consensus recognized by a transcriptional regulator. The method can also incorporate other cis-elements to fully describe promoter architectures. Far from being just a computational convenience, the submotifs were supported by ChIP-chip and custom expression microarray experiments for the PhoP regulon, which validated their high conservation and modular evolution throughout the gamma/enterobacteria. This suggests that the major cause of divergence between species is not the binding sites, as was previously suggested for other regulators. Instead, the divergence may be attributed to the fast evolution of orthologous and horizontally acquired target genes, and/or to the uncovered promoter architectures governing the interaction between the regulator and the RNA polymerase.


                Author and article information

                Contributors
                Role: Editor
                Journal
                Title: PLoS Computational Biology (PLoS Comput Biol)
                Publisher: Public Library of Science (San Francisco, USA)
                ISSN: 1553-734X, 1553-7358
                Published: 22 July 2010
                Volume: 6
                Issue: 7
                Article number: e1000862
                Affiliations
                [1] Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
                [2] Department of Psychiatry, Washington University School of Medicine, St. Louis, Missouri, United States of America
                [3] Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, Missouri, United States of America
                [4] Howard Hughes Medical Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
                University of British Columbia, Canada
                Author notes

                Conceived and designed the experiments: OH IZ. Performed the experiments: OH SYP IZ. Analyzed the data: OH HH IZ. Contributed reagents/materials/analysis tools: EAG. Wrote the paper: OH HH EAG IZ. Revised and made suggestions about the manuscript: HH EAG.

                Article
                Manuscript number: 09-PLCB-RA-1606R2
                DOI: 10.1371/journal.pcbi.1000862
                PMCID: 2908699
                PMID: 20661307
                2691620c-1bd7-4510-9b15-4210e7f5e7cf
                Harari et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                History
                Received: 4 January 2010
                Accepted: 15 June 2010
                Page count
                Pages: 20
                Categories
                Research Article
                Computational Biology/Evolutionary Modeling
                Computational Biology/Genomics
                Computational Biology/Sequence Motif Analysis
                Computer Science
                Microbiology/Microbial Evolution and Genomics

                Quantitative & Systems biology
