
      Predicting Spatial and Temporal Gene Expression Using an Integrative Model of Transcription Factor Occupancy and Chromatin State


          Abstract

          Precise patterns of spatial and temporal gene expression are central to metazoan complexity and act as a driving force for embryonic development. While there has been substantial progress in dissecting and predicting cis-regulatory activity, our understanding of how information from multiple enhancer elements converges to regulate a gene's expression remains limited. This is in large part due to the number of different biological processes involved in mediating regulation, as well as the limited availability of experimental measurements for many of them. Here, we used a Bayesian approach to model diverse experimental regulatory data, leading to accurate predictions of both spatial and temporal aspects of gene expression. We integrated whole-embryo information on transcription factor recruitment to multiple cis-regulatory modules, insulator binding and histone modification status in the vicinity of individual gene loci, at a genome-wide scale during Drosophila development. The model uses Bayesian networks to represent the relation between transcription factor occupancy and enhancer activity in specific tissues and stages. All parameters are optimized in an Expectation Maximization procedure, providing a model capable of predicting tissue- and stage-specific activity of new, previously unassayed genes. Performing the optimization with subsets of input data demonstrated that neither enhancer occupancy nor chromatin state alone can explain all gene expression patterns, but that together they allow accurate predictions of spatio-temporal activity. Model predictions were validated using the expression patterns of more than 600 genes recently made available by the BDGP consortium, demonstrating an average 15-fold enrichment of genes expressed in the predicted tissue over a naïve model. We further validated the model by experimentally testing the expression of 20 predicted target genes of unknown expression, resulting in an accuracy of 95% for temporal predictions and 50% for spatial predictions. To our knowledge, this is the first genome-wide approach to predict tissue-specific gene expression in metazoan development, and our results suggest that integrative models of this type will become more prevalent in the future.
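The abstract reports enrichment over a naïve model without defining the statistic on this page. As a rough illustration only, the sketch below assumes one common definition: the hit rate among predicted genes divided by the background hit rate among all assayed genes. The function name `fold_enrichment` and every count in the example are invented for illustration and are not taken from the paper.

```python
def fold_enrichment(predicted_hits, predicted_total,
                    background_hits, background_total):
    """Hit rate among predicted genes divided by the background hit rate.

    Assumed definition (not confirmed by the source):
        (predicted_hits / predicted_total) / (background_hits / background_total)
    """
    if predicted_total <= 0 or background_total <= 0:
        raise ValueError("totals must be positive")
    if background_hits <= 0:
        raise ValueError("background hit count must be positive")
    # Cross-multiplied form of the ratio of the two rates.
    return (predicted_hits * background_total) / (predicted_total * background_hits)


# Hypothetical counts: 12 of 20 predicted genes are expressed in the tissue,
# against 40 of 1000 genes in the background set.
example = fold_enrichment(12, 20, 40, 1000)
print(example)  # 15.0
```

Under this definition, a naïve model that picks genes at random has an expected enrichment of 1, so values well above 1 indicate that the predictions carry information about tissue-specific expression.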

          Author Summary

          Development is a complex process in which a single cell gives rise to a multi-cellular organism composed of diverse cell types and well-organized tissues. This transformation requires tightly coordinated expression, both spatially and temporally, of hundreds to thousands of genes specific to any given tissue. To orchestrate these patterns, gene expression is regulated at multiple steps, from transcription factor (TF) binding to cis-regulatory modules, through general transcription factor and RNA polymerase II recruitment to promoters, to chromatin remodeling and three-dimensional looping interactions. Despite this level of complexity, the regulation of gene expression is typically modeled in the context of transcription factor binding and a single enhancer's activity, because this is where most of the experimental data are available. Recent advances in the measurement of chromatin modifications and insulator binding during embryogenesis provide new datasets that can be used for modeling gene expression. Here we use a Bayesian approach to integrate all three levels of information and combine the activity of multiple regulatory elements into a single model of a gene's expression, implementing an expectation maximization strategy to overcome the problem of missing data. Importantly, while the data for histone modifications and insulator binding represent merged signals from all cells in the embryo, the model can extract cell type-specific and stage-specific predictions of gene expression for hundreds of genes of unknown expression.


                Author and article information

                Contributors
                Role: Editor
                Journal
                Title: PLoS Computational Biology
                Abbreviated title: PLoS Comput Biol
                Publisher: Public Library of Science (San Francisco, USA)
                ISSN: 1553-734X, 1553-7358
                Published: 6 December 2012
                Volume: 8
                Issue: 12
                Article number: e1002798
                Affiliations
                [1] Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
                [2] Institute of Informatics, University of Warsaw, Warsaw, Poland
                Weizmann Institute of Science, Israel
                Author notes

                The authors have declared that no competing interests exist.

                Conceived and designed the experiments: YHL EEMF. Performed the experiments: YHL. Analyzed the data: BW ZXY. Wrote the paper: BW EEMF. Designed and implemented model inference: BW.

                Article
                Manuscript ID: PCOMPBIOL-D-12-00382
                DOI: 10.1371/journal.pcbi.1002798
                PMCID: PMC3516547
                PMID: 23236268
                Record ID: 490e44d3-20f8-40ed-9b36-48461a10faaf
                Copyright © 2012

                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 6 March 2012
                Accepted: 8 October 2012
                Page count
                Pages: 11
                Funding
                This work was supported by grants from ERASysBio (ModHeart) and the Human Frontier Science Program (HFSP) to EEMF, and by a Homing Plus grant from the Foundation for Polish Science (co-financed by the European Union, European Regional Development Fund) to BW. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology
                Computational Biology
                Systems Biology
                Developmental Biology
                Cell Fate Determination
                Genomics
                Functional Genomics
                Genome Expression Analysis

                Quantitative & Systems biology