      Deciphering eukaryotic gene-regulatory logic with 100 million random promoters

      research-article


          Abstract

          How transcription factors (TFs) interpret cis-regulatory DNA sequence to control gene expression remains unclear, largely because past studies using native and engineered sequences had insufficient scale. Here, we measure the expression output of >100 million synthetic yeast promoter sequences that are fully random. These sequences yield diverse, reproducible expression levels that can be explained by their chance inclusion of functional TF binding sites. We use machine learning to build interpretable models of transcriptional regulation that predict ~94% of the expression driven from independent test promoters and ~89% of the expression driven from native yeast promoter fragments. These models allow us to characterize each TF’s specificity, activity, and interactions with chromatin. TF activity depends on binding-site strand, position, DNA helical face and chromatin context. Notably, expression level is influenced by weak regulatory interactions, which confound designed-sequence studies. Our analyses show that massive-throughput assays of fully random DNA can provide the big data necessary to develop complex, predictive models of gene regulation.
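The idea that random promoters contain TF binding sites purely by chance can be illustrated with a back-of-the-envelope calculation. The sketch below is not the authors' model: it assumes uniform random DNA, treats a binding site as an exact match to one fixed motif on either strand, and uses an 80 bp promoter length and the function names shown here purely as illustrative assumptions (none of these appear in the abstract).

```python
# Hypothetical helpers for illustration only; not taken from the paper.

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def expected_chance_sites(seq_len: int, motif_len: int, both_strands: bool = True) -> float:
    """Expected number of exact matches to one fixed motif in uniform random DNA.

    Each window matches with probability (1/4)**motif_len, and there are
    seq_len - motif_len + 1 windows per strand.
    """
    windows = seq_len - motif_len + 1
    if windows <= 0:
        return 0.0
    strands = 2 if both_strands else 1
    return strands * windows * 0.25 ** motif_len


def count_sites(seq: str, motif: str) -> int:
    """Count exact motif matches in seq, scanning both strands (overlaps allowed)."""
    rc = reverse_complement(motif)
    k = len(motif)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # A palindromic motif matches on both strands at the same position.
        total += (window == motif) + (window == rc)
    return total


if __name__ == "__main__":
    # Assumed values: 80 bp random region, 8 bp motif, 100 million promoters.
    per_promoter = expected_chance_sites(80, 8)
    print(per_promoter, per_promoter * 100_000_000)
```

Under these assumptions a single 8 bp motif is rare in any one promoter, yet a library of 100 million promoters would still be expected to contain roughly 2 x 10^5 exact instances of it. Real TF specificities are degenerate, so weaker partial matches would be far more common than this exact-match estimate suggests.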

          Editorial summary

          Gene expression levels in yeast are predicted using a massive dataset on promoters with random sequences.

          Most cited references (45)

          Global mapping of protein-DNA interactions in vivo by digital genomic footprinting

          The orchestrated binding of transcriptional activators and repressors to specific DNA sequences in the context of chromatin defines the regulatory program of eukaryotic genomes. We developed a digital approach to assay regulatory protein occupancy on genomic DNA in vivo by dense mapping of individual DNase I cleavages from intact nuclei using massively parallel DNA sequencing. Analysis of > 23 million cleavages across the Saccharomyces cerevisiae genome revealed thousands of protected regulatory protein footprints, enabling de novo derivation of factor binding motifs as well as the identification of hundreds of novel binding sites for major regulators. We observed striking correspondence between nucleotide-level DNase I cleavage patterns and protein-DNA interactions determined by crystallography. The data also yielded a detailed view of larger chromatin features including positioned nucleosomes flanking factor binding regions. Digital genomic footprinting provides a powerful approach to delineate the cis-regulatory framework of any organism with an available genome sequence.
            Predicting gene expression from sequence.

            We describe a systematic genome-wide approach for learning the complex combinatorial code underlying gene expression. Our probabilistic approach identifies local DNA-sequence elements and the positional and combinatorial constraints that determine their context-dependent role in transcriptional regulation. The inferred regulatory rules correctly predict expression patterns for 73% of genes in Saccharomyces cerevisiae, utilizing microarray expression data and sequences in the 800 bp upstream of genes. Application to Caenorhabditis elegans identifies predictive regulatory elements and combinatorial rules that control the phased temporal expression of transcription factors, histones, and germline specific genes. Successful prediction requires diverse and complex rules utilizing AND, OR, and NOT logic, with significant constraints on motif strength, orientation, and relative position. This system generates a large number of mechanistic hypotheses for focused experimental validation, and establishes a predictive dynamical framework for understanding cellular behavior from genomic sequence.
              Mechanisms that specify promoter nucleosome location and identity.

              The chromatin architecture of eukaryotic gene promoters is generally characterized by a nucleosome-free region (NFR) flanked by at least one H2A.Z variant nucleosome. Computational predictions of nucleosome positions based on thermodynamic properties of DNA-histone interactions have met with limited success. Here we show that the action of the essential RSC remodeling complex in S. cerevisiae helps explain the discrepancy between theory and experiment. In RSC-depleted cells, NFRs shrink such that the average positions of flanking nucleosomes move toward predicted sites. Nucleosome positioning at distinct subsets of promoters additionally requires the essential Myb family proteins Abf1 and Reb1, whose binding sites are enriched in NFRs. In contrast, H2A.Z deposition is dispensable for nucleosome positioning. By regulating H2A.Z deposition using a steroid-inducible protein splicing strategy, we show that NFR establishment is necessary for H2A.Z deposition. These studies suggest an ordered pathway for the assembly of promoter chromatin architecture.

                Author and article information

                Journal
                Nature Biotechnology (Nat. Biotechnol.)
                ISSN: 1087-0156, 1546-1696
                22 October 2019
                02 December 2019
                January 2020
                02 June 2020
                Volume 38, issue 1, pages 56-65
                Affiliations
                [1] Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
                [2] Howard Hughes Medical Institute and Koch Institute of Integrative Cancer Research, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02140, USA
                [3] School of Computer Science and Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, 9190401, Israel
                [4] Initiative for Maximizing Student Development Program, University of New Mexico, Albuquerque, NM, USA
                Author notes

                Author Contributions

                C.G.D. and A.R. drafted the manuscript, with all authors contributing. C.G.D. analyzed the data. C.G.D., E.D.V., E.A., and R.S. performed the experiments. A.R. and N.F. supervised the research.

                [*] To whom correspondence should be addressed: aregev@broadinstitute.org (AR) and carlgdeboer@gmail.com (CGD)
                Article
                NIHMSID: NIHMS1541313
                DOI: 10.1038/s41587-019-0315-8
                PMCID: PMC6954276
                PMID: 31792407

                Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms

                Categories
                Article

                Biotechnology