
      Deep learning suggests that gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure

      research-article


          Abstract

          Understanding the genetic regulatory code governing gene expression is an important challenge in molecular biology. However, how individual coding and non-coding regions of the gene regulatory structure interact and contribute to mRNA expression levels remains unclear. Here we apply deep learning on over 20,000 mRNA datasets to examine the genetic regulatory code controlling mRNA abundance in 7 model organisms ranging from bacteria to Human. In all organisms, we can predict mRNA abundance directly from DNA sequence, with up to 82% of the variation of transcript levels encoded in the gene regulatory structure. By searching for DNA regulatory motifs across the gene regulatory structure, we discover that motif interactions could explain the whole dynamic range of mRNA levels. Co-evolution across coding and non-coding regions suggests that it is not single motifs or regions, but the entire gene regulatory structure and specific combination of regulatory elements that define gene expression levels.

          Abstract

          Regulatory and coding regions of genes are shaped by evolution to control expression levels. Here, the authors use deep learning to identify rules controlling gene expression levels and suggest that all parts of the gene regulatory structure interact in this.


                Author and article information

                Contributors
                aleksej.zelezniak@chalmers.se
                Journal
                Nature Communications (Nat Commun)
                Nature Publishing Group UK (London)
                ISSN 2041-1723
                1 December 2020
                Volume 11, article 6141
                Affiliations
                [1] Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96 Gothenburg, Sweden (GRID grid.5371.0, ISNI 0000 0001 0775 6028)
                [2] Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, Kemivägen 10, SE-412 96 Gothenburg, Sweden (GRID grid.5371.0, ISNI 0000 0001 0775 6028)
                [3] Computer Science and Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96 Gothenburg, Sweden (GRID grid.5371.0, ISNI 0000 0001 0775 6028)
                [4] Department of Marine Sciences, University of Gothenburg, Box 461, SE-405 30 Gothenburg, Sweden (GRID grid.8761.8, ISNI 0000 0000 9919 9582)
                [5] Gothenburg Global Biodiversity Center (GGBC), Box 461, 40530 Gothenburg, Sweden
                [6] Science for Life Laboratory, Tomtebodavägen 23a, SE-171 65 Stockholm, Sweden (GRID grid.452834.c)
                Author information
                http://orcid.org/0000-0002-7099-961X
                http://orcid.org/0000-0003-0991-9040
                http://orcid.org/0000-0001-6037-7019
                http://orcid.org/0000-0002-9502-9804
                http://orcid.org/0000-0002-9955-6003
                http://orcid.org/0000-0001-7989-696X
                http://orcid.org/0000-0002-3098-9441
                Article
                19921
                10.1038/s41467-020-19921-4
                7708451
                33262328
                dc88f71c-4c0c-4c52-b4b8-b8e8078b3d42
                © The Author(s) 2020

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 18 December 2019
                Accepted: 2 November 2020
                Funding
                Funded by: FundRef https://doi.org/10.13039/501100009252, Science for Life Laboratory (SciLifeLab);
                Funded by: FundRef https://doi.org/10.13039/100010665, EC | EU Framework Programme for Research and Innovation H2020 | H2020 Priority Excellent Science | H2020 Marie Skłodowska-Curie Actions (H2020 Excellent Science - Marie Skłodowska-Curie Actions);
                Award ID: 722 287
                Categories
                Article
                Uncategorized
                gene regulatory networks, machine learning, synthetic biology
