      Evolutionarily informed deep learning methods for predicting relative transcript abundance from DNA sequence

          Abstract

          Deep learning methodologies have revolutionized prediction in many fields and show potential to do the same in molecular biology and genetics. However, applying these methods in their current forms ignores evolutionary dependencies within biological systems and can result in false positives and spurious conclusions. We developed two approaches that account for evolutionary relatedness in machine learning models: (i) gene-family–guided splitting and (ii) ortholog contrasts. The first approach accounts for evolution by constraining model training and testing sets to include different gene families. The second approach uses evolutionarily informed comparisons between orthologous genes to both control for and leverage evolutionary divergence during the training process. The two approaches were explored and validated in the context of mRNA expression level prediction, where they achieved area under the ROC curve (auROC) values ranging from 0.75 to 0.94. Model weight inspections showed biologically interpretable patterns, resulting in the hypothesis that the 3′ UTR is more important for fine-tuning mRNA abundance levels while the 5′ UTR is more important for large-scale changes.
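
          The two approaches named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy gene and family identifiers, and the choice to frame an ortholog contrast as a binary label (which member of a pair has the higher abundance) are all assumptions made here for illustration, since the abstract does not specify these details.

```python
import random
from collections import defaultdict


def family_guided_split(gene_to_family, test_fraction=0.2, seed=0):
    """Split genes into train/test so that no gene family spans both sets."""
    families = defaultdict(list)
    for gene, family in gene_to_family.items():
        families[family].append(gene)

    # Shuffle whole families, not individual genes.
    family_ids = sorted(families)
    random.Random(seed).shuffle(family_ids)

    n_total = len(gene_to_family)
    train, test = [], []
    for fam in family_ids:
        # Fill the test set family by family until it reaches the target size.
        if len(test) < test_fraction * n_total:
            test.extend(families[fam])
        else:
            train.extend(families[fam])
    return train, test


def ortholog_contrast_labels(ortholog_pairs, abundance):
    """Label each ortholog pair 1 if the first gene is more abundant, else 0.

    Pairs with equal abundance carry no contrast and are skipped.
    """
    labels = {}
    for gene_a, gene_b in ortholog_pairs:
        a, b = abundance[gene_a], abundance[gene_b]
        if a != b:
            labels[(gene_a, gene_b)] = int(a > b)
    return labels


# Hypothetical toy data: gene -> gene family.
gene_to_family = {
    "g1": "famA", "g2": "famA", "g3": "famB",
    "g4": "famC", "g5": "famC", "g6": "famD",
}
train_genes, test_genes = family_guided_split(gene_to_family, test_fraction=0.3)

# Hypothetical ortholog pairs and abundance values.
pairs = [("zm1", "sb1"), ("zm2", "sb2")]
abundance = {"zm1": 10.0, "sb1": 2.0, "zm2": 1.0, "sb2": 5.0}
contrast = ortholog_contrast_labels(pairs, abundance)
```

          Splitting by family rather than by gene keeps closely related sequences from appearing on both sides of the train/test boundary, which is the source of inflated performance the abstract warns about.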

          Most cited references (22)

          Genomic cis-regulatory logic: experimental and computational analysis of a sea urchin gene.

          The genomic regulatory network that controls gene expression ultimately determines form and function in each species. The operational nature of the regulatory programming specified in cis-regulatory DNA sequence was determined from a detailed functional analysis of a sea urchin control element that directs the expression of a gene in the endoderm during development. Spatial expression and repression, and the changing rate of transcription of this gene, are mediated by a complex and extended cis-regulatory system. The system may be typical of developmental cis-regulatory apparatus. All of its activities are integrated in the proximal element, which contains seven target sites for DNA binding proteins. A quantitative computational model of this regulatory element was constructed that explicitly reveals the logical interrelations hard-wired into the DNA.

            Unraveling the KNOTTED1 regulatory network in maize meristems.

            KNOTTED1 (KN1)-like homeobox (KNOX) transcription factors function in plant meristems, self-renewing structures consisting of stem cells and their immediate daughters. We defined the KN1 cistrome in maize inflorescences and found that KN1 binds to several thousand loci, including 643 genes that are modulated in one or multiple tissues. These KN1 direct targets are strongly enriched for transcription factors (including other homeobox genes) and genes participating in hormonal pathways, most significantly auxin, demonstrating that KN1 plays a key role in orchestrating the upper levels of a hierarchical gene regulatory network that impacts plant meristem identity and function.

              UTR-Dependent Control of Gene Expression in Plants.

              Throughout their lives, plants sense many developmental and environmental stimuli, and activation of optimal responses against these stimuli requires extensive transcriptional reprogramming. To facilitate this activation, plant mRNA contains untranslated regions (UTRs) that significantly increase the coding capacity of the genome by producing multiple mRNA variants from the same gene. In this review we compare UTRs of arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) at the genome scale to highlight their complexity in crop plants. We discuss different modes of UTR-based regulation with emphasis on genes that regulate multiple plant processes, including flowering, stress responses, and nutrient homeostasis. We demonstrate functional specificity in genes with variable UTR length and propose future research directions.

                Author and article information

                Journal: Proceedings of the National Academy of Sciences (Proc Natl Acad Sci USA)
                ISSN: 0027-8424 (print), 1091-6490 (online)
                Dates: March 19 2019; March 06 2019
                Volume: 116
                Issue: 12
                Pages: 5542-5549
                DOI: 10.1073/pnas.1814551116
                PMCID: 6431157
                PMID: 30842277
                © 2019

                Free to read

                http://www.pnas.org/site/misc/userlicense.xhtml
