      Solving the RNA design problem with reinforcement learning

      research-article

          Abstract

          We use reinforcement learning to train an agent for computational RNA design: given a target secondary structure, design a sequence that folds to that structure in silico. Our agent uses a novel graph convolutional architecture allowing a single model to be applied to arbitrary target structures of any length. After training it on randomly generated targets, we test it on the Eterna100 benchmark and find it outperforms all previous algorithms. Analysis of its solutions shows it has successfully learned some advanced strategies identified by players of the game Eterna, allowing it to solve some very difficult structures. On the other hand, it has failed to learn other strategies, possibly because they were not required for the targets in the training set. This suggests the possibility that future improvements to the training protocol may yield further gains in performance.
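          The design task stated above can be made concrete with a small sketch. It is not the authors' code: it assumes the target is written in the common dot-bracket notation, and the function names (parse_dot_bracket, structure_graph, is_compatible) are hypothetical. The graph built here (backbone edges plus base-pair edges) is only one plausible input for a graph-based model, not the architecture described in the paper. Passing the compatibility check is necessary but not sufficient: confirming that a sequence folds to the target in silico requires a folding program.

```python
from typing import Dict, List, Set, Tuple

# Pairs assumed allowed here: Watson-Crick pairs plus the G-U wobble pair.
ALLOWED_PAIRS: Set[Tuple[str, str]] = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def parse_dot_bracket(structure: str) -> Dict[int, int]:
    """Map each paired position to its partner (assumes no pseudoknots)."""
    stack: List[int] = []
    pairs: Dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i] = j
            pairs[j] = i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def structure_graph(structure: str) -> List[Tuple[int, int]]:
    """Edge list: backbone edges (i, i+1) followed by base-pair edges (i, j)."""
    pairs = parse_dot_bracket(structure)
    backbone = [(i, i + 1) for i in range(len(structure) - 1)]
    paired = sorted((i, j) for i, j in pairs.items() if i < j)
    return backbone + paired


def is_compatible(sequence: str, structure: str) -> bool:
    """True if every paired position in the target holds an allowed pair."""
    if len(sequence) != len(structure):
        return False
    pairs = parse_dot_bracket(structure)
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS
               for i, j in pairs.items())


if __name__ == "__main__":
    target = "((...))"
    print(structure_graph(target))
    print(is_compatible("GCAAAGC", target))
    print(is_compatible("GCAAAAA", target))
```

          Because the edge list is derived from the target string alone, the same code applies to targets of any length, which is the property the abstract attributes to the graph-based model.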

          Author summary

          Designing RNA sequences that fold to desired structures is an important problem in bioengineering. We have applied recent advances in machine learning to address this problem. The computer learns without any human input, using only trial and error to figure out how to design RNA. It quickly discovers powerful strategies that let it solve many difficult design problems. When tested on a challenging benchmark, it outperforms all previous algorithms. We analyze its solutions and identify some of the strategies it has learned, as well as other important strategies it has failed to learn. This suggests possible approaches to further improving its performance. This work reflects a paradigm shift taking place in computer science, which has the potential to transform computational biology. Instead of relying on experts to design algorithms by hand, computers can use artificial intelligence to learn their own algorithms directly. The resulting methods often work better than the ones designed by humans.


                Author and article information

                Contributors
                Roles: Conceptualization, Formal analysis, Investigation, Methodology, Software, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Formal analysis, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Software
                Roles: Funding acquisition, Resources, Supervision
                Role: Editor
                Journal
                Title: PLoS Computational Biology
                Abbreviated title: PLoS Comput Biol
                Publisher: Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-734X (print), 1553-7358 (electronic)
                Published: 21 June 2018
                Volume: 14
                Issue: 6
                Article number: e1006176
                Affiliations
                [1 ] Department of Bioengineering, Stanford University, Stanford, CA, United States of America
                [2 ] Department of Chemistry, Stanford University, Stanford, CA, United States of America
                [3 ] Department of Computer Science, Stanford University, Stanford, CA, United States of America
                University of Missouri, UNITED STATES
                Author notes

                I have read the journal's policy and the authors of this manuscript have the following competing interests: VSP is an SAB member of Schrodinger, LLC and a General Partner at Andreessen Horowitz.

                Author information
                http://orcid.org/0000-0002-9566-9684
                http://orcid.org/0000-0002-9373-8981
                Article
                PCOMPBIOL-D-17-02171
                10.1371/journal.pcbi.1006176
                6029810
                29927936
                cd8c9ed0-21d0-4566-a531-49665b1e834b
                © 2018 Eastman et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 30 December 2017
                Accepted: 4 May 2018
                Page count
                Figures: 3, Tables: 1, Pages: 15
                Funding
                Funded by: National Institutes of Health (funder ID: http://dx.doi.org/10.13039/100000002)
                Award ID: R01 GM062868
                Funded by: National Institutes of Health (US)
                Award ID: U19 AI109662
                This work was supported by US National Institutes of Health (https://www.nih.gov/) grants R01 GM062868 and U19 AI109662. BR was supported by the Fannie and John Hertz Foundation. We acknowledge the generous support of Dr. Anders G. Frøseth for our work on machine learning. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms
                Biology and life sciences > Molecular biology > Macromolecular structure analysis > RNA structure
                Biology and life sciences > Biochemistry > Nucleic acids > RNA > RNA structure
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms > Machine Learning Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms > Machine Learning Algorithms
                Computer and Information Sciences > Artificial Intelligence > Machine Learning > Machine Learning Algorithms
                Research and Analysis Methods > Database and Informatics Methods > Bioinformatics > Sequence Analysis > Sequence Motif Analysis
                Biology and life sciences > Molecular biology > Macromolecular structure analysis > RNA structure > RNA folding
                Biology and life sciences > Biochemistry > Nucleic acids > RNA > RNA structure > RNA folding
                Computer and Information Sciences > Artificial Intelligence > Machine Learning
                Physical Sciences > Physics > Thermodynamics > Free Energy
                Research and Analysis Methods > Database and Informatics Methods > Database Searching > Sequence Similarity Searching
                Custom metadata
                vor-update-to-uncorrected-proof
                2018-07-03
                All relevant data are within the paper and its Supporting Information files.

                Quantitative & Systems biology