      Is Open Access

      Beyond generative models: superfast traversal, optimization, novelty, exploration and discovery (STONED) algorithm for molecules using SELFIES†

      research-article
      Chemical Science
      The Royal Society of Chemistry


          Abstract

          Inverse design allows the generation of molecules with desirable physical quantities using property optimization. Deep generative models have recently been applied to tackle inverse design, as they possess the ability to optimize molecular properties directly through structure modification using gradients. While the ability to carry out direct property optimizations is promising, the use of generative deep learning models to solve practical problems requires large amounts of data and is very time-consuming. In this work, we propose STONED – a simple and efficient algorithm to perform interpolation and exploration in the chemical space, comparable to deep generative models. STONED bypasses the need for large amounts of data and training times by using string modifications in the SELFIES molecular representation. First, we achieve non-trivial performance on typical benchmarks for generative models without any training. Additionally, we demonstrate applications in high-throughput virtual screening for the design of drugs, photovoltaics, and the construction of chemical paths, allowing for both property and structure-based interpolation in the chemical space. Overall, we anticipate our results to be a stepping stone for developing more sophisticated inverse design models and benchmarking tools, ultimately helping generative models achieve wider adoption.

          Abstract

          Interpolation and exploration within the chemical space for inverse design.
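The abstract describes two string-level operations: exploring chemical space by modifying a SELFIES string, and building a chemical path between two molecules. The following is a minimal sketch of both ideas on lists of SELFIES-style tokens, not the authors' implementation. The toy `ALPHABET`, the function names, and the left-to-right substitution order in `path` are assumptions made for illustration; `[nop]` is used here only as a padding token.

```python
import random

# Hypothetical toy alphabet of SELFIES-style tokens (assumption, for illustration only).
ALPHABET = ["[C]", "[N]", "[O]", "[F]", "[=C]", "[=O]", "[Branch1]", "[Ring1]"]
PAD = "[nop]"  # padding token used to align strings of different length


def split_tokens(selfies_string):
    """Split '[C][=O][N]' into ['[C]', '[=O]', '[N]']."""
    return ["[" + part for part in selfies_string.split("[") if part]


def mutate(tokens, rng):
    """Apply one random point mutation: replace, insert, or delete a token."""
    tokens = list(tokens)
    if not tokens:
        return [rng.choice(ALPHABET)]
    op = rng.choice(["replace", "insert", "delete"])
    if op == "delete" and len(tokens) > 1:
        del tokens[rng.randrange(len(tokens))]
    elif op == "insert":
        tokens.insert(rng.randrange(len(tokens) + 1), rng.choice(ALPHABET))
    else:
        # Replacement (also the fallback when deletion would empty the list).
        tokens[rng.randrange(len(tokens))] = rng.choice(ALPHABET)
    return tokens


def path(start, target):
    """Interpolate from start to target by substituting one position at a time."""
    n = max(len(start), len(target))
    current = list(start) + [PAD] * (n - len(start))
    goal = list(target) + [PAD] * (n - len(target))
    steps = [[t for t in current if t != PAD]]
    for i in range(n):
        if current[i] != goal[i]:
            current[i] = goal[i]
            steps.append([t for t in current if t != PAD])
    return steps


if __name__ == "__main__":
    rng = random.Random(0)
    seed = split_tokens("[C][C][O]")
    print(mutate(seed, rng))
    for step in path(seed, split_tokens("[C][N][C][=O]")):
        print("".join(step))
```

In practice each token list would be joined back into a string and decoded to a molecule, for example with the `decoder` function of the `selfies` Python package. No training data is involved at any point, which is the property the abstract emphasizes.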


                Author and article information

                Journal
                Title: Chemical Science
                Abbreviated title: Chem Sci
                Journal codes: SC, CSHCBM
                Publisher: The Royal Society of Chemistry
                ISSN: 2041-6520, 2041-6539
                20 April 2021
                26 May 2021
                Volume: 12
                Issue: 20
                Pages: 7079-7090
                Affiliations
                [a] Department of Computer Science, University of Toronto, Canada; alan@aspuru.com
                [b] Department of Chemistry, University of Toronto, Canada
                [c] Vector Institute for Artificial Intelligence, Toronto, Canada
                [d] Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 661 University Ave, Toronto, Ontario M5G, Canada
                Author information
                https://orcid.org/0000-0001-8836-6266
                https://orcid.org/0000-0002-8235-5969
                https://orcid.org/0000-0002-8277-4434
                Article
                d1sc00231g
                10.1039/d1sc00231g
                8153210
                34123336
                9149613d-29b4-43a2-b268-6762a0f38d2d
                This journal is © The Royal Society of Chemistry
                History
                Received: 12 January 2021
                Accepted: 12 April 2021
                Page count
                Pages: 12
                Funding
                Funded by: Compute Canada, doi 10.13039/100013020;
                Award ID: Unassigned
                Funded by: Natural Sciences and Engineering Research Council of Canada, doi 10.13039/501100000038;
                Award ID: Unassigned
                Funded by: Natural Resources Canada, doi 10.13039/501100000159;
                Award ID: Unassigned
                Funded by: Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung, doi 10.13039/501100001711;
                Award ID: 191127
                Funded by: Austrian Science Fund, doi 10.13039/501100002428;
                Award ID: J4309
                Categories
                Chemistry