
      GEN: highly efficient SMILES explorer using autodidactic generative examination networks


          Abstract

          Recurrent neural networks have been widely used to generate millions of de novo molecules in defined chemical spaces. Reported deep generative models are exclusively based on LSTM and/or GRU units and frequently trained using canonical SMILES. In this study, we introduce Generative Examination Networks (GEN) as a new approach to train deep generative networks for SMILES generation. In our GENs, we have used an architecture based on multiple concatenated bidirectional RNN units to enhance the validity of generated SMILES. GENs autonomously learn the target space in a few epochs and are stopped early using an independent online examination mechanism, measuring the quality of the generated set. Herein we have used online statistical quality control (SQC) on the percentage of valid molecular SMILES as examination measure to select the earliest available stable model weights. Very high levels of valid SMILES (95–98%) can be generated using multiple parallel encoding layers in combination with SMILES augmentation using unrestricted SMILES randomization. Our trained models combine an excellent novelty rate (85–90%) while generating SMILES with strong conservation of the property space (95–99%). In GENs, both the generative network and the examination mechanism are open to other architectures and quality criteria.
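The online examination step described in the abstract can be pictured with a minimal sketch. The abstract does not specify the exact SQC rule, so everything concrete here is an assumption for illustration: the function name `first_stable_epoch`, the rolling-window stability test, and the `window`, `target`, and `max_range` thresholds. The validity percentages are passed in as plain numbers; in practice they would come from parsing each epoch's generated SMILES with a cheminformatics toolkit. The sample history is made-up input, not a result from the article.

```python
from collections import deque
from statistics import mean


def first_stable_epoch(valid_pct_per_epoch, window=4, target=95.0, max_range=2.0):
    """Return the 1-based epoch at which validity is first judged stable, or None.

    The last `window` validity percentages count as stable when their mean is
    at least `target` and their spread (max - min) is at most `max_range`.
    All three thresholds are illustrative assumptions, not values from the paper.
    """
    recent = deque(maxlen=window)  # rolling window of the latest percentages
    for epoch, pct in enumerate(valid_pct_per_epoch, start=1):
        recent.append(pct)
        if len(recent) < window:
            continue  # not enough observations yet to judge stability
        if mean(recent) >= target and max(recent) - min(recent) <= max_range:
            return epoch  # training would be stopped here
    return None


# Hypothetical validity percentages, one per epoch (illustrative only).
history = [62.0, 81.5, 90.2, 94.8, 96.1, 95.7, 96.4, 96.0, 97.1]
stop_at = first_stable_epoch(history)
print(stop_at)
```

The returned epoch is the one at which the stability check first passes, that is, the last epoch of the first stable window. A different control rule (for example, limits derived from the running standard deviation) could be swapped in without changing the surrounding training loop.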


                Author and article information

                Contributors
                ruud.van.deursen@firmenich.com
                guillaume.godin@firmenich.com
                Journal
                Journal of Cheminformatics (J Cheminform)
                Springer International Publishing (Cham)
                ISSN: 1758-2946
                Published: 10 April 2020
                Volume: 12
                Article number: 22
                Affiliations
                [1] Firmenich SA, Research and Development, Rue des Jeunes 1, Les Acacias, 1227 Geneva, Switzerland
                [2] GRID grid.419481.1, ISNI 0000 0001 1515 9979, Novartis Institutes for BioMedical Research, Novartis Campus, 4056 Basel, Switzerland
                [3] Institute of Structural Biology, Helmholtz Zentrum München-German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
                [4] BIGCHEM GmbH, Valerystr. 49, 85716 Unterschleißheim, Germany
                Article
                425
                10.1186/s13321-020-00425-8
                7146994
                3bbe6a0a-0b30-443a-8324-cd350cafa2fb
                © The Author(s) 2020

                Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                Received: 22 September 2019
                Accepted: 23 March 2020
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100010665, H2020 Marie Skłodowska-Curie Actions
                Award ID: 676434
                Categories
                Research Article

                Chemoinformatics
                autonomous learning, gen, gan, rnn, lstm, gru, bilstm, bigru, ai, smiles, generator, quality control, sqc
