      State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis

      research-article


          Abstract

          We investigated the effect of different training scenarios on predicting the (retro)synthesis of chemical compounds using a text-like representation of chemical reactions (SMILES) and the Transformer neural network architecture from Natural Language Processing (NLP). We showed that data augmentation, which is a powerful method used in image processing, eliminated the effect of data memorization by neural networks and improved their performance for prediction of new sequences. This effect was observed when augmentation was applied simultaneously to the input and the target data. The top-5 accuracy was 84.8% for the prediction of the largest fragment (thus identifying the principal transformation for classical retrosynthesis) for the USPTO-50k test dataset, and was achieved by a combination of SMILES augmentation and a beam search algorithm. The same approach provided significantly better results for the prediction of direct reactions from the single-step USPTO-MIT test set. Our model achieved 90.6% top-1 and 96.1% top-5 accuracy for its challenging mixed set and 97% top-5 accuracy for the USPTO-MIT separated set. It also significantly improved results for USPTO-full set single-step retrosynthesis for both top-1 and top-10 accuracies. The appearance frequency of the most abundantly generated SMILES was well correlated with the prediction outcome and can be used as a measure of the quality of reaction prediction.
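
          The last point of the abstract, using the appearance frequency of generated SMILES as a quality measure, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`rank_by_frequency`, `top_k_accuracy`) are hypothetical, the predictions are assumed to be already canonicalized SMILES pooled over all augmented inputs and beam positions, and the plain count-based ranking is an assumption that may differ from the scoring used in the paper.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def rank_by_frequency(predictions: Iterable[str]) -> Ranked:
    """Rank predicted SMILES by how often they were generated.

    Assumption: `predictions` already holds canonical SMILES pooled from
    every augmented input and beam position for ONE reaction.
    Returns (smiles, relative frequency) pairs, most frequent first.
    """
    counts = Counter(predictions)
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() sorts by count; ties keep first-seen order.
    return [(smi, n / total) for smi, n in counts.most_common()]


def top_k_accuracy(all_ranked: Sequence[Ranked],
                   targets: Sequence[str], k: int) -> float:
    """Fraction of reactions whose target is among the k most frequent SMILES."""
    hits = sum(
        target in {smi for smi, _ in ranked[:k]}
        for ranked, target in zip(all_ranked, targets)
    )
    return hits / len(targets)


# Toy data (made up for illustration, not taken from USPTO).
reaction_a = rank_by_frequency(["CCO", "CCO", "CC=O", "CCO", "CC(=O)O"])
reaction_b = rank_by_frequency(["CCN", "CC=O", "CCN", "CC=O", "CCN"])

# Relative frequency of the top-ranked SMILES acts as a confidence score.
confidence_a = reaction_a[0][1]

targets = ["CCO", "CC=O"]
top1 = top_k_accuracy([reaction_a, reaction_b], targets, k=1)
top2 = top_k_accuracy([reaction_a, reaction_b], targets, k=2)
```

          In this toy case the first reaction is solved at top-1 and the second only at top-2, and `confidence_a` is the share of generated sequences that agree on the top answer. A low share would flag a prediction that deserves less trust.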

          Abstract

          The development of algorithms that predict reactants and reagents for a given target molecule is key to accelerating retrosynthesis approaches. Here the authors demonstrate that applying augmentation techniques to the SMILES representation of target data significantly improves the quality of the reaction predictions.


                Author and article information

                Contributors
                i.tetko@helmholtz-muenchen.de
                guillaume.godin@firmenich.com
                Journal
                Nat Commun
                Nature Communications
                Nature Publishing Group UK (London)
                2041-1723
                4 November 2020
                2020
                Volume: 11
                Article number: 5575
                Affiliations
                [1] Institute of Structural Biology, Helmholtz Zentrum München—Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany
                [2] BIGCHEM GmbH, Valerystr. 49, D-85716 Unterschleißheim, Germany
                [3] Firmenich International SA, D-Lab by Firmenich, Rue de la Bergère 7, CH-1242 Meyrin-Satigny, Switzerland
                Author information
                http://orcid.org/0000-0002-6855-0012
                http://orcid.org/0000-0002-8124-7325
                http://orcid.org/0000-0001-9828-386X
                Article
                19266
                10.1038/s41467-020-19266-y
                7643129
                33149154
                34a1f90b-0359-4520-92a5-4ea21c1301aa
                © The Author(s) 2020

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 19 March 2020
                Accepted: 5 October 2020
                Categories
                Article
                Uncategorized
                cheminformatics,reaction mechanisms,computational chemistry
