
      Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks


          Abstract

          In de novo drug design, computational strategies are used to generate novel molecules with good affinity to the desired biological target. In this work, we show that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing. We demonstrate that the properties of the generated molecules correlate very well with the properties of the molecules used to train the model. In order to enrich libraries with molecules active toward a given biological target, we propose to fine-tune the model with small sets of molecules, which are known to be active against that target. Against Staphylococcus aureus, the model reproduced 14% of 6051 hold-out test molecules that medicinal chemists designed, whereas against Plasmodium falciparum (Malaria), it reproduced 28% of 1240 test molecules. When coupled with a scoring function, our model can perform the complete de novo drug design cycle to generate large sets of novel molecules for drug discovery.
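The language-model analogy in the abstract can be illustrated with a toy example. The sketch below is not the paper's recurrent neural network: it swaps in a much simpler bigram (first-order Markov) character model, which shares only the core idea of generating a SMILES string one character at a time from learned next-character statistics. The function names, the boundary markers, and the toy SMILES strings are all assumptions made for illustration, and "fine-tuning" is approximated here by simply continuing to count on a small focused set.

```python
import random
from collections import Counter, defaultdict

# Hypothetical boundary markers, assumed not to occur inside the SMILES strings.
START, END = "^", "\n"

def train_bigram(smiles_list, counts=None):
    """Count next-character frequencies; pass existing counts to keep training."""
    counts = counts if counts is not None else defaultdict(Counter)
    for smi in smiles_list:
        seq = START + smi + END
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def sample(counts, rng, max_len=100):
    """Generate one string by sampling a character at a time until END."""
    out, prev = [], START
    for _ in range(max_len):
        chars, weights = zip(*counts[prev].items())
        prev = rng.choices(chars, weights=weights)[0]
        if prev == END:
            break
        out.append(prev)
    return "".join(out)

# Toy SMILES strings, chosen only for illustration.
general = ["CCO", "CCN", "CCC", "CC(=O)O", "c1ccccc1"]
model = train_bigram(general)

# Crude stand-in for fine-tuning: continue counting on a small focused set.
focused = ["CCN", "CNC"]
model = train_bigram(focused, counts=model)

rng = random.Random(0)
generated = [sample(model, rng) for _ in range(5)]
```

A bigram model has no memory beyond the previous character, so it will often emit strings with unbalanced brackets or unclosed rings. A recurrent network carries state across the whole sequence, which is what makes it a plausible model for this kind of long-range syntax.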

          Abstract

          Using artificial neural networks, computers can learn to generate molecules with desired target properties. This can aid in the creative process of drug design.

          Most cited references (57)


          LSTM: A Search Space Odyssey

          Several variants of the long short-term memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the powerful functional ANalysis Of VAriance framework. In total, we summarize the results of 5400 experimental runs (≈15 years of CPU time), which makes our study the largest of its kind on LSTM networks. Our results show that none of the variants can improve upon the standard LSTM architecture significantly, and demonstrate the forget gate and the output activation function to be its most critical components. We further observe that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.
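To show where the forget gate and the output activation function sit in the cell update, here is a minimal sketch of one LSTM step for scalar inputs. It assumes the common formulation without peephole connections, so it is a simplification and not necessarily the exact variant analyzed in that study. The name `lstm_step` and the weight layout are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell.

    `w` maps each gate name to a (w_x, w_h, bias) tuple (hypothetical layout).
    """
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))        # input gate
    f = sigmoid(pre("forget"))       # forget gate: how much old cell state to keep
    o = sigmoid(pre("output"))       # output gate
    g = math.tanh(pre("candidate"))  # candidate cell update
    c = f * c_prev + i * g           # new cell state
    h = o * math.tanh(c)             # tanh here is the output activation function
    return h, c

# With all-zero weights every gate is 0.5 and the candidate is 0,
# so the cell state is simply halved.
zero_w = {name: (0.0, 0.0, 0.0)
          for name in ("input", "forget", "output", "candidate")}
h, c = lstm_step(1.0, 0.0, 2.0, zero_w)
```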

            The properties of known drugs. 1. Molecular frameworks.

            In order to better understand the common features present in drug molecules, we use shape description methods to analyze a database of commercially available drugs and prepare a list of common drug shapes. A useful way of organizing this structural data is to group the atoms of each drug molecule into ring, linker, framework, and side chain atoms. On the basis of the two-dimensional molecular structures (without regard to atom type, hybridization, and bond order), there are 1179 different frameworks among the 5120 compounds analyzed. However, the shapes of half of the drugs in the database are described by the 32 most frequently occurring frameworks. This suggests that the diversity of shapes in the set of known drugs is extremely low. In our second method of analysis, in which atom type, hybridization, and bond order are considered, more diversity is seen; there are 2506 different frameworks among the 5120 compounds in the database, and the most frequently occurring 42 frameworks account for only one-fourth of the drugs. We discuss the possible interpretations of these findings and the way they may be used to guide future drug discovery research.

              The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics

              The Chemistry Development Kit (CDK) is a freely available open-source Java library for Structural Chemo- and Bioinformatics. Its architecture and capabilities, as well as its development as an open-source project by a team of international collaborators from academic and industrial institutions, are described. The CDK provides methods for many common tasks in molecular informatics, including 2D and 3D rendering of chemical structures, I/O routines, SMILES parsing and generation, ring searches, isomorphism checking, structure diagram generation, etc. Application scenarios as well as access information for interested users and potential contributors are given.

                Author and article information

                Journal
                ACS Central Science (ACS Cent Sci)
                Publisher: American Chemical Society
                ISSN: 2374-7943, 2374-7951
                28 December 2017
                24 January 2018
                Volume: 4
                Issue: 1
                Pages: 120-131
                Affiliations
                Institute of Organic Chemistry & Center for Multiscale Theory and Computation, Westfälische Wilhelms-Universität Münster, 48149 Münster, Germany
                Hit Discovery, Discovery Sciences, AstraZeneca R&D, Gothenburg, Sweden
                Department of Medicinal Chemistry, IMED RIA, AstraZeneca R&D, Gothenburg, Sweden
                Department of Physics & International Centre for Quantum and Molecular Structures, Shanghai University, Shanghai, China
                Article
                DOI: 10.1021/acscentsci.7b00512
                PMCID: 5785775
                PMID: 29392184
                Copyright © 2017 American Chemical Society

                This is an open access article published under an ACS AuthorChoice License, which permits copying and redistribution of the article or any adaptations for non-commercial purposes.

                History
                : 24 October 2017
                Categories
                Research Article
