      Open Access

      Applications of Deep-Learning in Exploiting Large-Scale and Heterogeneous Compound Data in Industrial Pharmaceutical Research

      Review article

          Abstract

          In recent years, the development of high-throughput screening (HTS) technologies and their establishment in an industrialized environment have given scientists the possibility to test millions of molecules and profile them against a multitude of biological targets in a short period of time, generating data at a much faster pace and of higher quality than before. Besides the structure-activity data from traditional bioassays, more complex assays such as transcriptomics profiling or imaging have also been established as routine profiling experiments, thanks to advances in next-generation sequencing and automated microscopy. In industrial pharmaceutical research, these technologies are typically established in conjunction with automated platforms to enable efficient handling of screening collections of thousands to millions of compounds. To exploit the ever-growing amount of data generated by these approaches, computational techniques are constantly evolving. In this regard, artificial intelligence technologies such as deep learning and other machine learning methods play a key role in cheminformatics and bioimage analytics, addressing activity prediction, scaffold hopping, de novo molecule design, reaction/retrosynthesis prediction, and high-content screening analysis. Herein we summarize the current state of analyzing large-scale compound data in industrial pharmaceutical research and describe the impact it has had on the drug discovery process over the last two decades, with a specific focus on deep-learning technologies.

                Author and article information

                Journal: Frontiers in Pharmacology (Front. Pharmacol.)
                Publisher: Frontiers Media S.A.
                ISSN: 1663-9812
                Published: 05 November 2019
                Volume: 10
                Article number: 1303
                Affiliations
                [1] Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
                [2] Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
                [3] Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland
                [4] Quantitative Biology, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
                [5] Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riss, Germany
                [6] Chemistry and Chemical Biology Centre, Guangzhou Regenerative Medicine and Health – Guangdong Laboratory, Guangzhou, China
                Author notes

                Edited by: Jianfeng Pei, Peking University, China

                Reviewed by: Alexander Sedykh, Sciome LLC, United States; Maxim Kuznetsov, Insilico Medicine, Inc., United States

                *Correspondence: Laurianne David, Laurianne.david1@gmail.com; Hongming Chen, Hongming.Chen71@hotmail.com

                This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology

                Article identifiers
                DOI: 10.3389/fphar.2019.01303
                PMCID: PMC6848277
                PMID: 31749705
                ScienceOpen ID: 64c81bd3-53f6-40ae-af4f-c79fdc623acc
                Copyright © 2019 David, Arús-Pous, Karlsson, Engkvist, Bjerrum, Kogej, Kriegl, Beck and Chen

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                Received: 07 August 2019
                Accepted: 14 October 2019
                Page count
                Figures: 6, Tables: 2, Equations: 0, References: 186, Pages: 16, Words: 8150
                Funding
                Funded by: H2020 Marie Skłodowska-Curie Actions (Funder ID: 10.13039/100010665)
                Categories
                Pharmacology
                Review

                Pharmacology & Pharmaceutical medicine
                Keywords: artificial intelligence, deep learning, chemogenomics, large-scale data, pharmaceutical industry