3
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: not found

      Different molecular enumeration influences in deep learning: an example using aqueous solubility

      ,
      Briefings in Bioinformatics
      Oxford University Press (OUP)

      Read this article at

      ScienceOpenPublisherPubMed
      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Aqueous solubility is the key property driving many chemical and biological phenomena and impacts experimental and computational attempts to assess those phenomena. Accurate prediction of solubility is essential and challenging, even with modern computational algorithms. Fingerprint-based, feature-based and molecular graph-based representations have all been used with different deep learning methods for aqueous solubility prediction. It has been clearly demonstrated that different molecular representations impact the model prediction and explainability. In this work, we reviewed different representations and also focused on using graph and line notations for modeling. In general, one canonical chemical structure is used to represent one molecule when computing its properties. We carefully examined the commonly used simplified molecular-input line-entry specification (SMILES) notation representing a single molecule and proposed to use the full enumerations in SMILES to achieve better accuracy. A convolutional neural network (CNN) was used. The full enumeration of SMILES can improve the presentation of a molecule and describe the molecule with all possible angles. This CNN model can be very robust when dealing with large datasets since no additional explicit chemistry knowledge is necessary to predict the solubility. Also, traditionally it is hard to use a neural network to explain the contribution of chemical substructures to a single property. We demonstrated the use of attention in the decoding network to detect the part of a molecule that is relevant to solubility, which can be used to explain the contribution from the CNN.

          Related collections

          Most cited references57

          • Record: found
          • Abstract: found
          • Article: not found

          Extended-connectivity fingerprints.

          Extended-connectivity fingerprints (ECFPs) are a novel class of topological fingerprints for molecular characterization. Historically, topological fingerprints were developed for substructure and similarity searching. ECFPs were developed specifically for structure-activity modeling. ECFPs are circular fingerprints with a number of useful qualities: they can be very rapidly calculated; they are not predefined and can represent an essentially infinite number of different molecular features (including stereochemical information); their features represent the presence of particular substructures, allowing easier interpretation of analysis results; and the ECFP algorithm can be tailored to generate different types of circular fingerprints, optimized for different uses. While the use of ECFPs has been widely adopted and validated, a description of their implementation has not previously been presented in the literature.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            MoleculeNet: a benchmark for molecular machine learning † †Electronic supplementary information (ESI) available. See DOI: 10.1039/c7sc02664a

            A large scale benchmark for molecular machine learning consisting of multiple public datasets, metrics, featurizations and learning algorithms.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              Quantum-chemical insights from deep tensor neural networks

              Learning from data has led to paradigm shifts in a multitude of disciplines, including web, text and image search, speech recognition, as well as bioinformatics. Can machine learning enable similar breakthroughs in understanding quantum many-body systems? Here we develop an efficient deep learning approach that enables spatially and chemically resolved insights into quantum-mechanical observables of molecular systems. We unify concepts from many-body Hamiltonians with purpose-designed deep tensor neural networks, which leads to size-extensive and uniformly accurate (1 kcal mol−1) predictions in compositional and configurational chemical space for molecules of intermediate size. As an example of chemical relevance, the model reveals a classification of aromatic rings with respect to their stability. Further applications of our model for predicting atomic energies and local chemical potentials in molecules, reliable isomer energies, and molecules with peculiar electronic structure demonstrate the potential of machine learning for revealing insights into complex quantum-chemical systems.
                Bookmark

                Author and article information

                Contributors
                (View ORCID Profile)
                Journal
                Briefings in Bioinformatics
                Oxford University Press (OUP)
                1467-5463
                1477-4054
                June 05 2020
                June 05 2020
                Article
                10.1093/bib/bbaa092
                32501508
                cee01d3e-4c56-4d95-bc53-13aabce6870a
                © 2020

                https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model

                History

                Comments

                Comment on this article