      Mapping and classifying molecules from a high-throughput structural database

          Abstract

          High-throughput computational materials design promises to greatly accelerate the process of discovering new materials and compounds, and of optimizing their properties. The large databases of structures and properties that result from computational searches, as well as the agglomeration of data of heterogeneous provenance, lead to considerable challenges when it comes to navigating the database, representing its structure at a glance, understanding structure–property relations, eliminating duplicates and identifying inconsistencies. Here we present a case study, based on a data set of conformers of amino acids and dipeptides, of how machine-learning techniques can help address these issues. We will exploit a recently developed strategy to define a metric between structures, and use it as the basis of both clustering and dimensionality reduction techniques, showing how these can help reveal structure–property relations, identify outliers and inconsistent structures, and rationalise how perturbations (e.g. binding of ions to the molecule) affect the stability of different conformers.
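
          The workflow outlined in the abstract (a metric between structures, then clustering and dimensionality reduction built on that metric) can be illustrated with a minimal sketch. The abstract does not name the metric or the algorithms, so everything below is a stand-in chosen for brevity and is not the authors' method: the descriptor is a sorted list of interatomic distances, the clustering is a simple distance-cutoff (single-linkage) grouping, the map is classical multidimensional scaling, and the four three-atom "structures" are invented toy data. All function names are hypothetical.

```python
import numpy as np

def descriptor(coords):
    """Sorted interatomic distances: unchanged by rotation, translation
    and atom reordering (a toy stand-in for a real structural metric)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(coords), k=1)
    return np.sort(dist[upper])

def distance_matrix(structures):
    """Euclidean distance between the descriptors of every pair of structures."""
    desc = np.array([descriptor(s) for s in structures])
    diff = desc[:, None, :] - desc[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def threshold_clusters(D, cutoff):
    """Group structures connected by a chain of distances below `cutoff`
    (single-linkage clustering implemented with union-find)."""
    n = D.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if D[i, j] < cutoff:
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    relabel = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]

def classical_mds(D, n_components=2):
    """Classical multidimensional scaling of a distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    w, v = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:n_components]  # largest eigenvalues first
    return v[:, top] * np.sqrt(np.clip(w[top], 0.0, None))

# Invented toy data: a bent triatomic, a rotated and translated copy of it
# (a duplicate), a linear triatomic, and a slightly stretched linear one.
structures = [
    [[0, 0, 0], [1, 0, 0], [0, 1, 0]],
    [[5, 5, 5], [5, 6, 5], [4, 5, 5]],
    [[0, 0, 0], [1, 0, 0], [2, 0, 0]],
    [[0, 0, 0], [1, 0, 0], [2.05, 0, 0]],
]

D = distance_matrix(structures)
labels = threshold_clusters(D, cutoff=0.2)  # cutoff is an arbitrary choice
embedding = classical_mds(D)

print(labels)
print(embedding)
```

          In this toy case the rotated copy has zero distance to the original, so a cutoff on the metric flags it as a duplicate, while the two-dimensional embedding gives the kind of at-a-glance map of the data set that the abstract refers to. A production analysis would replace each stand-in with a metric and algorithms suited to the chemistry at hand.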

          Electronic supplementary material

          The online version of this article (doi:10.1186/s13321-017-0192-4) contains supplementary material, which is available to authorized users.

          Most cited references (52)

          Survey of clustering algorithms.

          Rui Xu, Donald Wunsch (2005)
          Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measure, and cluster validation, are also discussed.

            Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning

            We introduce a machine learning model to predict atomization energies of a diverse set of organic molecules, based on nuclear charges and atomic positions only. The problem of solving the molecular Schrödinger equation is mapped onto a non-linear statistical regression problem of reduced complexity. Regression models are trained on and compared to atomization energies computed with hybrid density-functional theory. Cross-validation over more than seven thousand small organic molecules yields a mean absolute error of ~10 kcal/mol. Applicability is demonstrated for the prediction of molecular atomization potential energy curves.

              Accurate molecular van der Waals interactions from ground-state electron density and free-atom reference data.

              We present a parameter-free method for an accurate determination of long-range van der Waals interactions from mean-field electronic structure calculations. Our method relies on the summation of interatomic C6 coefficients, derived from the electron density of a molecule or solid and accurate reference data for the free atoms. The mean absolute error in the C6 coefficients is 5.5% when compared to accurate experimental values for 1225 intermolecular pairs, irrespective of the employed exchange-correlation functional. We show that the effective atomic C6 coefficients depend strongly on the bonding environment of an atom in a molecule. Finally, we analyze the van der Waals radii and the damping function in the C6 R^(-6) correction method for density-functional theory calculations.

                Author and article information

                Contributors
                sandip.de@epfl.ch
                felix.musil@epfl.ch
                ingram@fhi-berlin.mpg.de
                baldauf@fhi-berlin.mpg.de
                michele.ceriotti@epfl.ch
                Journal
                J Cheminform
                Journal of Cheminformatics
                Springer International Publishing (Cham)
                1758-2946
                2 February 2017
                Volume 9 (2017)
                Affiliations
                [1] National Center for Computational Design and Discovery of Novel Materials (MARVEL), Lausanne, Switzerland
                [2] ISNI 0000000121839049, GRID grid.5333.6, Laboratory of Computational Science and Modelling, Institute of Materials, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
                [3] Theory Department of the Fritz Haber Institute, Faradayweg 4-6, 14195 Berlin-Dahlem, Germany
                Article
                Article number: 192
                DOI: 10.1186/s13321-017-0192-4
                PMC ID: 5289135
                © The Author(s) 2017

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                Funding
                Funded by: SNSF NCCR MARVEL
                Funded by: MPG-EPFL Center for Molecular Nanoscience
                Categories
                Research Article

                Chemoinformatics
