
      Mapping and classifying molecules from a high-throughput structural database

      research-article


          Abstract

          High-throughput computational materials design promises to greatly accelerate the process of discovering new materials and compounds, and of optimizing their properties. The large databases of structures and properties that result from computational searches, as well as the agglomeration of data of heterogeneous provenance, lead to considerable challenges when it comes to navigating the database, representing its structure at a glance, understanding structure–property relations, eliminating duplicates and identifying inconsistencies. Here we present a case study, based on a data set of conformers of amino acids and dipeptides, of how machine-learning techniques can help address these issues. We exploit a recently developed strategy to define a metric between structures, and use it as the basis of both clustering and dimensionality reduction techniques. We show how these techniques can help reveal structure–property relations, identify outliers and inconsistent structures, and rationalise how perturbations (e.g. binding of ions to the molecule) affect the stability of different conformers.
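          The abstract's core idea, a single metric between structures that then drives clustering, duplicate elimination and outlier detection, can be illustrated with a minimal sketch. This is not the authors' method: the paper's structural similarity measure is not reproduced here. We assume a generic normalized, positive-definite similarity kernel k and substitute a cosine kernel on hypothetical feature vectors. The kernel-induced distance d(A, B) = sqrt(k(A, A) + k(B, B) - 2 k(A, B)) is a standard construction. Single-linkage clustering at a fixed cutoff and nearest-neighbour distance as an outlier score are our own illustrative choices. All function names, thresholds and the toy data are hypothetical, and dimensionality reduction is not shown.

```python
import math
from itertools import combinations


def cosine_kernel(x, y):
    """Normalized linear kernel; an assumed stand-in for a structural similarity kernel."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def kernel_distance_matrix(features, kernel=cosine_kernel):
    """Kernel-induced metric d(A, B) = sqrt(k(A, A) + k(B, B) - 2 k(A, B))."""
    n = len(features)
    k = [[kernel(features[i], features[j]) for j in range(n)] for i in range(n)]
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Clamp at zero: rounding can make the argument slightly negative.
        d[i][j] = d[j][i] = math.sqrt(max(k[i][i] + k[j][j] - 2.0 * k[i][j], 0.0))
    return d


def near_duplicates(d, tol):
    """Index pairs closer than tol: candidate duplicate entries in the database."""
    return [(i, j) for i, j in combinations(range(len(d)), 2) if d[i][j] < tol]


def single_linkage_clusters(d, cutoff):
    """Single-linkage clusters at a fixed cutoff, i.e. connected components of d < cutoff."""
    n = len(d)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if d[i][j] < cutoff:
            parent[find(i)] = find(j)
    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def isolated_points(d, cutoff):
    """Entries whose nearest neighbour is farther than cutoff: outlier candidates."""
    n = len(d)
    return [i for i in range(n) if min(d[i][j] for j in range(n) if j != i) > cutoff]


# Hypothetical toy descriptors for six "structures" (not data from the paper).
TOY_FEATURES = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],  # exact copy of entry 0
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 1.0, 0.0],
    [0.0, 0.0, 1.0],  # far from everything else
]

dist = kernel_distance_matrix(TOY_FEATURES)
print(near_duplicates(dist, tol=1e-6))            # [(0, 1)]
print(single_linkage_clusters(dist, cutoff=0.5))  # [0, 0, 0, 1, 1, 2]
print(isolated_points(dist, cutoff=0.5))          # [5]
```

          Because every downstream step consumes only the distance matrix, the cosine kernel could be swapped for any normalized similarity kernel between structures without changing the duplicate, clustering or outlier code. The thresholds used here are arbitrary for the toy data and would need tuning on a real database.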

          Electronic supplementary material

          The online version of this article (doi:10.1186/s13321-017-0192-4) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                sandip.de@epfl.ch
                felix.musil@epfl.ch
                ingram@fhi-berlin.mpg.de
                baldauf@fhi-berlin.mpg.de
                michele.ceriotti@epfl.ch
                Journal
                J Cheminform
                Journal of Cheminformatics
                Springer International Publishing (Cham)
                1758-2946
                2 February 2017
                2017
                Volume 9, Article 6
                Affiliations
                [1 ]National Center for Computational Design and Discovery of Novel Materials (MARVEL), Lausanne, Switzerland
                [2] Laboratory of Computational Science and Modelling, Institute of Materials, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland (ISNI 0000000121839049, GRID grid.5333.6)
                [3 ]Theory Department of the Fritz Haber Institute, Faradayweg 4-6, 14195 Berlin-Dahlem, Germany
                Author information
                http://orcid.org/0000-0001-8434-3497
                Article
                192
                10.1186/s13321-017-0192-4
                5289135
                50ecbf0e-9bb7-4d70-9e53-80fa663cfce0
                © The Author(s) 2017

                Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 29 September 2016
                Accepted: 17 January 2017
                Funding
                Funded by: SNSF NCCR MARVEL
                Funded by: MPG-EPFL Center for Molecular Nanoscience
                Categories
                Research Article

                Chemoinformatics
