Mapping and classifying molecules from a high-throughput structural database

Abstract

High-throughput computational materials design promises to greatly accelerate the process of discovering new materials and compounds, and of optimizing their properties. The large databases of structures and properties that result from computational searches, as well as the agglomeration of data of heterogeneous provenance, lead to considerable challenges when it comes to navigating the database, representing its structure at a glance, understanding structure–property relations, eliminating duplicates and identifying inconsistencies. Here we present a case study, based on a data set of conformers of amino acids and dipeptides, of how machine-learning techniques can help address these issues. We will exploit a recently developed strategy to define a metric between structures, and use it as the basis of both clustering and dimensionality reduction techniques, showing how these can help reveal structure–property relations, identify outliers and inconsistent structures, and rationalise how perturbations (e.g. binding of ions to the molecule) affect the stability of different conformers.
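
To make the workflow in the abstract concrete, the following is a minimal sketch of how a pairwise distance between structures can drive duplicate detection, clustering and outlier flagging. It is not the authors' implementation: the abstract does not name the metric or the algorithms, so a plain Euclidean distance on made-up two-component descriptor vectors stands in for the structural metric, average-linkage agglomerative clustering stands in for the clustering step, and a mean k-nearest-neighbour distance stands in for outlier detection. All function names, thresholds and data are hypothetical, and dimensionality reduction is not covered.

```python
import math
from itertools import combinations


def distance_matrix(descriptors):
    """Pairwise distances; Euclidean is a placeholder for the structural metric."""
    n = len(descriptors)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = math.dist(descriptors[i], descriptors[j])
    return d


def find_duplicates(d, tol):
    """Index pairs closer than tol: candidates for duplicate elimination."""
    return [(i, j) for i, j in combinations(range(len(d)), 2) if d[i][j] < tol]


def average_linkage(d, n_clusters):
    """Agglomerative clustering that uses only the distance matrix."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average distance between all members of the two clusters.
            total = sum(d[i][j] for i in clusters[a] for j in clusters[b])
            dist = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (a < b)
        del clusters[b]
    return [sorted(c) for c in clusters]


def knn_outlier_score(d, k):
    """Mean distance to the k nearest neighbours; large values suggest outliers."""
    scores = []
    for i, row in enumerate(d):
        others = sorted(row[:i] + row[i + 1:])
        scores.append(sum(others[:k]) / k)
    return scores


# Hypothetical descriptors for five conformers (illustrative values only).
descriptors = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0), (1.1, 0.9), (5.0, 5.0)]

d = distance_matrix(descriptors)
duplicates = find_duplicates(d, tol=0.1)
clusters = average_linkage(d, n_clusters=3)
scores = knn_outlier_score(d, k=2)

print("possible duplicates:", duplicates)
print("clusters:", clusters)
print("most isolated structure:", scores.index(max(scores)))
```

Because every step consumes only the distance matrix, the placeholder metric can be swapped for any structural similarity measure without touching the rest of the sketch.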

Electronic supplementary material

The online version of this article (doi:10.1186/s13321-017-0192-4) contains supplementary material, which is available to authorized users.

Author and article information

Contributors

Sandip De: sandip.de@epfl.ch (ORCID: http://orcid.org/0000-0001-8434-3497)

Felix Musil: felix.musil@epfl.ch

Teresa Ingram: ingram@fhi-berlin.mpg.de

Carsten Baldauf: baldauf@fhi-berlin.mpg.de

Michele Ceriotti: michele.ceriotti@epfl.ch

Journal

Journal ID (nlm-ta): J Cheminform

Journal ID (iso-abbrev): J Cheminform

Title: Journal of Cheminformatics

Publisher: Springer International Publishing (Cham)

ISSN (Electronic): 1758-2946

Publication date (Electronic): 2 February 2017

Publication date PMC-release: 2 February 2017

Publication date Collection: 2017

Volume: 9

Electronic Location Identifier: 6

Affiliations

[1] National Center for Computational Design and Discovery of Novel Materials (MARVEL), Lausanne, Switzerland

[2] Laboratory of Computational Science and Modelling, Institute of Materials, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland (ISNI 0000000121839049, GRID grid.5333.6)

[3] Theory Department of the Fritz Haber Institute, Faradayweg 4-6, 14195 Berlin-Dahlem, Germany

Article

Publisher ID: 192

DOI: 10.1186/s13321-017-0192-4

PMC ID: 5289135

SO-VID: 50ecbf0e-9bb7-4d70-9e53-80fa663cfce0

License:

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received : 29 September 2016

Date accepted : 17 January 2017

Funding

Funded by: SNSF NCCR MARVEL

Funded by: MPG-EPFL Center for Molecular Nanoscience

Custom metadata

ScienceOpen disciplines: Chemoinformatics
