      Is Open Access

      NOMAD 2018 Kaggle Competition: Solving Materials Science Challenges Through Crowd Sourcing

      Preprint


          Abstract

          Machine learning (ML) is increasingly used in the field of materials science, where statistical estimates of computed properties are employed to rapidly examine the chemical space for new compounds. However, a systematic comparison of several ML models for this domain has been hindered by the scarcity of appropriate datasets of materials properties, as well as the lack of thorough benchmarking studies. To address this, a public data-analytics competition was organized by the Novel Materials Discovery (NOMAD) Centre of Excellence and hosted by the online platform Kaggle using a dataset of \(3\,000\) (Al\(_x\) Ga\(_y\) In\(_z\))\(_2\) O\(_3\) compounds (with \(x+y+z = 1\)). The aim of this challenge was to identify the best ML model for the prediction of two key physical properties that are relevant for optoelectronic applications: the electronic band gap energy and the crystalline formation energy. In this contribution, we present a summary of the top three ML approaches of the competition, including the 1st place solution based on a crystal graph representation that is new for ML of the properties of materials. The 2nd place model combined many candidate descriptors from a set of compositional, atomic environment-based, and average structural properties with the light gradient-boosting machine regression model. The 3rd place model employed the smooth overlap of atomic positions representation with a neural network. To gain insight into whether the representation or the regression model determines the overall model performance, nine ML models obtained by combining the representations and regression models of the top three approaches were compared by looking at the correlations among prediction errors. At fixed representation, the largest correlation is observed between predictions made with kernel ridge regression and with a neural network, reflecting a similar performance on the same test set samples.
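
          The error-correlation comparison mentioned at the end of the abstract can be illustrated with a minimal sketch. It assumes the Pearson correlation of per-sample absolute errors as the measure, which the abstract does not specify. The function name `error_correlation`, the variable names, and the synthetic data are hypothetical and are not taken from the competition or the paper.

```python
import numpy as np

def error_correlation(y_true, pred_a, pred_b):
    """Pearson correlation between the per-sample absolute errors of two models.

    Assumption: absolute errors and Pearson correlation are used here for
    illustration; the source abstract does not state the exact measure.
    """
    y_true = np.asarray(y_true, dtype=float)
    err_a = np.abs(np.asarray(pred_a, dtype=float) - y_true)
    err_b = np.abs(np.asarray(pred_b, dtype=float) - y_true)
    return float(np.corrcoef(err_a, err_b)[0, 1])

# Synthetic stand-in data (NOT competition data): two models that share
# most of their error, and a third whose errors are independent.
rng = np.random.default_rng(0)
n_samples = 200
y_true = rng.normal(size=n_samples)
shared_error = rng.normal(scale=0.1, size=n_samples)

pred_model_1 = y_true + shared_error + rng.normal(scale=0.02, size=n_samples)
pred_model_2 = y_true + shared_error + rng.normal(scale=0.02, size=n_samples)
pred_model_3 = y_true + rng.normal(scale=0.1, size=n_samples)

r_similar = error_correlation(y_true, pred_model_1, pred_model_2)
r_unrelated = error_correlation(y_true, pred_model_1, pred_model_3)
print(f"models 1 vs 2: {r_similar:.2f}, models 1 vs 3: {r_unrelated:.2f}")
```

          A high correlation means the two models tend to be wrong on the same test samples, which is the sense in which the abstract speaks of similar performance on the same test set samples.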

                Author and article information

                Journal
                30 November 2018
                Article
                1812.00085

                http://creativecommons.org/licenses/by/4.0/

                History
                Custom metadata
                cond-mat.mtrl-sci

                Condensed matter
