
      A survey of algorithms for transforming molecular dynamics data into metadata for in situ analytics based on machine learning methods

      review-article


          Abstract

          This paper surveys three algorithms that transform atomic-level molecular snapshots from molecular dynamics (MD) simulations into metadata representations suitable for in situ analytics based on machine learning methods. MD simulations, which study the classical time evolution of a molecular system at atomic resolution, are widely used in chemistry, material sciences, molecular biology and drug design; they are among the most common simulations run on supercomputers. Next-generation supercomputers will have dramatically higher performance than current systems and will generate more data that needs to be analysed (e.g. in terms of the number and length of MD trajectories). In the future, the coordination of data generation and analysis can no longer rely on the manual, centralized analysis traditionally performed after the simulation is completed, or on current data representations that were defined for traditional visualization tools. Powerful data preparation phases (i.e. phases in which the original raw data is transformed into concise yet still meaningful representations) will need to precede data analysis phases. Here, we discuss three algorithms for transforming traditionally used molecular representations into concise and meaningful metadata representations. The transformations can be performed locally. The new metadata can be fed into machine learning methods for runtime in situ analysis of larger MD trajectories supported by high-performance computing. In this paper, we provide an overview of the three algorithms and their use in three different applications: protein–ligand docking in drug design; protein folding simulations; and protein engineering based on analytics of protein functions that depend on proteins' three-dimensional structures.
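The idea of turning each MD snapshot into a concise record that can be computed locally, frame by frame, can be sketched as follows. This is only an illustration under stated assumptions: the descriptors used here (centroid and unit-mass radius of gyration) and the names `snapshot_to_metadata` and `trajectory_to_metadata` are hypothetical choices for the example and are not the three algorithms surveyed in the paper, which the abstract does not specify.

```python
import math
from typing import Dict, List, Sequence, Tuple

Atom = Tuple[float, float, float]  # (x, y, z) coordinates of one atom


def snapshot_to_metadata(coords: Sequence[Atom]) -> Dict[str, float]:
    """Reduce one snapshot to a few scalars.

    Assumption: centroid and radius of gyration (unit masses) stand in
    for the paper's metadata; they are illustrative descriptors only.
    """
    if not coords:
        raise ValueError("snapshot has no atoms")
    n = len(coords)
    cx = sum(a[0] for a in coords) / n
    cy = sum(a[1] for a in coords) / n
    cz = sum(a[2] for a in coords) / n
    # Mean squared distance of the atoms from the centroid.
    msd = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return {
        "n_atoms": float(n),
        "centroid_x": cx,
        "centroid_y": cy,
        "centroid_z": cz,
        "radius_of_gyration": math.sqrt(msd),
    }


def trajectory_to_metadata(frames: Sequence[Sequence[Atom]]) -> List[Dict[str, float]]:
    """Apply the per-snapshot reduction to each frame independently.

    Each frame is processed on its own, so the work could be done where
    the frame is produced, without gathering the full trajectory.
    """
    return [snapshot_to_metadata(frame) for frame in frames]


if __name__ == "__main__":
    # Two toy frames of a two-atom system (coordinates are made up).
    frames = [
        [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    ]
    for record in trajectory_to_metadata(frames):
        print(record)
```

Each frame yields a fixed-size record regardless of atom count, which is the property that makes such records convenient as input to a downstream machine learning method.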

          This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.

          Author and article information

          Journal
          Philos Trans A Math Phys Eng Sci
          RSTA
          roypta
          Philosophical transactions. Series A, Mathematical, physical, and engineering sciences
          The Royal Society Publishing
          1364-503X
          1471-2962
          6 March 2020
          20 January 2020
          Volume: 378
          Issue: 2166, Discussion meeting issue ‘Numerical algorithms for high-performance computational science’ organised and edited by Jack Dongarra, Laura Grigori and Nicholas J. Higham
          Article number: 20190063
          Affiliations
          [1] Electrical Engineering and Computer Science Department, The University of Tennessee Knoxville, 401 Min H. Kao Bldg., 1520 Middle Drive, Knoxville, TN 37996-2250, USA
          [2] Computer Science Department, University of New Mexico, MSC01 1130, Albuquerque, NM 87131-1070, USA
          [3] Oak Ridge National Laboratory, PO Box 2008, Oak Ridge, TN 37831, USA
          Author notes

          One contribution of 15 to a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.

          Author information
          http://orcid.org/0000-0002-0031-6377
          http://orcid.org/0000-0001-7743-8754
          http://orcid.org/0000-0001-7108-1934
          Article
          PMCID: PMC7015296
          Publisher ID: rsta20190063
          DOI: 10.1098/rsta.2019.0063
          PMID: 31955686
          0d1cf9a5-902c-4284-9d84-127c26c55b91
          © 2020 The Author(s)

          Published by the Royal Society. All rights reserved.

          History
          : 4 December 2019
          Funding
          Funded by: NSF SCI: Collaborative Research: DAPLDS - a Dynamically Adaptive Protein-Ligand Docking System based on Multi-Scale Modeling;
          Award ID: 0802650
          Funded by: NSF BIGDATA: IA: Collaborative Research: In Situ Data Analytics for Next Generation Molecular Dynamics Workflows;
          Award ID: IIS 1741057
          Funded by: NSF SHF: Small: Collaborative Research: Modeling and Analyzing Big Data on Peta- and Exascale Distributed Systems supported by MapReduce Methodologies;
          Award ID: 1318445
          Categories
          1003
          45
          168
          Articles
          Review Article
          Custom metadata
          March 6, 2020

          protein engineering,MapReduce,machine learning,protein folding,protein–ligand docking
