
      On scientific understanding with artificial intelligence

      review-article


          Abstract

          An oracle that correctly predicts the outcome of every particle physics experiment, the products of every possible chemical reaction or the function of every protein would revolutionize science and technology. However, scientists would not be entirely satisfied because they would want to comprehend how the oracle made these predictions. This is scientific understanding, one of the main aims of science. With the increase in the available computational power and advances in artificial intelligence, a natural question arises: how can advanced computational systems, and specifically artificial intelligence, contribute to new scientific understanding or gain it autonomously? Trying to answer this question, we adopted a definition of ‘scientific understanding’ from the philosophy of science that enabled us to overview the scattered literature on the topic and, combined with dozens of anecdotes from scientists, map out three dimensions of computer-assisted scientific understanding. For each dimension, we review the existing state of the art and discuss future developments. We hope that this Perspective will inspire and focus research directions in this multidisciplinary emerging field.

          Abstract

          Scientific understanding is one of the main aims of science. This Perspective discusses how advanced computational systems, and artificial intelligence in particular, can contribute to driving scientific understanding.


          Most cited references (106)

          Is Open Access

          Highly accurate protein structure prediction with AlphaFold

          Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort [1–4], the structures of around 100,000 unique proteins have been determined [5], but this represents a small fraction of the billions of known protein sequences [6, 7]. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence (the structure prediction component of the ‘protein folding problem’ [8]) has been an important open research problem for more than 50 years [9]. Despite recent progress [10–14], existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14) [15], demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.

          AlphaFold predicts protein structures with an accuracy competitive with experimental structures in the majority of cases using a novel deep learning architecture.

            From local explanations to global understanding with explainable AI for trees

            Tree-based machine learning models such as random forests, decision trees, and gradient boosted trees are popular non-linear predictive models, yet comparatively little attention has been paid to explaining their predictions. Here, we improve the interpretability of tree-based models through three main contributions: 1) The first polynomial time algorithm to compute optimal explanations based on game theory. 2) A new type of explanation that directly measures local feature interaction effects. 3) A new set of tools for understanding global model structure based on combining many local explanations of each prediction. We apply these tools to three medical machine learning problems and show how combining many high-quality local explanations allows us to represent global structure while retaining local faithfulness to the original model. These tools enable us to i) identify high magnitude but low frequency non-linear mortality risk factors in the US population, ii) highlight distinct population sub-groups with shared risk characteristics, iii) identify non-linear interaction effects among risk factors for chronic kidney disease, and iv) monitor a machine learning model deployed in a hospital by identifying which features are degrading the model’s performance over time. Given the popularity of tree-based machine learning models, these improvements to their interpretability have implications across a broad set of domains.

            Exact game-theoretic explanations for ensemble tree-based predictions that guarantee desirable properties.
              Is Open Access

              Highly accurate protein structure prediction for the human proteome

              Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure [1]. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold [2], at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.

              AlphaFold is used to predict the structures of almost all of the proteins in the human proteome; the availability of high-confidence predicted structures could enable new avenues of investigation from a structural perspective.

                Author and article information

                Contributors
                mario.krenn@mpl.mpg.de
                alan@aspuru.com
                Journal
                Nat Rev Phys
                Nature Reviews. Physics
                Nature Publishing Group UK (London)
                ISSN: 2522-5820
                11 October 2022
                Pages: 1-9
                Affiliations
                [1] GRID grid.419562.d, ISNI 0000 0004 0374 4283, Max Planck Institute for the Science of Light (MPL), Erlangen, Germany
                [2] GRID grid.17063.33, ISNI 0000 0001 2157 2938, Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
                [3] GRID grid.17063.33, ISNI 0000 0001 2157 2938, Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
                [4] GRID grid.494618.6, Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
                [5] GRID grid.7892.4, ISNI 0000 0001 0075 5874, Institute of Nanotechnology, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
                [6] GRID grid.38142.3c, ISNI 000000041936754X, Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
                [7] GRID grid.5386.8, ISNI 000000041936877X, Division of Infectious Diseases, Weill Department of Medicine, Weill Cornell Medical College, New York, USA
                [8] GRID grid.16821.3c, ISNI 0000 0004 0368 8293, Center of Hydrogen Science, Shanghai Jiao Tong University, Shanghai, China
                [9] GRID grid.16821.3c, ISNI 0000 0004 0368 8293, State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
                [10] GRID grid.16821.3c, ISNI 0000 0004 0368 8293, Innovation Center for Future Materials, Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai, China
                [11] GRID grid.440050.5, ISNI 0000 0004 0408 2525, Canadian Institute for Advanced Research (CIFAR) Lebovic Fellow, Toronto, Ontario, Canada
                Author information
                http://orcid.org/0000-0001-8286-8257
                http://orcid.org/0000-0002-8277-4434
                Article
                518
                10.1038/s42254-022-00518-3
                9552145
                36247217
                e5627eb4-5643-43c4-ae47-ba6a403aec1a
                © Springer Nature Limited 2022, Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

                This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.

                History
                : 30 August 2022
                Categories
                Perspective

                quantum physics, physical chemistry
