
      A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics

      research-article

      1 , * , 2 , 3

      Entropy

      MDPI

      stochastic gradient descent, deep learning, general relativity


          Abstract

          This paper is a step towards a geometric understanding of stochastic gradient descent (SGD), a popular algorithm for training deep neural networks. We build on a recent result showing that the noise in SGD, when training typical networks, is highly non-isotropic. This observation motivates a deterministic model in which the trajectories of the dynamical system are described by geodesics of a family of metrics arising from a diffusion matrix, namely the covariance of the stochastic gradients in SGD. Our model is analogous to models in general relativity: the role of the electromagnetic field in the latter is played by the gradient of the loss function of a deep network in the former.
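To make the terms "diffusion matrix" and "non-isotropic" concrete, the following is a minimal sketch on a toy two-parameter least-squares problem, not the authors' model and not a deep network. It estimates the covariance of per-sample gradients and compares its two eigenvalues. The data, the feature scales, the evaluation point `w`, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def per_sample_gradients(w, xs, ys):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w, one per sample."""
    grads = []
    for x, y in zip(xs, ys):
        residual = w[0] * x[0] + w[1] * x[1] - y
        grads.append((residual * x[0], residual * x[1]))
    return grads


def gradient_covariance(grads):
    """Empirical 2x2 covariance of the per-sample gradients."""
    n = len(grads)
    m0 = sum(g[0] for g in grads) / n
    m1 = sum(g[1] for g in grads) / n
    a = sum((g[0] - m0) ** 2 for g in grads) / n
    b = sum((g[0] - m0) * (g[1] - m1) for g in grads) / n
    c = sum((g[1] - m1) ** 2 for g in grads) / n
    return [[a, b], [b, c]]


def symmetric_2x2_eigenvalues(m):
    """Closed-form eigenvalues (smallest first) of a symmetric 2x2 matrix."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    mean = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return mean - radius, mean + radius


# Hypothetical toy data: two features with very different scales.
rng = random.Random(0)
xs = [(rng.gauss(0, 1.0), rng.gauss(0, 0.1)) for _ in range(2000)]
ys = [x[0] + x[1] + rng.gauss(0, 0.1) for x in xs]

w = (0.0, 0.0)  # assumed point in parameter space at which to measure noise
D = gradient_covariance(per_sample_gradients(w, xs, ys))
lam_min, lam_max = symmetric_2x2_eigenvalues(D)
anisotropy = lam_max / lam_min

print("covariance of stochastic gradients:", D)
print("eigenvalue ratio (1 would mean isotropic noise):", anisotropy)
```

An eigenvalue ratio far from 1 means the gradient noise is much stronger along one direction in parameter space than along another. In this toy case the anisotropy comes only from the feature scales; the abstract's claim concerns the noise observed when training typical deep networks.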


                Author and article information

                Journal
                Entropy (Basel)
                MDPI
                ISSN: 1099-4300
                Published: 15 January 2020
                Volume: 22
                Issue: 1
                Affiliations
                [1] Dipartimento di Matematica, piazza Porta San Donato 5, University of Bologna, 40126 Bologna, Italy
                [2] Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104, USA; pratikac@seas.upenn.edu
                [3] Computer Science Department, University of California, Los Angeles, CA 90095, USA; soatto@cs.ucla.edu
                Author notes
                [*] Correspondence: rita.fioresi@unibo.it
                Article
                entropy-22-00101
                10.3390/e22010101
                7516401
                © 2020 by the authors.

                Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( http://creativecommons.org/licenses/by/4.0/).

                Categories
                Article
