23
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics

      , ,
      Entropy
      MDPI AG

      Read this article at

          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          This paper is a step towards developing a geometric understanding of a popular algorithm for training deep neural networks named stochastic gradient descent (SGD). We built upon a recent result which observed that the noise in SGD while training typical networks is highly non-isotropic. That motivated a deterministic model in which the trajectories of our dynamical systems are described via geodesics of a family of metrics arising from a certain diffusion matrix; namely, the covariance of the stochastic gradients in SGD. Our model is analogous to models in general relativity: the role of the electromagnetic field in the latter is played by the gradient of the loss function of a deep network in the former.

          Related collections

          Most cited references10

          • Record: found
          • Abstract: found
          • Article: not found

          Deep learning.

          Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
            Bookmark
            • Record: found
            • Abstract: not found
            • Article: not found

            Gradient-based learning applied to document recognition

              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              Natural Gradient Works Efficiently in Learning

                Bookmark

                Author and article information

                Journal
                ENTRFG
                Entropy
                Entropy
                MDPI AG
                1099-4300
                January 2020
                January 15 2020
                : 22
                : 1
                : 101
                Article
                10.3390/e22010101
                c6889b30-3aa6-4a44-ba20-3defe5a5e3cf
                © 2020

                https://creativecommons.org/licenses/by/4.0/

                History

                Comments

                Comment on this article