
      A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics

      research-article
Rita Fioresi 1,*, Pratik Chaudhari 2, Stefano Soatto 3
      Entropy
      MDPI
Keywords: stochastic gradient descent, deep learning, general relativity


          Abstract

This paper is a step towards developing a geometric understanding of stochastic gradient descent (SGD), a popular algorithm for training deep neural networks. We build upon a recent result which observed that the noise in SGD while training typical networks is highly non-isotropic. That motivated a deterministic model in which the trajectories of our dynamical systems are described via geodesics of a family of metrics arising from a certain diffusion matrix, namely, the covariance of the stochastic gradients in SGD. Our model is analogous to models in general relativity: the role played by the electromagnetic field in the latter is played in the former by the gradient of the loss function of a deep network.
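The diffusion matrix mentioned in the abstract, the covariance of the minibatch gradients, can be estimated empirically. The following is a minimal illustrative sketch, not the paper's code: the toy least-squares loss, batch sizes, and variable names are all assumptions, chosen only to show that this covariance is generally non-isotropic.

```python
import numpy as np

# Toy setting (assumed): least-squares loss L(w) = mean_i 0.5*(x_i.w - y_i)^2
rng = np.random.default_rng(0)
n, d, batch = 512, 3, 32
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

def per_sample_grads(w):
    # Gradient of 0.5*(x.w - y)^2 for each sample: (x.w - y) * x, shape (n, d)
    return (X @ w - y)[:, None] * X

w = np.zeros(d)
G = per_sample_grads(w)

# Minibatch gradients: average the per-sample gradients over random batches
batches = [rng.choice(n, size=batch, replace=False) for _ in range(2000)]
mb_grads = np.stack([G[idx].mean(axis=0) for idx in batches])

# Empirical diffusion matrix: covariance of the SGD gradient noise, shape (d, d)
D = np.cov(mb_grads, rowvar=False)

# If the noise were isotropic, all eigenvalues of D would coincide;
# in practice they spread out, i.e. the noise is non-isotropic.
eigs = np.linalg.eigvalsh(D)
```

Comparing the smallest and largest eigenvalues of `D` gives a simple measure of how far the gradient noise is from isotropic.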


Most cited references (10)


Deep learning (LeCun, Bengio, and Hinton, Nature, 2015)

          Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
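The update described above, using backpropagated gradients to change a layer's internal parameters, can be sketched in a few lines. This is a hedged toy example, not code from the cited paper: a single linear layer with a squared-error loss, with all names and values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 4))   # minibatch of 8 inputs
y = rng.normal(size=(8, 1))   # targets
W = np.zeros((4, 1))          # the layer's internal parameters

pred = X @ W                  # forward pass through the layer
err = pred - y
loss_before = float((err ** 2).mean())

grad = 2 * X.T @ err / len(X) # gradient of the mean squared error w.r.t. W
W -= 0.1 * grad               # gradient-descent parameter update

loss_after = float(((X @ W - y) ** 2).mean())
```

Backpropagation chains exactly this kind of gradient computation across many layers; one update step should reduce the loss on the minibatch.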

Gradient-based learning applied to document recognition (LeCun, Bottou, Bengio, and Haffner, Proceedings of the IEEE, 1998)


Natural Gradient Works Efficiently in Learning (Amari, Neural Computation, 1998)


                Author and article information

Journal
Entropy (Basel)
MDPI
ISSN: 1099-4300
Published: 15 January 2020
Volume 22, Issue 1, Article 101
                Affiliations
[1] Dipartimento di Matematica, Piazza Porta San Donato 5, University of Bologna, 40126 Bologna, Italy
[2] Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104, USA; pratikac@seas.upenn.edu
[3] Computer Science Department, University of California, Los Angeles, CA 90095, USA; soatto@cs.ucla.edu
                Author notes
[*] Correspondence: rita.fioresi@unibo.it
Article
ID: entropy-22-00101
DOI: 10.3390/e22010101
PMCID: 7516401
ScienceOpen record: c6889b30-3aa6-4a44-ba20-3defe5a5e3cf
                © 2020 by the authors.

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

History
Received: 30 November 2019
Accepted: 10 January 2020
                Categories
                Article

