      Is Open Access

      Fisher Information and Natural Gradient Learning of Random Deep Networks

      Preprint

          Abstract

          A deep neural network is a hierarchical nonlinear model transforming input signals to output signals. Its input-output relation is considered to be stochastic, being described for a given input by a parameterized conditional probability distribution of outputs. The space of parameters consisting of weights and biases is a Riemannian manifold, where the metric is defined by the Fisher information matrix. The natural gradient method uses the steepest descent direction in a Riemannian manifold, so it is effective in learning and avoids plateaus. It requires inversion of the Fisher information matrix, however, which is practically impossible when the matrix has a huge number of dimensions. Many methods for approximating the natural gradient have therefore been introduced. The present paper uses the statistical neurodynamical method to reveal the properties of the Fisher information matrix in a network of random connections under the mean-field approximation. We prove that the Fisher information matrix is unit-wise block diagonal supplemented by small-order off-block-diagonal terms, which provides a justification for the quasi-diagonal natural gradient method of Y. Ollivier. A unit-wise block-diagonal Fisher information matrix reduces to the direct sum of the Fisher information matrices of the single units. We further prove that the Fisher information matrix of a single unit has a simple reduced form, the sum of a diagonal matrix and a rank-2 matrix of weight-bias correlations. We obtain the inverse of the Fisher information matrix explicitly. We then have an explicit form of the natural gradient that does not rely on numerical matrix inversion, which drastically speeds up stochastic gradient learning.
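
          The computational payoff of that last result is that each unit's Fisher block is a diagonal matrix plus a rank-2 term, and any symmetric rank-2 term can be written as U C U^T with an n x 2 factor U and a 2 x 2 matrix C. Under that parameterization the Woodbury identity gives F^{-1} times a gradient in closed form, with only a 2 x 2 system to solve. The sketch below is a minimal illustration of this inversion trick under those assumptions, not the paper's own formulas: the values of d, U, C and the helper natural_gradient_step are hypothetical placeholders.

          import numpy as np

          def natural_gradient_step(grad, d, U, C, lr=0.1):
              """Natural-gradient update for one unit whose Fisher block is
              assumed to have the form F = diag(d) + U @ C @ U.T.

              Uses the Woodbury identity, so only a 2x2 system is solved and
              the cost stays linear in the unit's number of parameters."""
              d_inv_grad = grad / d                     # D^{-1} g
              d_inv_U = U / d[:, None]                  # D^{-1} U, shape (n, 2)
              small = np.linalg.inv(C) + U.T @ d_inv_U  # 2x2 inner matrix
              correction = d_inv_U @ np.linalg.solve(small, U.T @ d_inv_grad)
              nat_grad = d_inv_grad - correction        # F^{-1} g without forming F
              return -lr * nat_grad

          # Toy usage: random stand-ins for one unit's weight-bias gradient.
          rng = np.random.default_rng(0)
          n = 5                                         # number of parameters of the unit
          grad = rng.normal(size=n)                     # Euclidean gradient
          d = np.ones(n)                                # assumed diagonal part of F
          U = np.stack([np.ones(n), rng.normal(size=n)], axis=1)  # assumed rank-2 factor
          C = np.array([[0.5, 0.1], [0.1, 0.3]])        # assumed 2x2 coupling (positive definite)
          print(natural_gradient_step(grad, d, U, C))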


          Most cited references (4)

          • Natural Gradient Works Efficiently in Learning
          • Acceleration of Stochastic Approximation by Averaging
          • A Theory of Adaptive Pattern Classifiers

                Author and article information

                Date: 21 August 2018
                arXiv ID: 1808.07172
                Record ID: 06007762-5bf9-477b-ab90-b7e1f1bd210b
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                Custom metadata: 22 pages, 2 figures
                arXiv categories: cs.LG, cond-mat.dis-nn, stat.ML
                Keywords: Theoretical physics, Machine learning, Artificial intelligence
