      Is Open Access

      Shampoo: Preconditioned Stochastic Tensor Optimization

      Preprint
      Vineet Gupta, Tomer Koren, Yoram Singer


          Abstract

          Preconditioned gradient methods are among the most general and powerful tools in optimization. However, preconditioning requires storing and manipulating prohibitively large matrices. We describe and analyze a new structure-aware preconditioning algorithm, called Shampoo, for stochastic optimization over tensor spaces. Shampoo maintains a set of preconditioning matrices, each of which operates on a single dimension, contracting over the remaining dimensions. We establish convergence guarantees in the stochastic convex setting, the proof of which builds upon matrix trace inequalities. Our experiments with state-of-the-art deep learning models show that Shampoo is capable of converging considerably faster than commonly used optimizers. Although it involves a more complex update rule, Shampoo's runtime per step is comparable to that of simple gradient methods such as SGD, AdaGrad, and Adam.
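          To make the "one preconditioner per dimension" idea concrete, the following is a minimal NumPy sketch of the matrix (order-2 tensor) special case of the update described above: a left preconditioner accumulates row statistics by contracting the gradient over its columns, a right preconditioner accumulates column statistics by contracting over its rows, and the gradient is multiplied by the inverse fourth roots of both. The function names, the epsilon floor on eigenvalues, and the learning rate are illustrative choices, not the paper's reference implementation.

          ```python
          import numpy as np

          def _inv_fourth_root(M, eps=1e-12):
              """Inverse fourth root of a symmetric PSD matrix via eigendecomposition."""
              vals, vecs = np.linalg.eigh(M)
              return vecs @ np.diag(np.maximum(vals, eps) ** -0.25) @ vecs.T

          def shampoo_matrix_step(W, G, L, R, lr=0.1):
              """One Shampoo step for a matrix parameter W of shape (m, n).

              L (m x m) preconditions the row dimension and R (n x n) the column
              dimension; each is built by contracting the stochastic gradient G
              over the other dimension.
              """
              L = L + G @ G.T    # row statistics: contract over columns
              R = R + G.T @ G    # column statistics: contract over rows
              W = W - lr * (_inv_fourth_root(L) @ G @ _inv_fourth_root(R))
              return W, L, R

          # Usage: initialize each preconditioner to eps * identity, then feed
          # stochastic gradients (here random stand-ins) step by step.
          m, n, eps = 4, 3, 1e-4
          W = np.zeros((m, n))
          L, R = eps * np.eye(m), eps * np.eye(n)
          rng = np.random.default_rng(0)
          for _ in range(10):
              G = rng.normal(size=(m, n))
              W, L, R = shampoo_matrix_step(W, G, L, R)
          ```

          Because L and R are only m x m and n x n, this stores far less than the (mn) x (mn) matrix a full preconditioner over the flattened parameter would require, which is the storage saving the abstract refers to.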

          Most cited references (8)

          • Online Learning and Online Convex Optimization
          • Introduction to Online Convex Optimization, Elad Hazan (2015)
          • Updating Quasi-Newton Matrices with Limited Storage

                Author and article information

                Date: 26 February 2018
                Type: Article (preprint)
                arXiv ID: 1802.09568
                Record ID: f5c50d56-851d-4060-9ef3-1d317e97b46e
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                Subject categories: cs.LG, math.OC, stat.ML
