      Is Open Access

      Petuum: A New Platform for Distributed Machine Learning on Big Data

      Preprint


          Abstract

          What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization strategies employ fine-grained operations and scheduling beyond the classic bulk-synchronous processing paradigm popularized by MapReduce, or even specialized graph-based execution that relies on graph representations of ML programs. The variety of approaches tends to pull systems and algorithms design in different directions, and it remains difficult to find a universal platform applicable to a wide range of ML programs at scale. We propose a general-purpose framework that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions. This presents unique opportunities for an integrative system design, such as bounded-error network synchronization and dynamic scheduling based on ML program structure. We demonstrate the efficacy of these system designs versus well-known implementations of modern ML algorithms, allowing ML programs to run in much less time and at considerably larger model sizes, even on modestly-sized compute clusters.
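          The abstract names "bounded-error network synchronization" without giving details. One common way to get bounded error in iterative-convergent training is a bounded-staleness rule: a worker may run ahead of the slowest worker, but only by a fixed number of iterations. The sketch below is a toy illustration of that idea only. It is an assumption about what the phrase could mean, not the paper's design or any Petuum API, and the names `can_advance`, `simulate`, `speeds`, and `staleness` are hypothetical.

```python
from typing import List


def can_advance(clocks: List[int], worker: int, staleness: int) -> bool:
    """Return True if `worker` may start its next iteration.

    Hypothetical rule: a worker may begin another iteration only while it
    is at most `staleness` iterations ahead of the slowest worker.
    """
    return clocks[worker] - min(clocks) <= staleness


def simulate(speeds: List[int], staleness: int, rounds: int) -> List[int]:
    """Toy round-based simulation of workers with different speeds.

    In each round, worker w attempts speeds[w] iterations and skips any
    attempt that the staleness rule forbids (a stand-in for blocking).
    Returns the number of completed iterations per worker.
    """
    clocks = [0] * len(speeds)
    for _ in range(rounds):
        for worker, attempts in enumerate(speeds):
            for _ in range(attempts):
                if can_advance(clocks, worker, staleness):
                    clocks[worker] += 1
    return clocks


if __name__ == "__main__":
    # staleness=0 degenerates to lockstep (bulk-synchronous) progress.
    print(simulate([3, 1], staleness=0, rounds=1))
    # staleness=1 lets the fast worker run one extra iteration ahead.
    print(simulate([3, 1], staleness=1, rounds=1))
```

          Under this rule the gap between the fastest and slowest worker never exceeds `staleness + 1` completed iterations, which is what keeps the error from stale parameter reads bounded. Setting `staleness` to 0 recovers the bulk-synchronous behavior the abstract contrasts against.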


          Author and article information

          Journal
          2013-12-30
          2015-05-14
          Article
          arXiv:1312.7651

          http://arxiv.org/licenses/nonexclusive-distrib/1.0/

          Custom metadata
          15 pages, 10 figures, final version in KDD 2015 under the same title
          stat.ML cs.LG cs.SY

          Performance, Systems & Control, Machine learning, Artificial intelligence
