HyperTune: Dynamic Hyperparameter Tuning For Efficient Distribution of DNN Training Over Heterogeneous Systems


Abstract

Distributed training is a novel approach to accelerating Deep Neural Network (DNN) training, but common training libraries fall short in distributed settings with heterogeneous processors, or where processing nodes are interrupted by other workloads. This paper describes distributed training of DNNs on computational storage devices (CSDs), which are NAND flash-based, high-capacity data storage devices with internal processing engines. A CSD-based distributed architecture incorporates the advantages of federated learning in terms of performance scalability, resiliency, and data privacy by eliminating unnecessary data movement between the storage device and the host processor. The paper also describes Stannis, a DNN training framework that addresses the shortcomings of existing distributed training frameworks by dynamically tuning training hyperparameters in heterogeneous systems to maintain maximum overall processing speed (measured in processed images per second) and energy efficiency. Experimental results on image classification training benchmarks show up to a 3.1x improvement in performance and a 2.45x reduction in energy consumption when using Stannis with CSDs, compared to generic systems.
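
The abstract does not specify which hyperparameters Stannis tunes or how. As an illustration only, the Python sketch below shows one plausible form of the idea under a stated assumption: in synchronous data-parallel training, each step waits for the slowest worker, so splitting a fixed global batch in proportion to each worker's measured throughput (images/second) lets a fast host and slower CSD engines finish a step at about the same time. This is not the authors' algorithm; the functions `rebalance_batch_sizes` and `step_time`, the worker names, and the throughput numbers are all hypothetical.

```python
"""Hypothetical sketch (not Stannis itself): throughput-proportional batch
sizing for heterogeneous workers in synchronous data-parallel training."""

from typing import Dict


def rebalance_batch_sizes(throughput: Dict[str, float], global_batch: int) -> Dict[str, int]:
    """Split global_batch across workers in proportion to measured throughput
    (images/second), using largest-remainder rounding so the sizes sum exactly
    to global_batch."""
    if global_batch < 0:
        raise ValueError("global_batch must be non-negative")
    total = sum(throughput.values())
    if total <= 0:
        raise ValueError("at least one worker must report positive throughput")
    # Ideal (fractional) share of the global batch for each worker.
    ideal = {w: global_batch * t / total for w, t in throughput.items()}
    sizes = {w: int(share) for w, share in ideal.items()}
    # Give the leftover samples to the workers with the largest fractional remainders.
    leftover = global_batch - sum(sizes.values())
    by_remainder = sorted(ideal, key=lambda w: ideal[w] - sizes[w], reverse=True)
    for w in by_remainder[:leftover]:
        sizes[w] += 1
    return sizes


def step_time(sizes: Dict[str, int], throughput: Dict[str, float]) -> float:
    """A synchronous step takes as long as the slowest worker's share."""
    return max(sizes[w] / throughput[w] for w in sizes if throughput[w] > 0)


if __name__ == "__main__":
    # Hypothetical measurements: one host GPU and two slower CSD engines.
    tp = {"host": 200.0, "csd_0": 50.0, "csd_1": 50.0}
    equal = {"host": 100, "csd_0": 100, "csd_1": 100}
    balanced = rebalance_batch_sizes(tp, 300)
    print("equal split step time:", step_time(equal, tp))
    print("balanced sizes:", balanced, "step time:", step_time(balanced, tp))
```

With these made-up numbers, an equal split of 300 images leaves the host idle while each CSD processes 100 images at 50 images/s (2.0 s per step); the proportional split (200/50/50) brings every worker to 1.0 s. In a dynamic setting, such a rebalance would be re-run whenever measured throughput changes, for example when a node is interrupted by another workload.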

Author and article information

Publication date (created): 15 July 2020

arXiv ID: 2007.08077

SO-VID: 0a4e8b44-84c8-4095-b91a-ee4ab5062ed4

License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Categories: cs.DC, cs.LG

ScienceOpen disciplines: Artificial intelligence, Networking & Internet architecture
