Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?


Abstract

We study how the behavior of deep policy gradient algorithms reflects the conceptual framework motivating their development. We propose a fine-grained analysis of state-of-the-art methods based on key aspects of this framework: gradient estimation, value prediction, optimization landscapes, and trust region enforcement. We find that from this perspective, the behavior of deep policy gradient algorithms often deviates from what their motivating framework would predict. Our analysis suggests first steps towards solidifying the foundations of these algorithms, and in particular indicates that we may need to move beyond the current benchmark-centric evaluation methodology.
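The abstract names gradient estimation as one of the aspects examined. The sketch below is not the authors' experimental setup. It is a minimal toy illustration, assuming a softmax policy over a hypothetical 4-armed bandit with fixed per-arm rewards. It compares score-function (REINFORCE) gradient estimates against the exact gradient using cosine similarity, and shows how estimate quality depends on batch size. The helper names (`true_gradient`, `reinforce_estimate`, `mean_cosine_to_true`) and the logits and rewards are illustrative only.

```python
import math
import random


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def true_gradient(theta, rewards):
    """Exact gradient of J(theta) = sum_a pi(a) r(a) for a softmax policy.

    Component k equals pi(k) * (r(k) - J).
    """
    pi = softmax(theta)
    j = sum(p * r for p, r in zip(pi, rewards))
    return [p * (r - j) for p, r in zip(pi, rewards)]


def reinforce_estimate(theta, rewards, batch_size, rng):
    """Score-function estimate: mean over sampled actions of r(a) * grad log pi(a)."""
    pi = softmax(theta)
    k = len(theta)
    grad = [0.0] * k
    for _ in range(batch_size):
        a = rng.choices(range(k), weights=pi)[0]
        # For a softmax policy, grad_theta log pi(a) = onehot(a) - pi.
        for i in range(k):
            grad[i] += rewards[a] * ((1.0 if i == a else 0.0) - pi[i])
    return [g / batch_size for g in grad]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    # A zero vector (e.g. every sampled reward was 0) gets similarity 0.
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def mean_cosine_to_true(theta, rewards, batch_size, trials=100, seed=0):
    """Average cosine similarity between independent estimates and the exact gradient."""
    rng = random.Random(seed)
    g_true = true_gradient(theta, rewards)
    sims = [
        cosine(reinforce_estimate(theta, rewards, batch_size, rng), g_true)
        for _ in range(trials)
    ]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    theta = [0.0, 0.5, -0.5, 1.0]   # hypothetical policy logits
    rewards = [1.0, 0.0, 2.0, 0.5]  # hypothetical per-arm rewards
    for n in (1, 10, 100, 1000):
        print(n, round(mean_cosine_to_true(theta, rewards, n), 3))
```

In this toy setting, small batches give estimates that are only weakly aligned with the true gradient, and alignment improves as the batch grows. Deep RL settings differ substantially in scale and structure, so this example only illustrates what "gradient estimate quality" can mean. It makes no claim about the paper's measurements.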


Author and article information


Publication date (created): 06 November 2018

Article

arXiv ID: 1811.02553

SO-VID: f3d2c0cf-b3f6-4685-adc9-6a31a6797b6d

License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/


Custom metadata

Categories: stat.ML, cs.LG, cs.NE, cs.RO

ScienceOpen disciplines: Robotics, Machine learning, Neural & Evolutionary computing, Artificial intelligence


Related collections

Annual Reviews AI, Machine Learning, and Society