Preference-based reinforcement learning: a formal framework and a policy iteration algorithm

Johannes Fürnkranz, Eyke Hüllermeier, Weiwei Cheng, Sang-Hyeun Park
Machine Learning
Springer Nature

Most cited references (34)

Natural Gradient Works Efficiently in Learning

Learning to predict by the methods of temporal differences

Reinforcement learning of motor skills with policy gradients.

Autonomous learning is one of the hallmarks of human and animal behavior, and understanding the principles of learning will be crucial in order to achieve true autonomy in advanced machines like humanoid robots. In this paper, we examine learning of complex motor skills with human-like limbs. While supervised learning can offer useful tools for bootstrapping behavior, e.g., by learning from demonstration, it is only reinforcement learning that offers a general approach to the final trial-and-error improvement that is needed by each individual acquiring a skill. Neither neurobiological nor machine learning studies have, so far, offered compelling results on how reinforcement learning can be scaled to the high-dimensional continuous state and action spaces of humans or humanoids. Here, we combine two recent research developments on learning motor control in order to achieve this scaling. First, we interpret the idea of modular motor control by means of motor primitives as a suitable way to generate parameterized control policies for reinforcement learning. Second, we combine motor primitives with the theory of stochastic policy gradient learning, which currently seems to be the only feasible framework for reinforcement learning for humanoids. We evaluate different policy gradient methods with a focus on their applicability to parameterized motor primitives. We compare these algorithms in the context of motor primitive learning, and show that our most modern algorithm, the Episodic Natural Actor-Critic, outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.
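
The abstract above builds on episodic policy gradient updates. As a rough, self-contained sketch of that underlying idea only (not the paper's Episodic Natural Actor-Critic, and not its motor-primitive representation), the Python snippet below applies a REINFORCE-style update with a running-average baseline to a one-parameter Gaussian policy on a made-up scalar task; the target value, noise level, step size, and episode count are illustrative assumptions.

# Illustrative sketch only: a REINFORCE-style policy gradient update with a
# running-average baseline, for a one-parameter Gaussian policy on a toy task.
# The task, target, noise level, and step size are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
theta = 0.0        # policy mean (the learned parameter)
sigma = 0.1        # fixed exploration noise
alpha = 0.01       # learning rate
target = 1.0       # optimum the policy should find
baseline = 0.0     # running-average baseline (variance reduction)

for episode in range(2000):
    action = theta + sigma * rng.standard_normal()   # sample from pi_theta
    ret = -(action - target) ** 2                    # episodic return
    grad_log_pi = (action - theta) / sigma ** 2      # d/dtheta log N(action; theta, sigma^2)
    theta += alpha * (ret - baseline) * grad_log_pi  # vanilla policy gradient step
    baseline = 0.9 * baseline + 0.1 * ret            # update baseline estimate

print(f"learned policy mean: {theta:.3f} (target {target})")

Subtracting the running-average baseline from the return reduces the variance of the gradient estimate without changing its expected direction, which is one of the variance-reduction ideas that natural actor-critic methods refine further.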

Author and article information

Journal: Machine Learning (Mach Learn)
Publisher: Springer Nature
ISSN: 0885-6125 (print); 1573-0565 (electronic)
Published: October 2012 (issue); August 10 2012 (online)
Volume: 89
Issue: 1-2
Pages: 123-156
DOI: 10.1007/s10994-012-5313-8
© 2012
