
Is Open Access

DORA The Explorer: Directed Outreaching Reinforcement Action-Selection

Preprint

Author(s): Leshem Choshen, Lior Fox, Yonatan Loewenstein

Publication date (created): 11 April 2018

Abstract

Exploration is a fundamental aspect of Reinforcement Learning, typically implemented using stochastic action-selection. Exploration, however, can be more efficient if directed toward gaining new world knowledge. Visit-counters have proven useful both in practice and in theory for directed exploration. However, a major limitation of counters is their locality. While there are a few model-based solutions to this shortcoming, a model-free approach is still missing. We propose \(E\)-values, a generalization of counters that can be used to evaluate the propagating exploratory value over state-action trajectories. We compare our approach to commonly used RL techniques, and show that using \(E\)-values improves learning and performance over traditional counters. We also show how our method can be implemented with function approximation to efficiently learn continuous MDPs. We demonstrate this by showing that our approach surpasses state-of-the-art performance in the Freeway Atari 2600 game.
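The abstract does not state how \(E\)-values are updated, so the following is only a minimal sketch of one way a counter can be generalized so that exploratory value propagates along trajectories. It assumes a SARSA-style update with zero reward, an initial value of 1, and a log transform that turns the value back into a count-like quantity. The function names and the parameters `alpha` and `gamma_e` are hypothetical and chosen for illustration.

```python
import math


def update_e_value(e_values, key, next_key, alpha=0.1, gamma_e=0.9):
    """Apply one assumed SARSA-style update with zero reward.

    e_values maps (state, action) pairs to E-values; unseen pairs count as 1.0.
    """
    e_current = e_values.get(key, 1.0)
    e_next = e_values.get(next_key, 1.0)
    # Zero reward: the target is just the discounted E-value of the successor.
    e_values[key] = (1.0 - alpha) * e_current + alpha * gamma_e * e_next
    return e_values[key]


def generalized_counter(e_value, alpha=0.1):
    """Convert an E-value into a count-like number via log base (1 - alpha)."""
    return math.log(e_value) / math.log(1.0 - alpha)


# With gamma_e = 0 nothing propagates, and the value acts as a plain visit counter.
local = {}
for _ in range(5):
    update_e_value(local, ("s0", "a0"), ("s1", "a0"), gamma_e=0.0)
local_count = generalized_counter(local[("s0", "a0")])

# With gamma_e > 0, an unexplored successor keeps the effective count below 1.
propagating = {}
update_e_value(propagating, ("s0", "a0"), ("s1", "a0"), gamma_e=0.9)
propagated_count = generalized_counter(propagating[("s0", "a0")])
```

Under these assumptions, `gamma_e = 0` recovers an ordinary visit count, which illustrates the locality limitation of counters. With `gamma_e > 0`, a state-action pair that leads to unexplored successors looks less visited than its raw count suggests.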

Author and article information

Article

arXiv ID: 1804.04012

SO-VID: c3852e9b-a3a0-4a4d-ac2d-0845bc3e59b2

License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Comments: Final version for ICLR 2018

Categories: cs.LG, cs.AI, stat.ML

ScienceOpen disciplines: Machine learning, Artificial intelligence
