
      No-Regret Exploration in Goal-Oriented Reinforcement Learning

      Preprint


          Abstract

          Many popular reinforcement learning problems (e.g., navigation in a maze, some Atari games, mountain car) are instances of the so-called episodic setting, or stochastic shortest path (SSP) problem, where an agent has to reach a predefined goal state (e.g., the top of the hill) while maximizing the cumulative reward or minimizing the cumulative cost. Despite the popularity of this setting, most of the literature on the exploration-exploitation dilemma has either focused on different problems (i.e., fixed-horizon and infinite-horizon) or made the restrictive loop-free assumption (which implies that no state can be visited twice during an episode). In this paper, we study the general SSP setting and introduce the algorithm UC-SSP, whose regret scales as \(\widetilde{O}(c_{\max}^{3/2} c_{\min}^{-1/2} D S \sqrt{ A D K})\) after \(K\) episodes for any unknown SSP with \(S\) non-terminal states, \(A\) actions, SSP-diameter \(D\), and positive costs in \([c_{\min}, c_{\max}]\). UC-SSP is thus the first learning algorithm with vanishing regret in the theoretically challenging setting of episodic RL.
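          For context, one standard way to formalize the regret quoted above (the abstract does not spell out the definition; this is the usual SSP regret formulation, and the notation \(T_k\), \(s_t^k\), \(a_t^k\), \(V^*\) is introduced here for illustration) is
          \[
            \Delta(K) \;=\; \sum_{k=1}^{K} \sum_{t=1}^{T_k} c\big(s_t^k, a_t^k\big) \;-\; K \, V^*(s_0),
          \]
          where \(T_k\) is the (random) number of steps taken to reach the goal in episode \(k\) and \(V^*(s_0)\) is the minimal expected cumulative cost to reach the goal from the initial state \(s_0\). Under this convention, the \(\sqrt{K}\) growth of the bound above implies \(\Delta(K)/K \to 0\) as \(K \to \infty\), i.e., vanishing per-episode regret.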


          Author and article information

          Published: 07 December 2019
          Type: Article
          arXiv ID: 1912.03517
          Record ID: 7a2a1eb0-1348-4702-aa9a-8d432ee2da52

          License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

          Subjects: stat.ML, cs.LG

          Keywords: Machine learning, Artificial intelligence
