      Phased Exploration with Greedy Exploitation in Stochastic Combinatorial Partial Monitoring Games

      Preprint

          Abstract

          Partial monitoring games are repeated games in which the learner receives feedback that may differ from the adversary's move, or even from the reward the learner gains. Recently, a general model of combinatorial partial monitoring (CPM) games was proposed \cite{lincombinatorial2014}, in which the learner's action space can be exponentially large and the adversary samples its moves from a bounded, continuous space according to a fixed distribution. That paper gave a confidence-bound-based algorithm (GCB) that achieves \(O(T^{2/3}\log T)\) distribution-independent and \(O(\log T)\) distribution-dependent regret bounds. Implementing GCB requires two separate offline oracles, and the distribution-dependent bound additionally requires the existence of a unique optimal action for the learner. Adopting their CPM model, our first contribution is a Phased Exploration with Greedy Exploitation (PEGE) algorithmic framework for the problem. Different algorithms within the framework achieve \(O(T^{2/3}\sqrt{\log T})\) distribution-independent and \(O(\log^2 T)\) distribution-dependent regret, respectively. Crucially, our framework needs only the simpler "argmax" oracle from GCB, and the distribution-dependent bound does not require the existence of a unique optimal action. Our second contribution is another algorithm, PEGE2, which combines gap estimation with a PEGE algorithm to achieve an \(O(\log T)\) regret bound, matching the GCB guarantee but removing the dependence on the size of the learner's action space. However, like GCB, PEGE2 requires access to both offline oracles and the existence of a unique optimal action. Finally, we discuss how our algorithm can be applied efficiently to a CPM problem of practical interest: online ranking with feedback at the top.
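          To make the "phased exploration, greedy exploitation" idea concrete, the following is a minimal sketch on a toy Bernoulli multi-armed bandit. It is not the algorithm from the paper: the CPM feedback model is replaced by directly observed rewards, a plain `max` over arms stands in for the "argmax" oracle, and the function name `pege_toy` and the schedule parameter `exploit_len` are hypothetical names introduced only for illustration. The sketch shows just the phase structure: each phase first explores every arm once, then commits to the empirically best arm for a phase-dependent number of rounds, with estimates updated from exploration rounds only.

```python
import random
from typing import Callable, List


def pege_toy(
    means: List[float],
    horizon: int,
    exploit_len: Callable[[int], int],
    seed: int = 0,
) -> List[int]:
    """Toy phased-exploration / greedy-exploitation loop (illustrative only).

    means:       true Bernoulli reward means, one per arm (assumed toy setting)
    horizon:     total number of rounds to play
    exploit_len: hypothetical schedule mapping phase index -> exploitation rounds
    Returns the sequence of arms played.
    """
    rng = random.Random(seed)
    k = len(means)
    sums = [0.0] * k   # reward totals, from exploration rounds only
    counts = [0] * k   # exploration pulls per arm
    played: List[int] = []
    phase = 1
    while len(played) < horizon:
        # Exploration: pull every arm once and update its estimate.
        for arm in range(k):
            if len(played) >= horizon:
                break
            reward = 1.0 if rng.random() < means[arm] else 0.0
            sums[arm] += reward
            counts[arm] += 1
            played.append(arm)
        # Greedy exploitation: commit to the empirically best arm.
        # This max() plays the role an argmax oracle would play.
        best = max(
            range(k),
            key=lambda a: sums[a] / counts[a] if counts[a] else 0.0,
        )
        for _ in range(exploit_len(phase)):
            if len(played) >= horizon:
                break
            played.append(best)
        phase += 1
    return played


if __name__ == "__main__":
    plays = pege_toy([0.2, 0.8], horizon=1000, exploit_len=lambda b: b * b)
    print("fraction of rounds on arm 1:", plays.count(1) / len(plays))
```

          How fast `exploit_len` grows controls the trade-off between exploration cost and the risk of committing to a wrong arm; the quadratic schedule in the usage example is an arbitrary choice for the demo, not a schedule taken from the paper.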


                Author and article information

                Date: 2016-08-23
                Type: Article
                arXiv ID: 1608.06403
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                Comments: Appearing in NIPS 2016
                arXiv categories: cs.GT cs.AI
                Subjects: Theoretical computer science, Artificial intelligence
