      Is Open Access

      Playing hard exploration games by watching YouTube

      Preprint


          Abstract

          Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e. with access to the agent's exact environment setup and the demonstrator's action and reward trajectories. Here we propose a two-stage method that overcomes these limitations by relying on noisy, unaligned footage without access to such data. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i.e. vision and sound). Second, we embed a single YouTube video in this representation to construct a reward function that encourages an agent to imitate human gameplay. This method of one-shot imitation allows our agent to convincingly exceed human-level performance on the infamously hard exploration games Montezuma's Revenge, Pitfall! and Private Eye for the first time, even if the agent is not presented with any environment rewards.
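          The second stage described above, turning one embedded demonstration video into a reward signal, can be sketched as follows. This is a minimal illustration rather than the authors' implementation. It assumes the frame embeddings have already been produced by some learned encoder, which is not shown. The class name `CheckpointReward` and the parameters `stride`, `threshold`, `bonus` and `lookahead` are hypothetical, and their default values are placeholders.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


class CheckpointReward:
    """Hypothetical imitation reward built from a single embedded demonstration.

    Every `stride`-th demonstration embedding becomes a checkpoint. The agent
    receives `bonus` when its current embedding is close enough to one of the
    next `lookahead` unreached checkpoints, and 0.0 otherwise.
    All default values are illustrative assumptions.
    """

    def __init__(
        self,
        demo_embeddings: List[Vector],
        stride: int = 16,
        threshold: float = 0.5,
        bonus: float = 0.5,
        lookahead: int = 1,
    ) -> None:
        self.checkpoints = list(demo_embeddings[::stride])
        self.threshold = threshold
        self.bonus = bonus
        self.lookahead = lookahead
        self.next_index = 0  # index of the first checkpoint not yet reached

    def __call__(self, agent_embedding: Vector) -> float:
        window = self.checkpoints[self.next_index:self.next_index + self.lookahead]
        for offset, checkpoint in enumerate(window):
            if cosine_similarity(agent_embedding, checkpoint) > self.threshold:
                # Advance past the matched checkpoint so it cannot pay out twice.
                self.next_index += offset + 1
                return self.bonus
        return 0.0


# Toy usage with 2-D vectors standing in for learned embeddings.
demo = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
reward_fn = CheckpointReward(demo, stride=1)
rewards = [reward_fn(e) for e in ([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])]
```

          Because checkpoints must be reached in order, the agent is rewarded for following the demonstrated trajectory, not for revisiting one familiar state. Such a signal can stand in for environment rewards, which is the setting the abstract refers to when it says the agent is not presented with any.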


                Author and article information

                Date: 29 May 2018
                arXiv: 1805.11592
                Record ID: 69d72be9-75b6-45c3-96fc-dba29ed093f3
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                arXiv categories: cs.AI cs.CV cs.LG
                Subjects: Computer vision & Pattern recognition, Artificial intelligence
