ScienceOpen: research and publishing network


Is Open Access

Playing hard exploration games by watching YouTube

Preprint

Author(s): Yusuf Aytar, Tobias Pfaff, David Budden, Tom Le Paine, Ziyu Wang, Nando de Freitas

Publication date (created): 29 May 2018


Abstract

Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e. with access to the agent's exact environment setup and the demonstrator's action and reward trajectories. Here we propose a two-stage method that overcomes these limitations by relying on noisy, unaligned footage without access to such data. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i.e. vision and sound). Second, we embed a single YouTube video in this representation to construct a reward function that encourages an agent to imitate human gameplay. This method of one-shot imitation allows our agent to convincingly exceed human-level performance on the infamously hard exploration games Montezuma's Revenge, Pitfall! and Private Eye for the first time, even if the agent is not presented with any environment rewards.
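The second stage described in the abstract, turning an embedded demonstration video into a reward signal, can be illustrated with a minimal sketch. It assumes the embeddings are already available as plain vectors (the learned embedding network from the first stage is not shown). The checkpoint spacing, similarity threshold, and reward value are hypothetical parameters chosen for illustration, not values stated in the abstract, and the names `make_checkpoints` and `ImitationReward` are likewise invented here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def make_checkpoints(demo_embeddings: List[Vector], spacing: int) -> List[Vector]:
    """Keep every `spacing`-th embedded demo frame as a checkpoint (assumed scheme)."""
    return demo_embeddings[::spacing]


class ImitationReward:
    """Pays a fixed reward each time the agent reaches the next checkpoint in order."""

    def __init__(self, checkpoints: List[Vector],
                 threshold: float = 0.9, reward: float = 1.0) -> None:
        self.checkpoints = list(checkpoints)
        self.threshold = threshold  # hypothetical similarity cutoff
        self.reward = reward        # hypothetical reward magnitude
        self.next_index = 0         # index of the checkpoint to reach next

    def __call__(self, agent_embedding: Vector) -> float:
        # No reward once every checkpoint has been reached.
        if self.next_index >= len(self.checkpoints):
            return 0.0
        target = self.checkpoints[self.next_index]
        if cosine_similarity(agent_embedding, target) >= self.threshold:
            self.next_index += 1
            return self.reward
        return 0.0


# Toy 2-D "embeddings" standing in for an embedded demonstration video.
demo = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
reward_fn = ImitationReward(make_checkpoints(demo, spacing=2))
rewards = [reward_fn(e) for e in ([0.0, 1.0], [1.0, 0.05], [0.0, 2.0])]
```

Because checkpoints must be reached in order, the first toy observation earns nothing even though it matches a later checkpoint. This ordering is one plausible way to encourage the agent to follow the demonstrated trajectory rather than jump to isolated frames.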


Author and article information

Article

arXiv ID: 1805.11592

SO-VID: 69d72be9-75b6-45c3-96fc-dba29ed093f3

License:

http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Categories: cs.AI, cs.CV, cs.LG

ScienceOpen disciplines: Computer vision & Pattern recognition, Artificial intelligence
