
Learning to predict where to look in interactive environments using deep recurrent Q-learning

      Preprint


Abstract

Bottom-Up (BU) saliency models do not perform well in complex interactive environments where humans are actively engaged in tasks (e.g., sandwich making and playing video games). In this paper, we leverage Reinforcement Learning (RL) to highlight task-relevant locations of input frames. We propose a soft attention mechanism combined with the Deep Q-Network (DQN) model to teach an RL agent how to play a game and where to look by focusing on the most pertinent parts of its visual input. Our evaluations on several Atari 2600 games show that the soft-attention-based model predicts fixation locations significantly better than bottom-up models such as the Itti-Koch saliency and Graph-Based Visual Saliency (GBVS) models.
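
To make the architecture concrete, here is a minimal sketch of the kind of model the abstract describes. This is an assumption-laden illustration, not the authors' released code: the framework (PyTorch), the class name SoftAttentionDRQN, the layer sizes, and the additive form of the attention are all illustrative choices. The core idea it shows is the one stated above: a CNN encodes each frame into a grid of region features, soft attention conditioned on a recurrent state weights those regions, the attention-weighted sum feeds an LSTM whose output yields Q-values, and the attention weights themselves double as the predicted fixation map.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftAttentionDRQN(nn.Module):
    """Illustrative sketch of a soft-attention deep recurrent Q-network.

    Hypothetical layer sizes; not the paper's exact architecture.
    """
    def __init__(self, n_actions, d=64, hidden=256):
        super().__init__()
        # Atari-style encoder: 84x84 grayscale frame -> 7x7 grid of d-dim region features
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, d, 3, stride=1), nn.ReLU(),
        )
        self.attn_feat = nn.Linear(d, hidden)      # projects each region feature
        self.attn_hid = nn.Linear(hidden, hidden)  # projects the previous LSTM state
        self.attn_out = nn.Linear(hidden, 1)       # scalar attention score per region
        self.lstm = nn.LSTMCell(d, hidden)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, frame, state):
        h, c = state                                  # recurrent state, each (B, hidden)
        feats = self.encoder(frame)                   # (B, d, 7, 7)
        B, d, H, W = feats.shape
        regions = feats.flatten(2).transpose(1, 2)    # (B, L = H*W, d)
        # Additive soft attention conditioned on the recurrent state
        scores = self.attn_out(torch.tanh(
            self.attn_feat(regions) + self.attn_hid(h).unsqueeze(1)))  # (B, L, 1)
        alpha = F.softmax(scores, dim=1)              # attention weights sum to 1 over regions
        context = (alpha * regions).sum(dim=1)        # (B, d) expected region feature
        h, c = self.lstm(context, (h, c))
        q_values = self.q_head(h)                     # one Q-value per action
        fixation_map = alpha.view(B, H, W)            # where the agent "looks" this step
        return q_values, fixation_map, (h, c)

In a sketch like this, the network would be trained with the standard DQN temporal-difference loss on the Q-values; the fixation map falls out of the attention weights for free and can be upsampled to frame resolution for comparison against human gaze data or bottom-up saliency maps such as GBVS.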


                Author and article information

                Journal
                2016-12-17
                Article
                1612.05753
                75b126d6-2c97-43db-b8d3-5d304cceb364

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                cs.CV cs.LG

                Computer vision & Pattern recognition,Artificial intelligence
                Computer vision & Pattern recognition, Artificial intelligence
