
Learning to predict where to look in interactive environments using deep recurrent Q-learning

      Preprint


Abstract

Bottom-Up (BU) saliency models do not perform well in complex interactive environments where humans are actively engaged in tasks (e.g., sandwich making and playing video games). In this paper, we leverage Reinforcement Learning (RL) to highlight task-relevant locations of input frames. We propose a soft attention mechanism combined with the Deep Q-Network (DQN) model to teach an RL agent how to play a game and where to look by focusing on the most pertinent parts of its visual input. Our evaluations on several Atari 2600 games show that the soft-attention-based model predicts fixation locations significantly better than bottom-up models such as the Itti-Koch saliency and Graph-Based Visual Saliency (GBVS) models.
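
To make the architecture concrete, here is a minimal sketch of the kind of model the abstract describes. This is an assumption-laden illustration, not the authors' released code: the framework (PyTorch), the class name SoftAttentionDRQN, the layer sizes, and the additive form of the attention are all illustrative choices. The core idea it shows is the one stated above: a CNN encodes each frame into a grid of region features, soft attention conditioned on a recurrent state weights those regions, the attention-weighted sum feeds an LSTM whose output yields Q-values, and the attention weights themselves double as the predicted fixation map.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftAttentionDRQN(nn.Module):
    """Illustrative sketch of a soft-attention deep recurrent Q-network.

    Hypothetical layer sizes; not the paper's exact architecture.
    """
    def __init__(self, n_actions, d=64, hidden=256):
        super().__init__()
        # Atari-style encoder: 84x84 grayscale frame -> 7x7 grid of d-dim region features
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, d, 3, stride=1), nn.ReLU(),
        )
        self.attn_feat = nn.Linear(d, hidden)      # projects each region feature
        self.attn_hid = nn.Linear(hidden, hidden)  # projects the previous LSTM state
        self.attn_out = nn.Linear(hidden, 1)       # scalar attention score per region
        self.lstm = nn.LSTMCell(d, hidden)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, frame, state):
        h, c = state                                  # recurrent state, each (B, hidden)
        feats = self.encoder(frame)                   # (B, d, 7, 7)
        B, d, H, W = feats.shape
        regions = feats.flatten(2).transpose(1, 2)    # (B, L = H*W, d)
        # Additive soft attention conditioned on the recurrent state
        scores = self.attn_out(torch.tanh(
            self.attn_feat(regions) + self.attn_hid(h).unsqueeze(1)))  # (B, L, 1)
        alpha = F.softmax(scores, dim=1)              # attention weights sum to 1 over regions
        context = (alpha * regions).sum(dim=1)        # (B, d) expected region feature
        h, c = self.lstm(context, (h, c))
        q_values = self.q_head(h)                     # one Q-value per action
        fixation_map = alpha.view(B, H, W)            # where the agent "looks" this step
        return q_values, fixation_map, (h, c)

In a sketch like this, the network would be trained with the standard DQN temporal-difference loss on the Q-values; the fixation map falls out of the attention weights for free and can be upsampled to frame resolution for comparison against human gaze data or bottom-up saliency maps such as GBVS.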


                Author and article information

                Journal
                2016-12-17
                Article
                1612.05753
                75b126d6-2c97-43db-b8d3-5d304cceb364

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                cs.CV cs.LG

                Computer vision & Pattern recognition,Artificial intelligence
                Computer vision & Pattern recognition, Artificial intelligence
