
      Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances

      proceedings-article
      1 , 2 , 2 , 1
      Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18} (IJCAI-2018)
      Artificial Intelligence
      August 13, 2018 - August 19, 2018

          Abstract

          With the widespread use of intelligent systems, such as smart speakers, addressee recognition has become a concern in human-computer interaction, as more and more people expect such systems to understand complicated social scenes, including those outdoors, in cafeterias, and in hospitals. Because previous studies typically focused only on pre-specified tasks with limited conversational situations such as controlling smart homes, we created a mock dataset called Addressee Recognition in Visual Scenes with Utterances (ARVSU) that contains a vast body of image variations in visual scenes with an annotated utterance and a corresponding addressee for each scenario. We also propose a multi-modal deep-learning-based model that takes different human cues, specifically eye gazes and transcripts of an utterance corpus, into account to predict the conversational addressee from a specific speaker's view in various real-life conversational scenarios. To the best of our knowledge, we are the first to introduce an end-to-end deep learning model that combines vision and transcripts of utterances for addressee recognition. As a result, our study suggests that future addressee recognition can reach the ability to understand human intention in many social situations previously unexplored, and our multi-modal dataset is a first step in promoting research in this field.

          Author and article information

          Conference
          July 2018
          Pages: 1546-1553
          Affiliations
          [1] Tokyo Institute of Technology, Tokyo, Japan
          [2] Yahoo Japan Corporation
          Article
          DOI: 10.24963/ijcai.2018/214
          © 2018
          Stockholm, Sweden
          International Joint Conferences on Artificial Intelligence Organization (IJCAI)

          Developmental biology, Ecology
