      Leveraging explainability for understanding object descriptions in ambiguous 3D environments

Research article (open access)


          Abstract

For effective human-robot collaboration, robots must understand requests from users who perceive the three-dimensional space around them, and must ask reasonable follow-up questions when a request is ambiguous. Existing studies on comprehending the object descriptions in such requests have been limited to object categories that can be detected or localized with off-the-shelf object detection and localization modules. Moreover, they have mostly comprehended object descriptions from flat RGB images, without considering the depth dimension. In the wild, however, it is impossible to restrict the object categories that may be encountered during an interaction, and perceiving three-dimensional space, including depth information, is fundamental to successful task completion. To understand described objects and resolve ambiguities in the wild, we propose, for the first time, a method that leverages explainability. Our method focuses on the active areas of an RGB scene to locate the described objects without imposing the aforementioned constraints on object categories or natural-language instructions. We further extend the method to identify described objects using the depth dimension. We evaluate our method on varied real-world images and observe that the regions it suggests can help resolve ambiguities. Compared with a state-of-the-art baseline, our method performs better in scenes containing ambiguous objects that existing object detectors cannot recognize. We also show that using depth features significantly improves performance, both in scenes where depth data is critical for disambiguating the objects and across our full evaluation dataset, which contains objects that can be specified with and without the depth dimension.
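The "active areas" idea builds on gradient-based explainability methods such as Grad-CAM: the gradient of a model's output with respect to its last convolutional feature maps highlights the image regions that drive the prediction. The following is a minimal, hypothetical sketch of that underlying technique in PyTorch, applied to a generic torchvision image classifier rather than the authors' language-grounding model; the hook bookkeeping and the `grad_cam` helper are illustrative assumptions, not the paper's implementation.

```python
# Minimal Grad-CAM sketch (illustrative; not the authors' implementation).
# Shows the core idea of gradient-based explainability: weight the last
# convolutional activations by their pooled gradients to get a heatmap
# of image regions that support a given prediction.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

feats, grads = {}, {}

def fwd_hook(module, inputs, output):
    feats["a"] = output               # last-conv activations, (1, C, h, w)

def bwd_hook(module, grad_input, grad_output):
    grads["a"] = grad_output[0]       # gradients w.r.t. those activations

model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

def grad_cam(image: torch.Tensor, target_class: int) -> torch.Tensor:
    """Return an HxW heatmap of regions supporting `target_class`.

    `image` is a preprocessed batch of one, shape (1, 3, 224, 224).
    """
    logits = model(image)
    model.zero_grad()
    logits[0, target_class].backward()          # populate the hooks
    a, g = feats["a"], grads["a"]
    weights = g.mean(dim=(2, 3), keepdim=True)  # global-average-pool grads
    cam = F.relu((weights * a).sum(dim=1))      # weighted sum over channels
    cam = F.interpolate(cam.unsqueeze(0), size=image.shape[2:],
                        mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze() # normalize to [0, 1]

# Usage: heatmap = grad_cam(preprocessed_image, target_class=954)
```

In the paper's setting, the score being differentiated would come from a model matching a natural-language object description to the scene (extended with depth features), so the resulting heatmap points at candidate referents rather than classifier classes.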


                Author and article information

Journal
Frontiers in Robotics and AI (Front. Robot. AI)
Publisher: Frontiers Media S.A.
ISSN: 2296-9144
Published: 04 January 2023
Volume 9 (2022), article 937772
                Affiliations
Division of Robotics, Perception and Learning, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
                Author notes

                Edited by: Kosmas Dimitropoulos, Centre for Research and Technology Hellas (CERTH), Greece

                Reviewed by: Nikos Grammalidis, Centre for Research and Technology Hellas (CERTH), Greece

                Sotiris Manitsaris, Université de Sciences Lettres de Paris, France

*Correspondence: Fethiye Irmak Doğan, fidogan@kth.se

                This article was submitted to Human-Robot Interaction, a section of the journal Frontiers in Robotics and AI

Article
Article number: 937772
DOI: 10.3389/frobt.2022.937772
PMCID: 9872646
                Copyright © 2023 Doğan, Melsión and Leite.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

History
Received: 06 May 2022
Accepted: 29 November 2022
Funding
Funded by: Vetenskapsrådet (doi: 10.13039/501100004359); Award ID: 2017-05189
Funded by: Stiftelsen för Strategisk Forskning (doi: 10.13039/501100011751); Award ID: SSF FFL18-0199
Funded by: NordForsk (doi: 10.13039/501100004785); Award ID: S-FACTOR project
Funded by: Kungliga Tekniska Högskolan (doi: 10.13039/501100004270); Award IDs: Digital Futures Research Center; Vinnova Competence Center for Trustworthy Edge Computing Systems and Applications
Funded by: Knut och Alice Wallenbergs Stiftelse (doi: 10.13039/501100004063); Award ID: Wallenberg AI, Autonomous Systems and Software Program (WASP)
This work was partially funded by grants from the Swedish Research Council (2017-05189), the Swedish Foundation for Strategic Research (SSF FFL18-0199), the S-FACTOR project from NordForsk, the Digital Futures Research Center, the Vinnova Competence Center for Trustworthy Edge Computing Systems and Applications at KTH, and the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.
                Categories
                Robotics and AI
                Original Research

Keywords: explainability, resolving ambiguities, depth, referring expression comprehension (REC), real-world environments
