RVL-BERT: Visual Relationship Detection with Visual-Linguistic Knowledge from Pre-trained Representations

Abstract

Visual relationship detection aims to reason over relationships among salient objects in images, a task that has drawn increasing attention over the past few years. By analogy with human reasoning mechanisms, external visual commonsense knowledge is believed to benefit reasoning about the visual relationships of objects in images, yet existing methods rarely consider it. In this paper, we propose a novel approach named Relational Visual-Linguistic Bidirectional Encoder Representations from Transformers (RVL-BERT), which performs relational reasoning with both visual and language commonsense knowledge learned via self-supervised pre-training with multimodal representations. RVL-BERT also uses an effective spatial module and a novel mask attention module to explicitly capture spatial information among the objects. Moreover, our model decouples object detection from visual relationship recognition by taking in object names directly, enabling it to be used on top of any object detection system. We show through quantitative and qualitative experiments that, with the transferred knowledge and novel modules, RVL-BERT surpasses the previous state of the art on two challenging visual relationship detection datasets. The source code will be publicly available soon.

Author and article information

Journal

Publication date (created): 10 September 2020

Article

arXiv ID: 2009.04965

SO-VID: fc0b7590-673b-4915-9723-f5a62b46f83c

License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Comments: 9 pages, 4 figures, 4 tables

Categories: cs.CV, cs.CL, cs.LG

ScienceOpen disciplines: Computer vision & Pattern recognition, Theoretical computer science, Artificial intelligence
