Unpaired Image Captioning by Language Pivoting

Abstract

Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping from an image to its natural language description. In general, the mapping function is learned from a training set of image-caption pairs. However, for some languages, a large-scale image-caption paired corpus might not be available. We present an approach to this unpaired image captioning problem by language pivoting. Our method can effectively capture the characteristics of an image captioner from the pivot language (Chinese) and align it to the target language (English) using another pivot-target (Chinese-English) parallel corpus. We evaluate our method on two image-to-English benchmark datasets: MSCOCO and Flickr30K. Quantitative comparisons against several baseline approaches demonstrate the effectiveness of our method.
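The pivoting idea in the abstract can be illustrated with a deliberately naive sketch: caption the image in the pivot language, then translate the result into the target language. This is not the authors' model, which the abstract describes as aligning the captioner to the target language rather than simply chaining two systems. Every name below (`make_lookup_captioner`, `make_word_translator`, `pivot_caption`, the image ID, and the toy lexicon) is hypothetical, and the lookup tables stand in for trained models.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: a real system would use trained neural models here.
PivotCaptioner = Callable[[str], List[str]]     # image id -> pivot-language tokens
Translator = Callable[[List[str]], List[str]]   # pivot tokens -> target tokens


def make_lookup_captioner(table: Dict[str, List[str]]) -> PivotCaptioner:
    """Toy image-to-pivot captioner backed by a fixed lookup table."""
    def caption(image_id: str) -> List[str]:
        return table[image_id]
    return caption


def make_word_translator(lexicon: Dict[str, str]) -> Translator:
    """Toy word-by-word pivot-to-target translator; unknown words become <unk>."""
    def translate(tokens: List[str]) -> List[str]:
        return [lexicon.get(token, "<unk>") for token in tokens]
    return translate


def pivot_caption(image_id: str,
                  captioner: PivotCaptioner,
                  translator: Translator) -> str:
    """Compose the two stages: image -> pivot caption -> target caption."""
    pivot_tokens = captioner(image_id)
    return " ".join(translator(pivot_tokens))


# Toy data (assumed for illustration only).
captioner = make_lookup_captioner({"img_001": ["一只", "狗", "在", "奔跑"]})
translator = make_word_translator(
    {"一只": "a", "狗": "dog", "在": "is", "奔跑": "running"}
)

print(pivot_caption("img_001", captioner, translator))
```

No English image-caption pairs appear anywhere in this sketch: the captioner only needs image-Chinese pairs and the translator only needs Chinese-English text, which is the data setting the abstract describes.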

Author and article information

Publication date Created: 14 March 2018

Article

arXiv ID: 1803.05526

SO-VID: 7cfbfbe6-d749-40ac-add6-05b0a8d1d3c3

License:

http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Categories cs.CV
