      Is Open Access

      "Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention

      Preprint


          Abstract

          Generating stylized captions for an image is an emerging topic in image captioning. Given an image as input, the system must generate a caption that has a specific style (e.g., humorous, romantic, positive, or negative) while describing the image content accurately. In this paper, we propose a novel stylized image captioning model that takes both requirements into consideration. To this end, we first devise a new variant of LSTM, named style-factual LSTM, as the building block of our model. It uses two groups of matrices to capture factual and stylized knowledge, respectively, and automatically learns word-level weights for the two groups based on the previous context. In addition, when training the model to capture stylized elements, we propose an adaptive learning approach based on a reference factual model. The reference model provides factual knowledge as the model learns from stylized caption labels, and the approach adaptively computes how much information to supply at each time step. We evaluate our model on two stylized image captioning datasets, which contain humorous/romantic captions and positive/negative captions, respectively. Experiments show that our proposed model outperforms state-of-the-art approaches without using extra ground-truth supervision.
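The abstract gives no equations, so the following is only an illustrative sketch of the idea of mixing two groups of matrices with a weight computed from the previous context. The function names, the sigmoid gate, and the convex-combination form are all assumptions made for illustration, not the paper's actual style-factual LSTM formulation.

```python
import math
import random


def sigmoid(x):
    """Logistic function, used here to keep the gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def blended_projection(x, h_prev, w_factual, w_style, gate_weights):
    """Mix a factual and a style projection of x with a context-dependent gate.

    Assumed (hypothetical) form:
        g   = sigmoid(gate_weights . h_prev)
        out = g * (w_style @ x) + (1 - g) * (w_factual @ x)
    The paper's real equations may differ.
    """
    g = sigmoid(sum(w * h for w, h in zip(gate_weights, h_prev)))
    factual = matvec(w_factual, x)
    styled = matvec(w_style, x)
    mixed = [g * s + (1.0 - g) * f for s, f in zip(styled, factual)]
    return mixed, g


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(4)]       # word embedding at this step
h_prev = [rng.uniform(-1, 1) for _ in range(3)]  # previous hidden state
w_factual = random_matrix(3, 4, rng)             # "factual" group of weights
w_style = random_matrix(3, 4, rng)               # "style" group of weights
gate_weights = [rng.uniform(-1, 1) for _ in range(3)]

mixed, g = blended_projection(x, h_prev, w_factual, w_style, gate_weights)
print(f"style gate g = {g:.3f}")
```

In this sketch a gate near 1 means the step leans on the style weights and a gate near 0 means it leans on the factual weights, which is one plausible reading of "word-level weights of the two groups".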


                Author and article information

                Date: 10 July 2018
                arXiv ID: 1807.03871

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                Comments: 14 pages, 7 figures, ECCV 2018
                Category: cs.CV

                Computer vision & Pattern recognition
