      Is Open Access

      Script Identification in Natural Scene Image and Video Frame using Attention based Convolutional-LSTM Network

      Preprint


          Abstract

          Script identification facilitates many important applications in document and video analysis. This paper focuses on the problem of script identification in scene-text images and video frames. Text recognition in such scenarios is difficult because of low image quality, complex backgrounds, and the similar character layouts shared by some scripts, such as Greek and Latin. Most recent approaches apply a patch-based CNN with summation of the obtained features, or only a CNN-LSTM network, to obtain the identification result; some use a discriminative CNN to jointly optimize mid-level representations and deep features. In this paper, we propose a novel method that extracts local and global features with a CNN-LSTM framework and weights them dynamically for script identification. First, we convert the images into patches and feed them into the CNN-LSTM framework. Attention-based patch weights are computed by applying a softmax layer after the LSTM. These weights are then multiplied patch-wise with the corresponding CNN features to yield local features, while global features are extracted from the last cell state of the LSTM. Finally, a fusion technique dynamically weights the local and global features for each individual patch. Experiments on two public script identification datasets, SIW-13 and CVSI-2015, show that our learning procedure achieves superior performance compared with previous approaches.
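          The fusion step described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, assuming per-patch CNN features, per-patch attention logits from the LSTM, and a global feature from the last cell state; all function and variable names are hypothetical, not taken from the paper, and the scalar `gate` stands in for the paper's learned dynamic weighting.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_patch_features(cnn_feats, lstm_scores, global_feat, gate=0.5):
    """Illustrative fusion of local and global features (names hypothetical).

    cnn_feats:   (P, D) per-patch CNN features
    lstm_scores: (P,)   per-patch attention logits from the LSTM
    global_feat: (D,)   last LSTM cell state, used as the global feature
    gate:        scalar weight balancing local vs. global features
    """
    attn = softmax(lstm_scores)                  # attention weights over patches
    local = (attn[:, None] * cnn_feats).sum(0)   # attention-pooled local feature
    return gate * local + (1.0 - gate) * global_feat
```

With uniform attention logits the local feature is a plain average of the patch features, so the gate simply interpolates between that average and the global LSTM state.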

          Related collections

          Most cited references (24)

          Is Open Access

          Image Captioning with Semantic Attention

          Automatically generating a natural language description of an image has attracted interest recently, both because of its importance in practical applications and because it connects two major artificial intelligence fields: computer vision and natural language processing. Existing approaches are either top-down, which start from a gist of an image and convert it into words, or bottom-up, which come up with words describing various aspects of an image and then combine them. In this paper, we propose a new algorithm that combines both approaches through a model of semantic attention. Our algorithm learns to selectively attend to semantic concept proposals and fuse them into the hidden states and outputs of recurrent neural networks. The selection and fusion form a feedback loop connecting the top-down and bottom-up computation. We evaluate our algorithm on two public benchmarks: Microsoft COCO and Flickr30K. Experimental results show that our algorithm significantly outperforms the state-of-the-art approaches consistently across different evaluation metrics.
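            The "selectively attend and fuse" step in this cited abstract can be sketched as a single attention pass over concept embeddings. This is a hedged sketch under assumed shapes, not the paper's actual model; the function name and the additive fusion are illustrative assumptions.

```python
import numpy as np

def semantic_attention(hidden, concepts):
    """Hypothetical sketch: attend over semantic concept embeddings
    conditioned on the current RNN hidden state, then fuse additively.

    hidden:   (D,)   current RNN hidden state
    concepts: (K, D) embeddings of detected concept proposals
    """
    logits = concepts @ hidden          # relevance score of each concept
    e = np.exp(logits - logits.max())
    attn = e / e.sum()                  # softmax attention over concepts
    context = attn @ concepts           # attention-weighted concept mixture
    return hidden + context             # fused state fed to the next step
```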

            A robust arbitrary text detection system for natural scene images


              Strokelets: A Learned Multi-scale Representation for Scene Text Recognition


                Author and article information

                Journal
                Date: 01 January 2018
                Article ID: arXiv:1801.00470
                Record ID: 096179bc-dda4-4a72-8d83-ad71d7dd06e2
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata: The first and second authors contributed equally. Preprint submitted
                Subject: cs.CV
