      Is Open Access

      Contrastive Learning of Emoji-based Representations for Resource-Poor Languages

      Preprint

          Abstract

          The introduction of emojis (or emoticons) on social media platforms has given users greater potential for expression. We propose a novel method, Classification of Emojis using Siamese Network Architecture (CESNA), to learn emoji-based representations of resource-poor languages by jointly training them with resource-rich languages using a siamese network. The CESNA model consists of twin Bi-directional Long Short-Term Memory Recurrent Neural Networks (Bi-LSTM RNNs) with shared parameters, joined by a contrastive loss function based on a similarity metric. The model learns representations of resource-poor and resource-rich languages in a common emoji space, using a similarity metric based on the emojis present in sentences from both languages. It therefore projects sentences with similar emojis closer to each other and sentences with different emojis farther apart. Experiments on large-scale Twitter datasets of resource-rich languages (English and Spanish) and resource-poor languages (Hindi and Telugu) show that CESNA outperforms state-of-the-art emoji prediction approaches based on distributional semantics, semantic rules, lexicon lists, and deep neural network representations without shared parameters.
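
          To make the idea of a contrastive loss over a shared emoji space concrete, here is a minimal sketch. It is not the authors' formulation: the abstract does not state the exact loss or similarity metric, so this assumes the common Euclidean-distance form of contrastive loss with a margin. The function names, the "share at least one emoji" labeling rule, and the `margin` default are hypothetical choices made for illustration only. The vectors stand in for the sentence representations that the twin Bi-LSTM encoders would produce.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length sentence vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def same_emoji_label(emojis_a, emojis_b):
    """Hypothetical pairing rule: 1 if the two sentences share any emoji, else 0."""
    return 1 if set(emojis_a) & set(emojis_b) else 0


def contrastive_loss(u, v, label, margin=1.0):
    """Assumed margin-based contrastive loss for one pair of representations.

    Similar pairs (label == 1) are penalized by their squared distance,
    which pulls them together. Dissimilar pairs (label == 0) are penalized
    only while they are closer than `margin`, which pushes them apart.
    """
    d = euclidean_distance(u, v)
    if label == 1:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Usage: placeholder vectors for an English and a Hindi sentence.
english_vec = [0.0, 0.0]
hindi_vec = [0.0, 0.5]
label = same_emoji_label(["😂"], ["😂", "❤"])
loss = contrastive_loss(english_vec, hindi_vec, label)
```

          In a full model the loss would be averaged over a batch of sentence pairs and minimized by gradient descent through both encoders, which share parameters.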

          Author and article information

          Date: 02 April 2018
          arXiv ID: 1804.01855
          Subject: cs.CL
          License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
          Comments: Accepted Long Paper at 19th International Conference on Computational Linguistics and Intelligent Text Processing, March 2018, Hanoi, Vietnam. arXiv admin note: substantial text overlap with arXiv:1804.00805