      Is Open Access

      Adapting Word Embeddings to New Languages with Morphological and Phonological Subword Representations

      Preprint

          Abstract

          Much work in Natural Language Processing (NLP) has focused on resource-rich languages, making generalization to new, less-resourced languages challenging. We present two approaches for improving generalization to low-resource languages by adapting continuous word representations using linguistically motivated subword units: phonemes, morphemes and graphemes. Our method requires neither parallel corpora nor bilingual dictionaries and provides a significant gain in performance over previous methods relying on these resources. We demonstrate the effectiveness of our approaches on Named Entity Recognition (NER) for four languages, namely Uyghur, Turkish, Bengali and Hindi, of which Uyghur and Bengali are low-resource languages, and also perform experiments on Machine Translation. Exploiting subwords with transfer learning gives us a boost of +15.2 NER F1 for Uyghur and +9.7 F1 for Bengali. We also show improvements in the monolingual setting, where we achieve average gains of +3 F1 and +1.35 BLEU.

                Author and article information

                Date: 28 August 2018
                arXiv ID: 1808.09500
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                Comments: Accepted at EMNLP 2018
                Category: cs.CL
                Subject: Theoretical computer science
