Character-Aware Neural Language Models

Open Access Preprint

      Abstract

      We describe a simple neural language model that relies only on character-level inputs. Predictions are still made at the word-level. Our model employs a convolutional neural network (CNN) and a highway network over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM). On the English Penn Treebank the model is on par with the existing state-of-the-art despite having 60% fewer parameters. On languages with rich morphology (Arabic, Czech, French, German, Spanish, Russian), the model outperforms word-level/morpheme-level LSTM baselines, again with fewer parameters. The results suggest that on many languages, character inputs are sufficient for language modeling. Analysis of word representations obtained from the character composition part of the model reveals that the model is able to encode, from characters only, both semantic and orthographic information.
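
      As an illustration of the architecture described above, here is a minimal PyTorch sketch (not the authors' code; layer sizes, the kernel width, and the single highway layer are illustrative assumptions): characters are embedded, a 1-D convolution with max-over-time pooling produces one feature vector per word, a highway layer transforms it, and an LSTM consumes the resulting word vectors to predict the next word over the word vocabulary.

      # Sketch of a character-aware language model: char CNN + highway -> word-level LSTM.
      # Hyperparameters are illustrative, not the paper's exact configuration.
      import torch
      import torch.nn as nn

      class CharWordEncoder(nn.Module):
          """Turns the character sequence of each word into a single feature vector."""
          def __init__(self, n_chars, char_dim=15, n_filters=100, kernel_size=3):
              super().__init__()
              self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
              self.conv = nn.Conv1d(char_dim, n_filters, kernel_size)
              # Highway layer: y = t * relu(W_h x) + (1 - t) * x, with gate t = sigmoid(W_t x)
              self.transform = nn.Linear(n_filters, n_filters)
              self.gate = nn.Linear(n_filters, n_filters)

          def forward(self, chars):                        # chars: (batch, seq_len, word_len)
              b, s, w = chars.shape
              x = self.char_emb(chars.view(b * s, w))      # (b*s, word_len, char_dim)
              x = torch.tanh(self.conv(x.transpose(1, 2))) # (b*s, n_filters, word_len-k+1)
              x = x.max(dim=2).values                      # max-over-time pooling
              t = torch.sigmoid(self.gate(x))
              x = t * torch.relu(self.transform(x)) + (1 - t) * x
              return x.view(b, s, -1)                      # one vector per word

      class CharAwareLM(nn.Module):
          """Character-level inputs, word-level predictions."""
          def __init__(self, n_chars, n_words, hidden=300):
              super().__init__()
              self.encoder = CharWordEncoder(n_chars)
              self.lstm = nn.LSTM(100, hidden, batch_first=True)
              self.out = nn.Linear(hidden, n_words)        # softmax over the word vocabulary

          def forward(self, chars):
              h, _ = self.lstm(self.encoder(chars))
              return self.out(h)                           # (batch, seq_len, n_words) logits

      Given a batch of word-by-character index tensors, e.g. torch.randint(1, 50, (2, 6, 10)), CharAwareLM(50, 10000) returns next-word logits for every position; the input side needs no word embedding matrix, which is where the parameter savings reported above come from.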

      Most cited references (9)

      Long Short-Term Memory

        Learning long-term dependencies with gradient descent is difficult.

        Recurrent neural networks can be used to map input sequences to output sequences, such as for recognition, production or prediction problems. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals. We show why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases. These results expose a trade-off between efficient learning by gradient descent and latching on information for long periods. Based on an understanding of this problem, alternatives to standard gradient descent are considered.
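
        To see the effect numerically, here is a toy sketch (not from the cited paper): for a scalar vanilla RNN h_t = tanh(w * h_{t-1} + x_t), the gradient of the final state with respect to the first input picks up a factor of roughly w * tanh'(.) at every step, so it decays geometrically as the dependency gets longer.

        # Gradient of the last hidden state w.r.t. the first input in a scalar vanilla RNN,
        # for increasing sequence lengths T. The recurrent weight w = 0.5 is arbitrary.
        import numpy as np

        def grad_wrt_first_input(w, T):
            h, g = 0.0, 0.0                     # hidden state and d h_t / d x_1
            for t in range(T):
                x_t = 1.0 if t == 0 else 0.0    # only the first step receives an input
                h = np.tanh(w * h + x_t)
                local = 1.0 - h ** 2            # tanh'(pre-activation)
                g = local * (w * g + (1.0 if t == 0 else 0.0))
            return g

        for T in (5, 20, 50):
            print(T, grad_wrt_first_input(0.5, T))   # shrinks geometrically with T

        With a recurrent weight of magnitude greater than one the same recursion can grow instead of shrink, which is the other side of the trade-off described above.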

          Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation

            Author and article information

            Journal
            Dates: 2015-08-26; 2015-12-01
            arXiv: 1508.06615

            License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

            Custom metadata
            AAAI 2016
            Categories: cs.CL, cs.NE, stat.ML
