      Are All Languages Equally Hard to Language-Model?

      Preprint

          Abstract

          For general modeling methods applied to diverse languages, a natural question is: how well should we expect our models to work on languages with differing typological profiles? In this work, we develop an evaluation framework for fair cross-linguistic comparison of language models, using translated text so that all models are asked to predict approximately the same information. We then conduct a study on 21 languages, demonstrating that in some languages, the textual expression of the information is harder to predict with both \(n\)-gram and LSTM language models. We show complex inflectional morphology to be a cause of performance differences among languages.
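The idea of comparing languages on translated text can be illustrated with a small sketch. This is not the authors' implementation: it assumes a character-bigram model with add-one smoothing as a stand-in for the \(n\)-gram and LSTM models mentioned above, and the function names (train_char_bigram, total_bits) and the toy sentences are hypothetical. Because the held-out sentences are translations of one another, the total number of bits needed to encode each one (rather than bits per character) is the quantity being compared.

```python
import math
from collections import Counter


def train_char_bigram(text):
    """Collect character-bigram counts for an add-one-smoothed model."""
    bigrams = Counter(zip(text, text[1:]))
    contexts = Counter(text[:-1])
    # One extra slot reserves probability mass for characters unseen in training.
    vocab_size = len(set(text)) + 1
    return bigrams, contexts, vocab_size


def total_bits(model, text):
    """Total surprisal (in bits) of `text`; the first character is not scored."""
    bigrams, contexts, vocab_size = model
    bits = 0.0
    for prev, cur in zip(text, text[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        bits -= math.log2(p)
    return bits


# Hypothetical toy parallel data: (training text, held-out translation).
parallel = {
    "en": ("the cat sees the dog . the dog sees the cat .",
           "the cat sees the cat ."),
    "de": ("die Katze sieht den Hund . der Hund sieht die Katze .",
           "die Katze sieht die Katze ."),
}

for lang, (train, held_out) in parallel.items():
    model = train_char_bigram(train)
    print(lang, round(total_bits(model, held_out), 2))
```

On real data the training and held-out sets would be aligned translations of the same documents, so a higher total for one language suggests that the same information is harder to predict in that language under the chosen model.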


          Author and article information

          Date: 10 June 2018
          arXiv ID: 1806.03743
          Record ID: 908dab71-7762-4b46-9f3d-1d9fddd2fbb9
          License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
          Venue: Published at NAACL 2018
          Category: cs.CL
          Subject: Theoretical computer science
