      Learning Multilingual Topics from Incomparable Corpus

      Preprint

          Abstract

          Multilingual topic models enable crosslingual tasks by extracting consistent topics from multilingual corpora. Most models require parallel or comparable training corpora, which limits their ability to generalize. In this paper, we first demystify the knowledge transfer mechanism behind multilingual topic models by defining an alternative but equivalent formulation. Based on this analysis, we then relax the assumption of training data required by most existing models, creating a model that only requires a dictionary for training. Experiments show that our new method effectively learns coherent multilingual topics from partially and fully incomparable corpora with limited amounts of dictionary resources.

                Author and article information

                11 June 2018
                arXiv: 1806.04270

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                Custom metadata
                To appear in International Conference on Computational Linguistics (COLING), 2018
                cs.CL

                Theoretical computer science
