      Is Open Access

      Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning

      Preprint


          Abstract

          While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages. This is primarily due to a lack of datasets suitable for training ranking algorithms. In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our model is evaluated in a zero-shot setting, meaning that we use it to predict relevance scores for query-document pairs in languages never seen during training. Our results show that the proposed approach can significantly outperform unsupervised retrieval techniques for Arabic, Mandarin Chinese, and Spanish. We also show that augmenting the English training collection with some examples from the target language can sometimes improve performance.
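
          The zero-shot setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the ranker is a cross-encoder built on a multilingual checkpoint with a single-output relevance head, loaded through the Hugging Face `transformers` library, and that it has already been fine-tuned on English relevance data. The names `make_cross_encoder_scorer`, `rerank`, and `model_dir` are hypothetical and introduced only for illustration.

```python
from typing import Callable, List, Tuple

# A scorer maps a (query, document) pair to a relevance score.
ScoreFn = Callable[[str, str], float]


def make_cross_encoder_scorer(model_dir: str) -> ScoreFn:
    """Build a scorer from a multilingual model fine-tuned on English data.

    Assumption: `model_dir` (hypothetical) points to a checkpoint saved with
    a one-logit sequence classification head. The abstract does not specify
    the architecture, so this is only one plausible setup.
    """
    # Imported lazily so that `rerank` works without these packages installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir, num_labels=1)
    model.eval()

    def score(query: str, document: str) -> float:
        # Encode query and document as one sequence pair.
        inputs = tokenizer(
            query, document, truncation=True, max_length=512, return_tensors="pt"
        )
        with torch.no_grad():
            logits = model(**inputs).logits
        return logits.squeeze().item()

    return score


def rerank(query: str, documents: List[str], score_fn: ScoreFn) -> List[Tuple[str, float]]:
    """Score each candidate and sort by descending relevance.

    Zero-shot here means `score_fn` is applied to a language it was never
    trained on; nothing in this function is language-specific.
    """
    scored = [(doc, score_fn(query, doc)) for doc in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

          In this sketch, the same `rerank` call would be used for Arabic, Mandarin Chinese, or Spanish queries without any change; only the data used to fine-tune the scorer is English.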

          Author and article information

          Date: 30 December 2019
          arXiv: 1912.13080

          http://arxiv.org/licenses/nonexclusive-distrib/1.0/

          Venue: ECIR 2020 (short)
          arXiv categories: cs.IR cs.CL cs.LG

          Subjects: Theoretical computer science, Information & Library science, Artificial intelligence
