      Is Open Access

      Filtering and Mining Parallel Data in a Joint Multilingual Space

      Preprint

          Abstract

          We learn a joint multilingual sentence embedding and use the distance between sentences in different languages to filter noisy parallel data and to mine for parallel data in large news collections. By filtering out 25% of the training data, we improve a competitive baseline on the WMT'14 English-to-German task by 0.3 BLEU. The same approach is used to mine additional bitexts for the WMT'14 system and to obtain competitive results on the BUCC shared task on identifying parallel sentences in comparable corpora. The approach is generic: it can be applied to many language pairs and is independent of the architecture of the machine translation system.
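The filtering and mining steps described in the abstract can be illustrated with a minimal sketch. It assumes that sentence embeddings have already been produced by some joint multilingual encoder (not shown here) and uses cosine similarity as the closeness measure. The helper names `cosine`, `filter_pairs`, and `mine_pairs`, the toy vectors, and the 0.9 mining threshold are hypothetical illustrations, not the paper's implementation or settings. Filtering ranks aligned pairs by similarity and drops the lowest-scoring 25%, mirroring the fraction reported above. Mining pairs each source sentence with its nearest target and keeps the pair only when the similarity clears a threshold.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: embeddings come from some joint multilingual sentence encoder
# (not shown); here they are plain lists of floats.
Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_pairs(src: List[Vector], tgt: List[Vector],
                 drop_fraction: float = 0.25) -> List[int]:
    """Score aligned pairs (src[i], tgt[i]) and drop the lowest-scoring fraction.

    Returns the indices of the kept pairs in their original order.
    """
    scores = [cosine(s, t) for s, t in zip(src, tgt)]
    n_keep = len(scores) - int(len(scores) * drop_fraction)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def mine_pairs(src: List[Vector], tgt: List[Vector],
               threshold: float) -> List[Tuple[int, int, float]]:
    """Brute-force mining: pair each source with its most similar target and
    keep the pair only if the similarity reaches the (hypothetical) threshold.
    """
    mined = []
    for i, s in enumerate(src):
        best_j, best_score = -1, float("-inf")
        for j, t in enumerate(tgt):
            score = cosine(s, t)
            if score > best_score:
                best_j, best_score = j, score
        if best_j >= 0 and best_score >= threshold:
            mined.append((i, best_j, best_score))
    return mined


if __name__ == "__main__":
    # Toy 2-d "embeddings"; real sentence embeddings are much larger.
    src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
    tgt = [[0.9, 0.1], [0.1, 0.9], [-1.0, 0.5], [1.0, -0.9]]
    print(filter_pairs(src, tgt))  # [0, 1, 3]
    print([(i, j) for i, j, _ in mine_pairs(src, tgt, threshold=0.9)])
    # [(0, 0), (1, 1), (3, 3)]
```

Dropping a fixed fraction rather than applying an absolute cut-off makes the size of the filtered corpus predictable. The brute-force nearest-neighbour loop is quadratic in the number of sentences, so mining large news collections would in practice call for an approximate or batched similarity search.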

                Author and article information

                Date: 24 May 2018
                Article: 1805.09822
                ID: d25a1dd4-e837-48d2-9c68-0655ba9f2989
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                Venue: ACL, July 2018, Melbourne
                Length: 8 pages
                Categories: cs.CL cs.AI

                Theoretical computer science, Artificial intelligence
