5
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Automatic Standardization of Arabic Dialects for Machine Translation

      Preprint

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Based on an annotated multimedia corpus, television series Mar{\=a}y{\=a} 2013, we dig into the question of ''automatic standardization'' of Arabic dialects for machine translation. Here we distinguish between rule-based machine translation and statistical machine translation. Machine translation from Arabic most of the time takes standard or modern Arabic as the source language and produces quite satisfactory translations thanks to the availability of the translation memories necessary for training the models. The case is different for the translation of Arabic dialects. The productions are much less efficient. In our research we try to apply machine translation methods to a dialect/standard (or modern) Arabic pair to automatically produce a standard Arabic text from a dialect input, a process we call ''automatic standardization''. we opt here for the application of ''statistical models'' because ''automatic standardization'' based on rules is more hard with the lack of ''diglossic'' dictionaries on the one hand and the difficulty of creating linguistic rules for each dialect on the other. Carrying out this research could then lead to combining ''automatic standardization'' software and automatic translation software so that we take the output of the first software and introduce it as input into the second one to obtain at the end a quality machine translation. This approach may also have educational applications such as the development of applications to help understand different Arabic dialects by transforming dialectal texts into standard Arabic.

          Related collections

          Author and article information

          Journal
          09 January 2023
          Article
          2301.03447
          56d509e7-71e5-4300-b741-8bba0a6106f8

          http://arxiv.org/licenses/nonexclusive-distrib/1.0/

          History
          Custom metadata
          cs.CL
          ccsd

          Theoretical computer science
          Theoretical computer science

          Comments

          Comment on this article