      Machine Translation as an Underrated Ingredient? Solving Classification Tasks with Large Language Models for Comparative Research

      research-article


          Abstract

          While large language models have revolutionised computational text analysis methods, the field is still tilted towards English language resources. Even as there are pre-trained models for some smaller languages, the coverage is far from universal, and pre-training large language models is an expensive and complicated task. This uneven language coverage limits comparative social research in terms of its geographical and linguistic scope. We propose a solution that sidesteps these issues by leveraging transfer learning and open-source machine translation. We use English as a bridge language between Hungarian and Polish bills and laws to solve a classification task related to the Comparative Agendas Project (CAP) coding scheme. Using the Hungarian corpus as training data for model fine-tuning, we categorise the Polish laws into 20 CAP categories. In doing so, we compare the performance of Transformer-based deep learning models (monolinguals, such as BERT, and multilinguals such as XLM-RoBERTa) and machine learning algorithms (e.g., SVM). Results show that the fine-tuned large language models outperform the traditional supervised learning benchmarks but are themselves surpassed by the machine translation approach. Overall, the proposed solution demonstrates a viable option for applying a transfer learning framework for low-resource languages and achieving state-of-the-art results without requiring expensive pre-training.
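
          A minimal sketch of the translation-bridge pipeline described above, assuming the Hugging Face transformers library; the OPUS-MT checkpoint, the bert-base-uncased classifier, and all settings are illustrative assumptions rather than the authors' exact configuration.

from transformers import (
    AutoModelForSeq2SeqLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

# Step 1: translate source-language documents into English with an
# open-source machine translation model (Hungarian -> English shown here;
# the checkpoint name is an assumption, not taken from the article).
mt_name = "Helsinki-NLP/opus-mt-hu-en"
mt_tokenizer = AutoTokenizer.from_pretrained(mt_name)
mt_model = AutoModelForSeq2SeqLM.from_pretrained(mt_name)

def translate(texts):
    # Tokenise, generate English output, and decode back to plain strings.
    batch = mt_tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated = mt_model.generate(**batch, max_length=512)
    return mt_tokenizer.batch_decode(generated, skip_special_tokens=True)

# Step 2: fine-tune an English-language classifier on the translated
# Hungarian training corpus, one label per CAP major topic (20 classes).
clf_name = "bert-base-uncased"
clf_tokenizer = AutoTokenizer.from_pretrained(clf_name)
classifier = AutoModelForSequenceClassification.from_pretrained(clf_name, num_labels=20)

# Step 3 (omitted): train `classifier` on translate(hungarian_bills) with a
# standard fine-tuning loop, then predict CAP categories for translate(polish_laws).

          The same English bridge could equally feed the translated texts to a classical baseline such as a TF-IDF plus SVM pipeline, which is the kind of supervised learning benchmark the abstract compares against.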


          Author and article information

          Journal
          Computational Communication Research (CCR)
          Amsterdam University Press (Amsterdam)
          ISSN: 2665-9085
          2023; 5(2): 1
          Affiliations
          Center for Social Science, Budapest
          Article
          DOI: 10.5117/CCR2023.2.6.MATE
          © The author(s)

          This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

          Categories
          Article

          Keywords: machine learning, Comparative Agendas Project, policy topics, classification, natural language processing, deep learning
