      Evaluating Transferability in Multilingual Text Analyses

      research-article


          Abstract

          Multilingual text analysis is increasingly important for addressing the current narrow focus on English and other Indo-European languages in comparative studies. However, a comprehensive approach to evaluating the validity of multilingual text analytic methods across different language contexts has been lacking. To address this issue, we propose that the validity of multilingual text analysis should be studied through the lens of transferability, which assesses the extent to which the performance of a multilingual text analytic method can be maintained when switching from one language context to another. We first formally conceptualize transferability in multilingual text analysis as a measure of whether the method is equivalent across language groups (linguistic transferability) and societal contexts (contextual transferability). We propose a model-agnostic approach to evaluate transferability using (1) natural and synthetic data pairs, (2) manual annotation of errors, and (3) the Local Interpretable Model-Agnostic Explanations (LIME) technique. As an application of our approach, we analyze the transferability of a multilingual BERT (mBERT) model fine-tuned with annotated manifestos and media texts from five Indo-European language-speaking countries of the Comparative Agendas Project. The transferability is then evaluated using natural and synthetic parliamentary data from the UK, the Basque Country, Hong Kong, and Taiwan. Through the evaluation of transferability, this study sheds light on the common causes of prediction errors in multilingual text classification using mBERT.
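
          The idea of transferability as maintained performance across language contexts can be illustrated with a minimal sketch. This is not the authors' evaluation procedure: the function names, the context labels ("reference", "target"), the use of plain accuracy as the performance measure, and the toy predictions are all assumptions made for illustration only. The sketch scores a classifier separately in each context and reports the change relative to a reference context.

```python
from collections import defaultdict

def accuracy_by_context(records):
    """Compute accuracy per context.

    records: iterable of (context, gold_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for context, gold, pred in records:
        total[context] += 1
        correct[context] += int(gold == pred)
    return {c: correct[c] / total[c] for c in total}

def transferability_gaps(scores, reference):
    """Accuracy change of each context relative to the reference context.

    A value near zero suggests performance is maintained; a large
    negative value suggests it is not.
    """
    ref = scores[reference]
    return {c: s - ref for c, s in scores.items() if c != reference}

# Toy, made-up predictions (hypothetical; not data or results from the study).
records = [
    ("reference", "economy", "economy"),
    ("reference", "health", "health"),
    ("reference", "defense", "defense"),
    ("reference", "education", "health"),
    ("target", "economy", "economy"),
    ("target", "health", "economy"),
    ("target", "defense", "defense"),
    ("target", "education", "health"),
]

scores = accuracy_by_context(records)
gaps = transferability_gaps(scores, "reference")
print(scores)
print(gaps)
```

          In practice a per-class metric such as macro-F1 may be more informative than accuracy for topic classification, and a performance gap alone does not explain its causes, which is why the abstract pairs such comparisons with manual error annotation and LIME.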

          Author and article information

          Journal: Computational Communication Research (CCR)
          Publisher: Amsterdam University Press (Amsterdam)
          ISSN: 2665-9085
          2023; 5(2): 1
          Affiliations
          Academia Sinica
          GESIS - Leibniz-Institut für Sozialwissenschaften
          Article
          DOI: 10.5117/CCR2023.2.2.HO
          © The author(s)

          This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


          Keywords: Transfer Learning, Multilingual Text Analysis, Machine Learning, Error Analysis, Topic Classification
