      Is Open Access

      Greasing the wheels for comparative communication research: Supervised text classification for multilingual corpora


          Abstract

          Employing supervised machine learning for text classification is already a resource-intensive endeavor in a monolingual setting. When the task is to classify a multilingual corpus, the cost of producing the required annotated documents quickly exceeds even generous time and financial constraints. We show how tools like automated annotation and machine translation can be employed both efficiently and effectively for the classification of a multilingual corpus with supervised machine learning. Our findings demonstrate that good results can already be achieved with the machine translation of about 250 to 350 documents per category class and language and a dictionary in just one language, which we perceive as a realistic scenario for many projects. The methodological strategy is applied to study migration frames in seven languages (news discourse in seven European countries) and is discussed and evaluated for its usability in comparative communication research.
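
          The workflow summarized in the abstract (translate documents into one language, annotate them automatically with a dictionary in that language, then train a supervised classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. The dictionary terms, the frame labels, the toy translation table standing in for a machine translation service, and the small naive Bayes classifier are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical one-language (English) dictionary for automated annotation.
FRAME_DICTIONARY = {
    "economy": {"jobs", "wages", "labour", "budget"},
    "security": {"crime", "border", "police", "terror"},
}

# Stand-in for a machine translation service (assumption: a fixed lookup table).
TOY_TRANSLATIONS = {
    "arbeitsplätze und löhne unter druck": "jobs and wages under pressure",
    "plus de police à la frontière": "more police at the border",
    "markt unter druck": "market under pressure",
}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def translate(text):
    """Return the toy translation, or the text unchanged if none is known."""
    return TOY_TRANSLATIONS.get(text.lower(), text)


def annotate(text):
    """Dictionary-based label; None if no term matches or the top labels tie."""
    tokens = set(tokenize(text))
    hits = {label: len(tokens & terms) for label, terms in FRAME_DICTIONARY.items()}
    ranked = sorted(hits.values(), reverse=True)
    if ranked[0] == 0 or (len(ranked) > 1 and ranked[0] == ranked[1]):
        return None
    return max(hits, key=hits.get)


def train(docs, labels):
    """Fit a multinomial naive Bayes model on tokenized documents."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the most probable label using Laplace-smoothed word likelihoods."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for token in tokenize(text):
            if token in vocab:
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


# Multilingual raw corpus (invented examples).
raw_corpus = [
    "Arbeitsplätze und Löhne unter Druck",
    "Plus de police à la frontière",
    "New budget strains labour market",
    "Crime figures rise near the border",
    "Das Wetter bleibt sonnig",  # no dictionary match, so it is dropped
]

# 1) translate, 2) annotate with the dictionary, 3) keep only labeled documents.
translated = [translate(doc) for doc in raw_corpus]
labeled = [(doc, annotate(doc)) for doc in translated]
train_docs = [doc for doc, label in labeled if label is not None]
train_labels = [label for _, label in labeled if label is not None]

# 4) train the classifier on the automatically annotated documents.
model = train(train_docs, train_labels)

# The dictionary alone cannot label this document, but the classifier can.
new_doc = translate("Markt unter Druck")
print(annotate(new_doc), predict(model, new_doc))
```

          In this toy setup the last document contains no dictionary term, so dictionary annotation leaves it unlabeled, while the trained classifier still assigns a frame from the surrounding vocabulary. A real project would replace the lookup table with an actual machine translation service and validate the automated labels against manual coding.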


                Author and article information

                Contributors
                Journal
                CCR
                Computational Communication Research
                Amsterdam University Press (Amsterdam, the Netherlands)
                2665-9085
                October 2021
                Volume: 3
                Issue: 3
                Affiliations
                Department of Communication, University of Vienna
                Department of Communication, University of Vienna
                Department of Computer Science, University of Vienna
                Department of Communication, University of Vienna
                Author information
                https://orcid.org/0000-0002-4978-9415
                https://orcid.org/0000-0001-9070-0550
                https://orcid.org/0000-0002-5260-1284
                Article
                CCR2021.3.001.LIND
                10.5117/CCR2021.3.001.LIND
                d9dd03d7-bc13-419a-9157-0b0d6cf47246
                © Fabienne Lind, Tobias Heidenreich, Christoph Kralj, & Hajo G. Boomgaarden

                [This is an open access article distributed under the terms of the CC BY-NC 4.0 license] http://creativecommons.org/licenses/by/4.0/

                Categories
                Article

                multilingual content analysis, text classification, machine translation, comparative communication research, supervised machine learning
