Greasing the wheels for comparative communication research: Supervised text classification for multilingual corpora

Abstract

Supervised machine learning for text classification is resource-intensive even in a monolingual setting. For a multilingual corpus, the cost of producing the required annotated documents quickly exceeds even generous time and budget constraints. We show how automated annotation and machine translation can be used both efficiently and effectively to classify a multilingual corpus with supervised machine learning. Our findings demonstrate that good results can be achieved by machine-translating about 250 to 350 documents per category class and language and using a dictionary in just one language, which we consider a realistic scenario for many projects. The methodological strategy is applied to the study of migration frames in seven languages (news discourse in seven European countries), and its usability for comparative communication research is discussed and evaluated.
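To make the strategy in the abstract more concrete, the following is a minimal sketch of one way such a pipeline could look. It is not the authors' implementation. It assumes that each original-language document comes with an English machine translation (written by hand here as a stand-in for real machine translation output), that a single English dictionary assigns frame labels to the translations, and that a classifier is then trained on the original-language text. The dictionary entries, the toy German documents, the frame names, and all function names are hypothetical, and a small hand-written Naive Bayes classifier stands in for whatever learner a project would actually use.

```python
import math
from collections import Counter, defaultdict

# Hypothetical English dictionary: frame name -> keywords (illustration only).
FRAME_DICTIONARY = {
    "economy": {"jobs", "labour", "wages", "economy"},
    "security": {"crime", "border", "police", "security"},
}

def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()

def annotate(english_text):
    """Automated annotation: return the frame with the most keyword hits.

    Returns None when no keyword matches; ties go to the first frame listed.
    """
    tokens = tokenize(english_text)
    hits = {frame: sum(tok in words for tok in tokens)
            for frame, words in FRAME_DICTIONARY.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None

def train_naive_bayes(texts, labels):
    """Collect the counts needed for a multinomial Naive Bayes classifier."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab

def predict(model, text):
    """Return the most probable label, using add-one (Laplace) smoothing."""
    class_counts, token_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_tokens = sum(token_counts[label].values())
        score = math.log(n_docs / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # tokens unseen in training are skipped
                score += math.log((token_counts[label][tok] + 1)
                                  / (total_tokens + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy corpus: (original German text, assumed English machine translation).
corpus = [
    ("migranten suchen arbeit und lohn", "migrants seek jobs and wages"),
    ("die wirtschaft braucht arbeit", "the economy needs labour"),
    ("polizei kontrolliert die grenze", "police control the border"),
    ("mehr polizei an der grenze", "more police at the border"),
]

# Step 1: label each document by applying the dictionary to its translation.
labelled = [(original, annotate(english)) for original, english in corpus]
labelled = [(text, label) for text, label in labelled if label is not None]

# Step 2: train on the original-language text with the automatic labels.
model = train_naive_bayes([t for t, _ in labelled], [l for _, l in labelled])

# Step 3: classify new, untranslated documents in the original language.
print(predict(model, "die polizei sichert die grenze"))
print(predict(model, "arbeit und lohn"))
```

In this sketch only the training documents need a translation; once the classifier is trained, new documents are classified directly in their original language. A real project would also validate the automatic labels against manually coded documents before relying on them.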

Author and article information

Contributors

Fabienne Lind:

ORCID: https://orcid.org/0000-0002-4978-9415

Correspondence address: Fabienne Lind, Kolingasse 14-16, 1090 Vienna, Austria

Tobias Heidenreich:

ORCID: https://orcid.org/0000-0001-9070-0550

Christoph Kralj

Hajo G. Boomgaarden:

ORCID: https://orcid.org/0000-0002-5260-1284

Journal

Journal ID (publisher-id): CCR

Title: Computational Communication Research

Publisher: Amsterdam University Press (Amsterdam, the Netherlands)

ISSN (Print): 2665-9085

ISSN (Electronic): 2665-9085

Publication date (Electronic): October 2021

Volume: 3

Issue: 3

Affiliations

Department of Communication, University of Vienna

Department of Computer Science, University of Vienna

Department of Communication, University of Vienna

Article

Publisher ID: CCR2021.3.001.LIND

DOI: 10.5117/CCR2021.3.001.LIND

SO-VID: d9dd03d7-bc13-419a-9157-0b0d6cf47246

License:

[This is an open access article distributed under the terms of the CC BY-NC 4.0 license] http://creativecommons.org/licenses/by/4.0/
