      URLs Can Facilitate Machine Learning Classification of News Stories Across Languages and Contexts

      research-article

          Abstract

          Comparative scholars studying political news content at scale face the challenge of addressing multiple languages. While many train individual supervised machine learning classifiers for each language, this is a costly and time-consuming process. We propose that instead of relying on thematic labels generated by manual coding, researchers can use ‘distant’ labels created by cues in article URLs. Sections reflected in URLs (e.g., nytimes.com/politics/) can therefore help create training material for supervised machine learning classifiers. Using cues provided by news media organizations, such an approach allows for efficient political news identification at scale while facilitating implementation across languages. Using a dataset of approximately 870,000 URLs of news-related content from four countries (Italy, Germany, Netherlands, and Poland), we test this method by providing a comparison to ‘classical’ supervised machine learning and a multilingual BERT model, across four news topics. Our results suggest that the use of URL section cues to distantly annotate texts provides a cheap and easy-to-implement way of classifying large volumes of news texts that can save researchers many valuable resources without having to sacrifice quality.
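
          As an illustration of the idea in the abstract, the following is a minimal sketch of how a section cue in a URL path could be turned into a distant label. It is not the authors' implementation: the function name `distant_label`, the `SECTION_CUES` mapping, the cue words, and the topic names are all assumptions made for this example, and the example.* URLs are placeholders.

```python
from urllib.parse import urlparse

# Hypothetical cue-to-topic mapping (an assumption for illustration only;
# the cue lists and topics used in the article are not reproduced here).
SECTION_CUES = {
    "politics": "politics",
    "politik": "politics",
    "politica": "politics",
    "polityka": "politics",
    "politiek": "politics",
    "sport": "sports",
    "economia": "economy",
    "wirtschaft": "economy",
}


def distant_label(url):
    """Return a topic label if a path segment matches a known cue, else None."""
    # urlparse only recognises the host when a scheme or a leading '//' is
    # present, so bare URLs such as 'nytimes.com/politics/' get a '//' prefix.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    path = urlparse(url).path.lower()
    # Split the path into its non-empty segments, e.g. ['politik', 'ausland'].
    segments = [segment for segment in path.split("/") if segment]
    for segment in segments:
        if segment in SECTION_CUES:
            return SECTION_CUES[segment]
    # No section cue found: the article stays unlabeled.
    return None


if __name__ == "__main__":
    for example in [
        "nytimes.com/politics/",
        "https://www.example.de/politik/ausland/artikel-123.html",
        "https://example.com/article/12345",
    ]:
        print(example, "->", distant_label(example))
```

          In a pipeline of this kind, articles that receive a label would serve as training material for a supervised classifier, which could then be applied to articles whose URLs carry no section cue.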

          Author and article information

          Contributors
          Journal
          Computational Communication Research (CCR)
          Amsterdam University Press (Amsterdam)
          ISSN: 2665-9085
          2023; 5(2): 1
          Affiliations
          PhD Student
          Amsterdam School of Communication Research (ASCoR), University of Amsterdam, the Netherlands
          Article
          DOI: 10.5117/CCR2023.2.4.DELE
          © The author(s)

          This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

          Keywords: distant classification, text classification, political news, machine learning, multilingual data
