
      SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments

          Preprint


          Abstract

          Applying natural language processing to political texts and speeches has become increasingly relevant in political science, because it makes it possible to analyze large text corpora that no single person could read. However, such corpora often lack critical metadata, for instance the party, age or constituency of the speaker, which could support analyses tailored to more fine-grained research questions. To enable researchers to answer such questions with quantitative approaches such as natural language processing, we provide the SpeakGer data set, which consists of German parliamentary debates from all 16 federal states of Germany as well as the German Bundestag from 1947 to 2023, split into a total of 10,806,105 speeches. The data set includes rich metadata in the form of information on audience reactions to each speech as well as the speaker's party, age, constituency and their party's political alignment, which enables a deeper analysis. We further provide three exploratory analyses: the topic shares of different parties over time, a descriptive analysis of how the age of the average speaker has developed, and a sentiment analysis of different parties' speeches with regard to the COVID-19 pandemic.


          Author and article information

          Date: 23 October 2024
          arXiv: 2410.17886
          ID: 110a88f1-dc88-43ca-b6ac-2b1423f2df59
          License: http://creativecommons.org/licenses/by/4.0/

          Venue: 3rd Workshop on Computational Linguistics for Political Text Analysis (CPSS@KONVENS 2024), 19-28
          10 pages, 3 figures
          arXiv category: cs.CL
          Subject: Theoretical computer science
