
      The Hidden Shape of Stories Reveals Positivity Bias and Gender Bias

      Preprint


          Abstract

          Capturing the shape of stories is crucial for understanding the human mind. In this research, we use word embedding methods, a widely used tool in natural language processing and machine learning, to quantify and compare the emotional arcs of stories over time. Based on pretrained Google News word2vec vectors and film script corpora (N = 1109), we form the fundamental building blocks of story emotional trajectories. The results demonstrate that there exists only one universal pattern of story shapes in movies. Furthermore, story narratives exhibit both a positivity bias and a gender bias. More interestingly, audiences reveal a completely different preference from content producers.
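The abstract does not spell out how an emotional arc is computed, so the following is only a minimal sketch of one plausible approach, not the authors' procedure. It assumes a valence axis built from the difference between positive and negative seed-word vectors, scores each token by cosine similarity to that axis, and averages the scores over equal-length segments of the script. The function names, the seed words, and the two-dimensional `TOY_EMB` vectors are all hypothetical and stand in for the 300-dimensional Google News word2vec vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def valence_axis(emb, positive, negative):
    """Assumed valence direction: mean(positive seeds) - mean(negative seeds)."""
    pos = mean_vector([emb[w] for w in positive])
    neg = mean_vector([emb[w] for w in negative])
    return [p - q for p, q in zip(pos, neg)]


def emotional_arc(tokens, emb, axis, n_segments):
    """Average valence score per segment; out-of-vocabulary tokens are skipped."""
    arc = []
    for i in range(n_segments):
        start = i * len(tokens) // n_segments
        end = (i + 1) * len(tokens) // n_segments
        scores = [cosine(emb[t], axis) for t in tokens[start:end] if t in emb]
        arc.append(sum(scores) / len(scores) if scores else 0.0)
    return arc


# Hypothetical toy embeddings, for illustration only.
TOY_EMB = {
    "good": [1.0, 0.1], "bad": [-1.0, 0.1],
    "joy": [0.9, 0.3], "loss": [-0.8, 0.4],
    "hope": [0.7, 0.5], "fear": [-0.9, 0.2],
}

script = "joy hope loss fear fear loss hope joy joy".split()
axis = valence_axis(TOY_EMB, ["good"], ["bad"])
arc = emotional_arc(script, TOY_EMB, axis, n_segments=3)
print(arc)  # positive opening, negative middle, positive ending
```

With real data, the toy dictionary would be replaced by pretrained vectors, and the segment count would control how finely the trajectory is resolved. Comparing arcs across scripts then reduces to comparing fixed-length lists of segment scores.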

          Most cited references (4)

          Word embeddings quantify 100 years of gender and ethnic stereotypes

          Word embeddings are a popular machine-learning method that represents each English word by a vector, such that the geometry between these vectors captures semantic relations between the corresponding words. We demonstrate that word embeddings can be used as a powerful tool to quantify historical trends and social change. As specific applications, we develop metrics based on word embeddings to characterize how gender stereotypes and attitudes toward ethnic minorities in the United States evolved during the 20th and 21st centuries, starting from 1910. Our framework opens up a fruitful intersection between machine learning and quantitative social science.

          Word embeddings are a powerful machine-learning framework that represents each English word by a vector. The geometric relationship between these vectors captures meaningful semantic relationships between the corresponding words. In this paper, we develop a framework to demonstrate how the temporal dynamics of the embedding help to quantify changes in stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States. We integrate word embeddings trained on 100 years of text data with the US Census to show that changes in the embedding track closely with demographic and occupation shifts over time. The embedding captures societal shifts (e.g., the women's movement in the 1960s and Asian immigration into the United States) and also illuminates how specific adjectives and occupations became more closely associated with certain populations over time. Our framework for temporal analysis of word embeddings opens up a fruitful intersection between machine learning and quantitative social science.

            SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods

              Temporal patterns of happiness and information in a global social network: Hedonometrics and Twitter

              Individual happiness is a fundamental societal metric. Normally measured through self-report, happiness has often been indirectly characterized and overshadowed by more readily quantifiable economic indicators such as gross domestic product. Here, we examine expressions made on the online, global microblog and social networking service Twitter, uncovering and explaining temporal variations in happiness and information levels over timescales ranging from hours to years. Our data set comprises over 46 billion words contained in nearly 4.6 billion expressions posted over a 33-month span by over 63 million unique users. In measuring happiness, we use a real-time, remote-sensing, non-invasive, text-based approach, a kind of hedonometer. In building our metric, made available with this paper, we conducted a survey to obtain happiness evaluations of over 10,000 individual words, representing a tenfold size improvement over similar existing word sets. Rather than being ad hoc, our word list is chosen solely by frequency of usage, and we show how a highly robust metric can be constructed and defended.

                Author and article information

                Date: 12 November 2018
                arXiv: 1811.04599
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                Category: cs.CL

                Theoretical computer science
