125
views
1
recommends
+1 Recommend
1 collections
    11
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      The Fall of the Empire: The Americanization of English

      Preprint

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          As global political preeminence gradually shifted from the United Kingdom to the United States, so did the capacity to culturally influence the rest of the world. In this work, we analyze how the world-wide varieties of written English are evolving. We study both the spatial and temporal variations of vocabulary and spelling of English using a large corpus of geolocated tweets and the Google Books datasets corresponding to books published in the US and the UK. The advantage of our approach is that we can address both standard written language (Google Books) and the more colloquial forms of microblogging messages (Twitter). We find that American English is the dominant form of English outside the UK and that its influence is felt even within the UK borders. Finally, we analyze how this trend has evolved over time and the impact that some cultural events have had in shaping it.

          Related collections

          Most cited references14

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          The Twitter of Babel: Mapping World Languages through Microblogging Platforms

          Large scale analysis and statistics of socio-technical systems that just a few short years ago would have required the use of consistent economic and human resources can nowadays be conveniently performed by mining the enormous amount of digital data produced by human activities. Although a characterization of several aspects of our societies is emerging from the data revolution, a number of questions concerning the reliability and the biases inherent to the big data “proxies” of social life are still open. Here, we survey worldwide linguistic indicators and trends through the analysis of a large-scale dataset of microblogging posts. We show that available data allow for the study of language geography at scales ranging from country-level aggregation to specific city neighborhoods. The high resolution and coverage of the data allows us to investigate different indicators such as the linguistic homogeneity of different countries, the touristic seasonal patterns within countries and the geographical distribution of different languages in multilingual regions. This work highlights the potential of geolocalized studies of open data sources to improve current analysis and develop indicators for major social phenomena in specific communities.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            Languages cool as they expand: Allometric scaling and the decreasing need for new words

            We analyze the occurrence frequencies of over 15 million words recorded in millions of books published during the past two centuries in seven different languages. For all languages and chronological subsets of the data we confirm that two scaling regimes characterize the word frequency distributions, with only the more common words obeying the classic Zipf law. Using corpora of unprecedented size, we test the allometric scaling relation between the corpus size and the vocabulary size of growing languages to demonstrate a decreasing marginal need for new words, a feature that is likely related to the underlying correlations between words. We calculate the annual growth fluctuations of word use which has a decreasing trend as the corpus size increases, indicating a slowdown in linguistic evolution following language expansion. This “cooling pattern” forms the basis of a third statistical regularity, which unlike the Zipf and the Heaps law, is dynamical in nature.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              Diffusion of Lexical Change in Social Media

              Computer-mediated communication is driving fundamental changes in the nature of written language. We investigate these changes by statistical analysis of a dataset comprising 107 million Twitter messages (authored by 2.7 million unique user accounts). Using a latent vector autoregressive model to aggregate across thousands of words, we identify high-level patterns in diffusion of linguistic change over the United States. Our model is robust to unpredictable changes in Twitter's sampling rate, and provides a probabilistic characterization of the relationship of macro-scale linguistic influence to a set of demographic and geographic predictors. The results of this analysis offer support for prior arguments that focus on geographical proximity and population size. However, demographic similarity – especially with regard to race – plays an even more central role, as cities with similar racial demographics are far more likely to share linguistic influence. Rather than moving towards a single unified “netspeak” dialect, language evolution in computer-mediated communication reproduces existing fault lines in spoken American English.
                Bookmark

                Author and article information

                Journal
                2017-07-03
                Article
                1707.00781
                293e2b2c-4cd2-4ab3-b59b-2d5ff847de96

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                16 pages, 6 figures, 2 tables
                cs.CL cond-mat.stat-mech cs.CY physics.soc-ph stat.AP

                Condensed matter,General physics,Theoretical computer science,Applications,Applied computer science

                Comments

                Comment on this article