      Is Open Access

      Learning about Spanish dialects through Twitter

      Preprint


          Abstract

          This paper maps the large-scale variation of the Spanish language by employing a corpus based on geographically tagged Twitter messages. Lexical dialects are extracted from an analysis of variants of tens of concepts. The resulting maps show linguistic variation on an unprecedented scale across the globe. We discuss the properties of the main dialects within a machine learning approach and find that varieties spoken in urban areas have an international character in contrast to country areas where dialects show a more regional uniformity.
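The pipeline the abstract outlines (count which variant of each concept appears in each geographic cell, then group cells with similar lexical profiles) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy counts are invented, the cell names and the helpers `variant_profiles` and `kmeans` are hypothetical, and plain k-means stands in for whatever clustering the authors actually applied.

```python
from collections import Counter, defaultdict

# Invented toy input: one (cell, concept, variant) triple per matched tweet.
# Real input would come from geotagged tweets matched against a concept list.
TWEETS = [
    ("cell_a", "car", "coche"), ("cell_a", "car", "coche"),
    ("cell_a", "car", "coche"), ("cell_a", "computer", "ordenador"),
    ("cell_b", "car", "coche"), ("cell_b", "car", "coche"),
    ("cell_b", "car", "carro"), ("cell_b", "computer", "ordenador"),
    ("cell_c", "car", "carro"), ("cell_c", "car", "carro"),
    ("cell_c", "computer", "computador"),
    ("cell_d", "car", "carro"), ("cell_d", "car", "auto"),
    ("cell_d", "computer", "computadora"),
]


def variant_profiles(tweets):
    """Return (features, profiles): per-cell relative variant frequencies."""
    counts = defaultdict(Counter)  # cell -> Counter of (concept, variant)
    for cell, concept, variant in tweets:
        counts[cell][(concept, variant)] += 1
    features = sorted({key for counter in counts.values() for key in counter})
    profiles = {}
    for cell, counter in counts.items():
        totals = Counter()  # concept -> number of tweets in this cell
        for (concept, _), n in counter.items():
            totals[concept] += n
        # Share of each variant among all variants of the same concept.
        profiles[cell] = [
            counter[(c, v)] / totals[c] if totals[c] else 0.0
            for (c, v) in features
        ]
    return features, profiles


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k=2, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    cells = sorted(profiles)
    centroids = [profiles[cells[0]]]
    while len(centroids) < k:
        far = max(cells, key=lambda c: min(sq_dist(profiles[c], m) for m in centroids))
        centroids.append(profiles[far])
    labels = {}
    for _ in range(iters):
        new_labels = {
            c: min(range(k), key=lambda i: sq_dist(profiles[c], centroids[i]))
            for c in cells
        }
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for i in range(k):
            members = [profiles[c] for c in cells if labels[c] == i]
            if members:  # keep the old centroid if a cluster empties
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels


features, profiles = variant_profiles(TWEETS)
labels = kmeans(profiles, k=2)
for cell in sorted(labels):
    print(cell, labels[cell])
```

On this toy input the two cells dominated by "coche" and "ordenador" fall into one cluster and the other two cells into the second. Normalising counts within each concept keeps cells with many tweets from dominating the distance computation, which is a design choice of this sketch rather than something the abstract specifies.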


          Most cited references (1)

          Is Open Access

          Crowdsourcing Dialect Characterization through Twitter

          We perform a large-scale analysis of language diatopic variation using geotagged microblogging datasets. By collecting all Twitter messages written in Spanish over more than two years, we build a corpus from which a carefully selected list of concepts allows us to characterize Spanish varieties on a global scale. A cluster analysis proves the existence of well defined macroregions sharing common lexical properties. Remarkably enough, we find that the Spanish language is split into two superdialects, namely, an urban speech used across major American and Spanish cities and a diverse form that encompasses rural areas and small towns. The latter can be further clustered into smaller varieties with a stronger regional character.

            Author and article information

            Journal
            2015-11-16
            2017-02-05
            Article
            arXiv:1511.04970

            http://arxiv.org/licenses/nonexclusive-distrib/1.0/

            Custom metadata
            RILI, XVI 2 (28), 65-75 (2016)
            16 pages, 5 figures, 1 table
            stat.ML cs.CL cs.CY physics.soc-ph stat.AP

            General physics, Theoretical computer science, Applications, Applied computer science, Machine learning
