
      Human and computer estimations of Predictability of words in written language

      research-article


          Abstract

          When we read printed text, we continuously predict upcoming words to integrate information and guide future eye movements. The Predictability of a given word has therefore become one of the most important variables for explaining human behaviour and information processing during reading. In parallel, the Natural Language Processing (NLP) field has evolved by developing a wide variety of applications. Here, we show that, using different word-embedding techniques (such as Latent Semantic Analysis, Word2Vec, and FastText) and N-gram-based language models, we were able to estimate how humans predict words (cloze-task Predictability) and to better understand eye movements in long Spanish texts. Both types of models partially captured aspects of Predictability. On the one hand, our N-gram model performed well when used as a replacement for the cloze-task Predictability of the fixated word. On the other hand, word embeddings were useful for mimicking the Predictability of the following word. Our study joins efforts from the neurolinguistic and NLP fields to understand human information processing during reading and, potentially, to improve NLP algorithms.
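As a rough illustration of how an N-gram model can assign a predictability to a word given its context, the sketch below estimates P(word | previous word) from bigram counts. It is not the authors' model: the toy corpus, the bigram order, the add-one smoothing, and the function names `train_bigram` and `bigram_predictability` are all assumptions made for this example.

```python
from collections import Counter


def train_bigram(tokens):
    """Count how often each word serves as a context and how often each bigram occurs."""
    contexts = Counter(tokens[:-1])              # every token that has a successor
    bigrams = Counter(zip(tokens, tokens[1:]))   # adjacent word pairs
    vocab = set(tokens)
    return contexts, bigrams, vocab


def bigram_predictability(prev, word, contexts, bigrams, vocab):
    """Add-one (Laplace) smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))


# Toy Spanish corpus (hypothetical; a real model would be trained on a large corpus).
corpus = "el perro come carne y el gato come pescado y el perro duerme".split()
contexts, bigrams, vocab = train_bigram(corpus)

p_perro = bigram_predictability("el", "perro", contexts, bigrams, vocab)
p_gato = bigram_predictability("el", "gato", contexts, bigrams, vocab)
print(f"P(perro | el) = {p_perro:.3f}")
print(f"P(gato | el)  = {p_gato:.3f}")
```

In this toy corpus "perro" follows "el" more often than "gato" does, so it receives the higher estimate. A cloze-task Predictability would instead be the proportion of human participants who guess the word given the preceding text.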


          Most cited references (16)


          The effect of word predictability on reading time is logarithmic.

          It is well known that real-time human language processing is highly incremental and context-driven, and that the strength of a comprehender's expectation for each word encountered is a key determinant of the difficulty of integrating that word into the preceding context. In reading, this differential difficulty is largely manifested in the amount of time taken to read each word. While numerous studies over the past thirty years have shown expectation-based effects on reading times driven by lexical, syntactic, semantic, pragmatic, and other information sources, there has been little progress in establishing the quantitative relationship between expectation (or prediction) and reading times. Here, by combining a state-of-the-art computational language model, two large behavioral data-sets, and non-parametric statistical techniques, we establish for the first time the quantitative form of this relationship, finding that it is logarithmic over six orders of magnitude in estimated predictability. This result is problematic for a number of established models of eye movement control in reading, but lends partial support to an optimal perceptual discrimination account of word recognition. We also present a novel model in which language processing is highly incremental well below the level of the individual word, and show that it predicts both the shape and time-course of this effect. At a more general level, this result provides challenges for both anticipatory processing and semantic integration accounts of lexical predictability effects. And finally, this result provides evidence that comprehenders are highly sensitive to relative differences in predictability - even for differences between highly unpredictable words - and thus helps bring theoretical unity to our understanding of the role of prediction at multiple levels of linguistic structure in real-time language comprehension. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
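One way to write down the logarithmic relationship described in this abstract is sketched below. The intercept $a$ and slope $b$ are hypothetical coefficients introduced only for illustration; the abstract does not report their values.

```latex
\mathrm{RT}(w_i) \;\approx\; a \;-\; b \,\log p\left(w_i \mid w_1, \dots, w_{i-1}\right), \qquad b > 0
```

Under this form, dividing a word's predictability by ten always adds the same amount of reading time, $b \log 10$, whether the word was highly predictable or already very unpredictable. This is consistent with the abstract's claim that readers are sensitive to relative differences in predictability.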

            Tracking the mind during reading: the influence of past, present, and future words on fixation durations.

            Reading requires the orchestration of visual, attentional, language-related, and oculomotor processing constraints. This study replicates previous effects of frequency, predictability, and length of fixated words on fixation durations in natural reading and demonstrates new effects of these variables related to previous and next words. Results are based on the reading of 144 sentences. Such evidence for distributed processing of words across fixation durations challenges psycholinguistic immediacy-of-processing and eye-mind assumptions. Most of the time the mind processes several words in parallel at different perceptual and cognitive levels. Eye movements can help to unravel these processes. (c) 2006 APA, all rights reserved.

              “Cloze Procedure”: A New Tool for Measuring Readability


                Author and article information

                Contributors
                bbianchi@dc.uba.ar
                Journal
                Scientific Reports (Sci Rep)
                Publisher: Nature Publishing Group UK (London)
                ISSN: 2045-2322
                Published: 10 March 2020
                Volume: 10
                Article number: 4396
                Affiliations
                [1] ISNI 0000 0001 0056 1981, GRID grid.7345.5, Laboratorio de Inteligencia Artificial Aplicada, Instituto de Ciencias de la Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires - Consejo Nacional de Investigación en Ciencia y Técnica, Ciudad Autónoma de Buenos Aires, Argentina
                [2] ISNI 0000 0001 0056 1981, GRID grid.7345.5, Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina
                [3] ISNI 0000 0001 0056 1981, GRID grid.7345.5, Departamento de Física, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina
                Article
                Publisher ID: 61353
                DOI: 10.1038/s41598-020-61353-z
                PMCID: PMC7064512
                PMID: 32157161
                ScienceOpen record ID: 12390484-5fb8-47cb-9cf6-19709a3b18cb
                © The Author(s) 2020

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 3 November 2019
                Accepted: 24 February 2020
                Categories
                Article

                Uncategorized
                computational models, language
