Human and computer estimations of Predictability of words in written language


Abstract

When we read printed text, we continuously predict upcoming words to integrate information and guide future eye movements. Thus, the Predictability of a given word has become one of the most important variables for explaining human behaviour and information processing during reading. In parallel, the Natural Language Processing (NLP) field has evolved by developing a wide variety of applications. Here, we show that by using different word embedding techniques (such as Latent Semantic Analysis, Word2Vec, and FastText) and N-gram-based language models, we were able to estimate how humans predict words (cloze-task Predictability) and to better understand eye movements in long Spanish texts. Both types of models partially captured aspects of predictability. On the one hand, our N-gram model performed well when used as a replacement for the cloze-task Predictability of the fixated word. On the other hand, word embeddings were useful to mimic the Predictability of the following word. Our study joins efforts from the neurolinguistic and NLP fields to understand human information processing during reading and potentially to improve NLP algorithms.
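The abstract compares three kinds of word-level scores: human cloze-task Predictability, an N-gram language-model probability, and an embedding-based similarity. The sketch below shows one simple version of each, applied to a hypothetical toy Spanish corpus. It is not the authors' method. Several choices here are assumptions made only for illustration: a bigram model with add-one smoothing, a log-odds transform clipped at 1/(2n), and scoring a word by the cosine between its vector and the mean of the context vectors. All function names and the 3-dimensional vectors are also hypothetical.

```python
import math
from collections import Counter


def cloze_predictability(responses, target):
    """Proportion of cloze-task guesses that match the target word (case-insensitive)."""
    if not responses:
        raise ValueError("need at least one cloze response")
    hits = sum(1 for r in responses if r.strip().lower() == target.lower())
    return hits / len(responses)


def logit_predictability(p, n):
    """Log-odds of p, clipped to [1/(2n), 1 - 1/(2n)] so p = 0 or 1 stays finite (assumed convention)."""
    eps = 1 / (2 * n)
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def train_bigram(tokens):
    """Count bigrams and their histories (every token except the last one)."""
    vocab = set(tokens)
    histories = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens, tokens[1:]))
    return vocab, histories, bigrams


def bigram_probability(prev, word, model):
    """P(word | prev) with add-one (Laplace) smoothing over the training vocabulary."""
    vocab, histories, bigrams = model
    return (bigrams[(prev, word)] + 1) / (histories[prev] + len(vocab))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def embedding_score(context, target, vectors):
    """Cosine between the target vector and the mean of the available context vectors."""
    ctx = [vectors[w] for w in context if w in vectors]
    if not ctx:
        raise ValueError("no context word has a vector")
    dim = len(vectors[target])
    mean = [sum(v[i] for v in ctx) / len(ctx) for i in range(dim)]
    return cosine(mean, vectors[target])


if __name__ == "__main__":
    # Hypothetical toy data; a real study would use a large Spanish corpus
    # and many participants' cloze responses.
    corpus = "el perro come la comida y el gato come la leche".split()
    model = train_bigram(corpus)
    print(bigram_probability("come", "la", model))  # (2 + 1) / (2 + 8)

    p = cloze_predictability(["la", "La", "su", "la", "una"], "la")
    print(p, logit_predictability(p, n=5))

    vectors = {  # hypothetical 3-dimensional embeddings
        "perro": [1.0, 0.0, 0.0],
        "gato": [0.9, 0.1, 0.0],
        "come": [0.0, 1.0, 0.0],
    }
    print(embedding_score(["perro"], "gato", vectors))
```

Replacing the bigram counts with a higher-order N-gram model, or the toy vectors with trained LSA, Word2Vec, or FastText embeddings, would change only the scoring functions. The comparison against cloze-task Predictability would stay the same.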



Author and article information

Contributors

Bruno Bianchi: bbianchi@dc.uba.ar

Journal

Journal ID (nlm-ta): Sci Rep

Journal ID (iso-abbrev): Sci Rep

Title: Scientific Reports

Publisher: Nature Publishing Group UK (London)

ISSN (Electronic): 2045-2322

Publication date (Electronic): 10 March 2020

Publication date PMC-release: 10 March 2020

Publication date Collection: 2020

Volume: 10

Electronic Location Identifier: 4396

Affiliations

[1] Laboratorio de Inteligencia Artificial Aplicada, Instituto de Ciencias de la Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires - Consejo Nacional de Investigación en Ciencia y Técnica, Ciudad Autónoma de Buenos Aires, Argentina (ISNI 0000 0001 0056 1981, GRID grid.7345.5)

[2] Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina (ISNI 0000 0001 0056 1981, GRID grid.7345.5)

[3] Departamento de Física, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina (ISNI 0000 0001 0056 1981, GRID grid.7345.5)

Article

Publisher ID: 61353

DOI: 10.1038/s41598-020-61353-z

PMC ID: 7064512

PubMed ID: 32157161

SO-VID: 12390484-5fb8-47cb-9cf6-19709a3b18cb

License:

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

History

Date received : 3 November 2019

Date accepted : 24 February 2020

Custom metadata

ScienceOpen disciplines: Uncategorized

Keywords: computational models, language

