Accounting for Item Variance in Large-scale Databases

Abstract

The Dutch Lexicon Project (DLP; Keuleers et al., 2010) is the third published database providing lexical decision times for a large number of items (after the ELP, Balota et al., 2007, and the FLP, Ferrand et al., 2010). In this commentary, we address the question of how much item variance models should actually try to account for in the DLP (Spieler and Balota, 1997). As noted by Seidenberg and Plaut (1998), to test the descriptive adequacy of simulation models with item-level databases, one needs to estimate the amount of error variance (i.e., sources of variance that are not specific to item processing and that models cannot, in principle, capture) and, conversely, the amount of item variance that models should try to account for. One way to address this issue is to create independent groups of participants from a single database, and to compute the correlation between the item performances averaged over participants in each group (Courrieu et al., in press; Rey et al., 2009). One can show that the expected value of such correlations has the form of an intraclass correlation coefficient (ICC):

$$ \rho = \frac{nq}{nq + 1} \qquad (1) $$

where ρ is the ICC, n is the number of participants per group, and q is the ratio of the item-related variance to the noise variance for the database under consideration (for more details, see Courrieu et al., in press, or Rey et al., 2009). As discussed in Courrieu et al. (in press), there are basically two methods for estimating ρ and q. The first is based on a standard analysis of variance (ANOVA) of the database. This method is fast and accurate, and it provides suitable confidence limits for the ICC estimate. The second is a Monte Carlo method based on a permutation resampling procedure, which is computationally more demanding and more sensitive to missing data than the ANOVA method. However, this approach is distribution-free and much more flexible than the ANOVA.
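To make Equation 1 concrete, the following minimal sketch compares its prediction with the correlation obtained by repeatedly splitting simulated participants into two independent groups. It is only an illustration under stated assumptions: the data are simulated with Gaussian item effects and unit-variance noise, the table is complete, and the function names (`expected_icc`, `simulate_table`, `split_group_correlation`) are hypothetical. It is not the resampling code used by Courrieu et al. (in press) or Rey et al. (2009).

```python
import random


def expected_icc(n, q):
    """Equation 1: expected correlation between two groups of n participants."""
    return n * q / (n * q + 1)


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_table(m, n_total, q, rng):
    """Assumed model: item effect with variance q plus noise with variance 1."""
    item_effects = [rng.gauss(0.0, q ** 0.5) for _ in range(m)]
    return [[mu + rng.gauss(0.0, 1.0) for _ in range(n_total)]
            for mu in item_effects]


def split_group_correlation(table, n, rng, n_resamples=20):
    """Mean correlation of item means over random pairs of disjoint groups."""
    n_total = len(table[0])
    total = 0.0
    for _ in range(n_resamples):
        cols = rng.sample(range(n_total), 2 * n)  # random disjoint groups
        g1, g2 = cols[:n], cols[n:]
        means1 = [sum(row[j] for j in g1) / n for row in table]
        means2 = [sum(row[j] for j in g2) / n for row in table]
        total += pearson(means1, means2)
    return total / n_resamples


rng = random.Random(0)  # fixed seed so the run is repeatable
table = simulate_table(m=2000, n_total=40, q=0.2, rng=rng)
observed = split_group_correlation(table, n=20, rng=rng)
predicted = expected_icc(20, 0.2)
print(f"predicted ICC = {predicted:.3f}, resampled correlation = {observed:.3f}")
```

With the DLP values reported in this commentary (n = 39, q = 0.1396), `expected_icc` returns approximately 0.8448.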
In order to apply these methods, the database needs to be available in the form of an m × n table, where m is the number of items and n is the number of participants. The DLP database clearly fulfils this requirement, with m = 14089 and n = 39. The ELP and FLP databases are more problematic from this point of view because each participant provided data only for a subset of the whole set of items. A possible solution is to create “virtual” participants by mixing the data of various participants, previously transformed to z-scores (Faust et al., 1999), but this requires further investigation. No such problem occurs with the DLP database. However, the substantial proportion of missing data in this database (16%) prevents the application of the permutation resampling method. Nevertheless, an ANOVA-based analysis provided an overall ICC equal to 0.8448, with a 99% confidence interval of (0.8386, 0.8510), indicating that this database contains about 84.5% of reproducible item variance. A model that accounts for less than 83.86% of the empirical item variance probably under-fits the data, while a model that accounts for more than 85.10% of the empirical item variance probably over-fits the data (in general because it uses too many free parameters). Of course, this estimate is task-dependent and language-dependent. Using a different task, a different language, a different set of items (e.g., monosyllabic or disyllabic words), or a different population sample (e.g., older adults) might generate different outcomes. Because this analysis has already been applied to different large-scale databases using different experimental paradigms and different languages (i.e., a naming task with English and French disyllabic words, Courrieu et al., in press, and a perceptual identification task with English monosyllables, Rey et al., 2009), it is now possible to directly compare these results.
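As an illustration of the ANOVA route, the sketch below estimates ρ and q from a complete m × n table using the mean squares of a standard two-way ANOVA without replication (items × participants). This is a textbook estimator chosen for illustration, not necessarily the exact procedure of Courrieu et al. (in press). It assumes no missing data, so it would not apply directly to the DLP table, and it does not compute confidence limits. The function names and the simulated data are hypothetical.

```python
import random


def anova_icc(table):
    """Estimate (icc, q) from a complete items x participants table.

    Uses E[MS_items] = n * var_item + var_noise and E[MS_error] = var_noise,
    so q = (MS_items - MS_error) / (n * MS_error) and
    icc = (MS_items - MS_error) / MS_items = n*q / (n*q + 1).
    """
    m, n = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (m * n)
    item_means = [sum(row) / n for row in table]
    part_means = [sum(table[i][j] for i in range(m)) / m for j in range(n)]
    ss_items = n * sum((im - grand) ** 2 for im in item_means)
    ss_parts = m * sum((pm - grand) ** 2 for pm in part_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_items - ss_parts
    ms_items = ss_items / (m - 1)
    ms_error = ss_error / ((m - 1) * (n - 1))
    icc = (ms_items - ms_error) / ms_items
    q = (ms_items - ms_error) / (n * ms_error)
    return icc, q


def simulate_table(m, n, q, rng):
    """Assumed additive model: item effect + participant effect + noise."""
    items = [rng.gauss(0.0, q ** 0.5) for _ in range(m)]
    parts = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [[mu + p + rng.gauss(0.0, 1.0) for p in parts] for mu in items]


rng = random.Random(1)  # fixed seed so the run is repeatable
table = simulate_table(m=3000, n=10, q=0.25, rng=rng)
icc_hat, q_hat = anova_icc(table)
print(f"estimated ICC = {icc_hat:.3f}, estimated q = {q_hat:.3f}")
```

In this simulation the true values are q = 0.25 and ρ = 2.5 / 3.5 ≈ 0.714, so the printed estimates should fall close to those figures.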
Indeed, a different q ratio has been estimated for each database, and one can now plot the resulting evolution of the ICC as a function of the number of participants for each database (see Figure 1). This figure clearly shows that there are substantial variations across experimental paradigms and languages (or population samples, which remain confounded in the present situation) and that these variations can be explicitly quantified. For example, to reach the same amount of reproducible variance obtained in the DLP database (i.e., 84.5% with 39 participants), one would need 90 participants in the English perceptual identification task from Rey et al. (2009).

Figure 1. Evolution of the amount of reproducible variance (ICC) as a function of the number of participants in four databases: the DLP (lexical decision task in Dutch, LDT-Dutch), Rey et al. (2009; perceptual identification task with English monosyllables, PI-English), and Courrieu et al. (in press; naming disyllables in English and French, naming-English and naming-French). The estimated q parameters for these databases were, respectively, 0.1396, 0.0607, 0.1333, and 0.2269.

To conclude, the purpose of the present commentary was to provide a precise estimate of the amount of reproducible variance present in the DLP database and to compare the evolution of the reproducible variance across tasks or languages. With this information, it is now possible to precisely test the descriptive adequacy of any model that generates item-level predictions intended to account for item variance in the DLP database.
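The comparison across databases can be reproduced from the reported q values by inverting Equation 1, which gives n = ρ / (q(1 − ρ)). The sketch below is a minimal illustration: the pairing of q values with database labels follows the order given in the Figure 1 caption, and the function names and the small rounding tolerance are assumptions introduced here.

```python
import math

# q estimates as listed in the Figure 1 caption (order: "respectively").
Q_ESTIMATES = {
    "LDT-Dutch": 0.1396,
    "PI-English": 0.0607,
    "naming-English": 0.1333,
    "naming-French": 0.2269,
}


def icc(n, q):
    """Equation 1: reproducible variance for n participants."""
    return n * q / (n * q + 1)


def participants_needed(target_icc, q):
    """Smallest n whose ICC reaches target_icc (Equation 1 solved for n)."""
    exact = target_icc / (q * (1.0 - target_icc))
    # Subtract a tiny tolerance so floating-point noise does not push an
    # exact integer solution (e.g., 39.0000000001) up to the next integer.
    return math.ceil(exact - 1e-9)


# Reproducible variance of the DLP with its 39 participants.
target = icc(39, Q_ESTIMATES["LDT-Dutch"])

for name, q in Q_ESTIMATES.items():
    print(f"{name}: {participants_needed(target, q)} participants "
          f"to reach ICC = {target:.4f}")
```

For PI-English this yields 90 participants, in line with the figure given in the text. The same function can be used to ask how many participants a planned study would need to reach a chosen level of reproducible variance, provided a q estimate is available for a comparable task and language.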

Most cited references

Bringing Computational Models of Word Naming Down to the Item Level

D. H. Spieler, D A Balota (1997)

Evaluating Word-Reading Models at the Item Level: Matching the Grain of Theory and Data

Mark S. Seidenberg, David C. Plaut (1998)

Item performance in visual word recognition.

Maarten Jacobs, Arnaud Rey, F Weigand … (2009)

Standard factorial designs in psycholinguistics have been complemented recently by large-scale databases providing empirical constraints at the level of item performance. At the same time, the development of precise computational architectures has led modelers to compare item-level performance with item-level predictions. It has been suggested, however, that item performance includes a large amount of undesirable error variance that should be quantified to determine the amount of reproducible variance that models should account for. In the present study, we provide a simple and tractable statistical analysis of this issue. We also report practical solutions for estimating the amount of reproducible variance for any database that conforms to the additive decomposition of the variance. A new empirical database consisting of the word identification times of 140 participants on 120 words is then used to test these practical solutions. Finally, we show that increases in the amount of reproducible variance are accompanied by the detection of new sources of variance.

Author and article information

Journal

Journal ID (nlm-ta): Front Psychol

Journal ID (publisher-id): Front. Psychology

Title: Frontiers in Psychology

Publisher: Frontiers Research Foundation

ISSN (Electronic): 1664-1078

Publication date (Electronic): 24 November 2010

Publication date Collection: 2010

Volume: 1

Electronic Location Identifier: 200

Affiliations

[1] Laboratoire de Psychologie Cognitive, Department of Psychology, National Center for Scientific Research, Provence University, Marseille, France

Author notes

*Correspondence: arnaud.rey@univ-provence.fr

This article was submitted to Frontiers in Language Sciences, a specialty of Frontiers in Psychology.

Article

DOI: 10.3389/fpsyg.2010.00200

PMC ID: 3125539

PubMed ID: 21738520

SO-VID: b17f816f-e3cf-470b-906e-4c2d2ef93e7f

License:

This is an open-access article subject to an exclusive license agreement between the authors and the Frontiers Research Foundation, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are credited.

History

Date received: 24 October 2010

Date accepted: 25 October 2010

Page count

Figures: 1, Tables: 0, Equations: 1, References: 8, Pages: 2, Words: 1326
