
      How Many Words Do We Know? Practical Estimates of Vocabulary Size Dependent on Word Definition, the Degree of Language Input and the Participant’s Age



          Based on an analysis of the literature and a large-scale crowdsourcing experiment, we estimate that an average 20-year-old native speaker of American English knows 42,000 lemmas and 4,200 non-transparent multiword expressions, derived from 11,100 word families. The numbers range from 27,000 lemmas for the lowest 5% to 52,000 for the highest 5%. Between the ages of 20 and 60, the average person learns 6,000 extra lemmas, or about one new lemma every 2 days. The knowledge of the words can be as shallow as knowing that the word exists. In addition, people learn tens of thousands of inflected forms and proper nouns (names), which account for the substantially higher numbers of ‘words known’ mentioned in other publications.
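As a rough check on the learning rate quoted in the abstract, the arithmetic can be sketched as follows. The lemma count and the age range come from the abstract; the 365.25-day year and the function name are assumptions made only for this illustration, not part of the authors' method.

```python
# Figures from the abstract: 6,000 extra lemmas learned between ages 20 and 60.
EXTRA_LEMMAS = 6_000
START_AGE, END_AGE = 20, 60
DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years


def days_per_new_lemma(extra_lemmas, start_age, end_age, days_per_year=DAYS_PER_YEAR):
    """Average number of days between newly learned lemmas (hypothetical helper)."""
    total_days = (end_age - start_age) * days_per_year
    return total_days / extra_lemmas


rate = days_per_new_lemma(EXTRA_LEMMAS, START_AGE, END_AGE)
print(f"{rate:.1f} days per new lemma")  # prints "2.4 days per new lemma"
```

Under these assumptions the average comes to roughly 2.4 days per lemma, which is consistent with the abstract's rounded figure of about one new lemma every 2 days.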


                Author and article information

                Front Psychol
                Front. Psychol.
                Frontiers in Psychology
                Frontiers Media S.A.
                29 July 2016
                Volume: 7
                Department of Experimental Psychology, Ghent University, Ghent, Belgium
                Author notes

                Edited by: Manuel Perea, University of Valencia, Spain

                Reviewed by: Michael S. Vitevitch, University of Kansas, USA; Pablo Gomez, DePaul University, USA; Cristina Izura, Swansea University, UK

                *Correspondence: Marc Brysbaert, marc.brysbaert@

                This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

                Copyright © 2016 Brysbaert, Stevens, Mandera and Keuleers.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                Figures: 4, Tables: 4, Equations: 1, References: 43, Pages: 11
                Original Research

                Clinical Psychology & Psychiatry

                reading, vocabulary size, word knowledge

