
How Many Words Do We Know? Practical Estimates of Vocabulary Size Dependent on Word Definition, the Degree of Language Input and the Participant’s Age



      Based on an analysis of the literature and a large-scale crowdsourcing experiment, we estimate that an average 20-year-old native speaker of American English knows 42,000 lemmas and 4,200 non-transparent multiword expressions, derived from 11,100 word families. The numbers range from 27,000 lemmas for the lowest 5% to 52,000 for the highest 5%. Between the ages of 20 and 60, the average person learns 6,000 extra lemmas, or about one new lemma every 2 days. The knowledge of a word can be as shallow as knowing that the word exists. In addition, people learn tens of thousands of inflected forms and proper nouns (names), which account for the substantially higher numbers of ‘words known’ mentioned in other publications.
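      The "one new lemma every 2 days" figure can be checked with simple arithmetic. The sketch below uses only the numbers quoted in the abstract; the 365.25-day year and the function name are assumptions made for illustration, not part of the article.

```python
# Figures quoted in the abstract.
EXTRA_LEMMAS = 6_000          # lemmas learned between ages 20 and 60
START_AGE, END_AGE = 20, 60
LEMMAS_AT_20 = 42_000
WORD_FAMILIES_AT_20 = 11_100

# Assumption (not from the article): average year length in days.
DAYS_PER_YEAR = 365.25


def days_per_new_lemma(extra_lemmas, start_age, end_age,
                       days_per_year=DAYS_PER_YEAR):
    """Average number of days that pass per newly learned lemma."""
    total_days = (end_age - start_age) * days_per_year
    return total_days / extra_lemmas


rate = days_per_new_lemma(EXTRA_LEMMAS, START_AGE, END_AGE)
lemmas_per_family = LEMMAS_AT_20 / WORD_FAMILIES_AT_20

print(f"{rate:.1f} days per new lemma")
print(f"{lemmas_per_family:.1f} lemmas per word family")
```

      Under these assumptions the rate comes out at a little under two and a half days per lemma, consistent with the abstract's rounded "every 2 days", and each word family corresponds to just under four lemmas on average.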


            Author and article information

            Department of Experimental Psychology, Ghent University, Ghent, Belgium
            Author notes

            Edited by: Manuel Perea, University of Valencia, Spain

            Reviewed by: Michael S. Vitevitch, University of Kansas, USA; Pablo Gomez, DePaul University, USA; Cristina Izura, Swansea University, UK

            *Correspondence: Marc Brysbaert, marc.brysbaert@

            This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

            Journal: Frontiers in Psychology (Front. Psychol.)
            Publisher: Frontiers Media S.A.
            Published: 29 July 2016
            Volume: 7
            PMID: 27524974, PMCID: PMC4965448, DOI: 10.3389/fpsyg.2016.01116
            Copyright © 2016 Brysbaert, Stevens, Mandera and Keuleers.

            This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

            Figures: 4, Tables: 4, Equations: 1, References: 43, Pages: 11
            Original Research

            Clinical Psychology & Psychiatry

            reading, vocabulary size, word knowledge

