      Estimating child linguistic experience from historical corpora

      research-article

          Abstract

          Child language acquisition is often identified as one of the primary drivers of language change, but the lack of historical child data presents a challenge for empirically investigating its effect. In this work, I observe the relationship between lexicons extracted from modern child-directed speech and those drawn from modern and historical literary corpora in order to better understand when language acquisition can be modeled over historical and non-child corpora as it is over child corpora. Among high token frequency items, morphophonological and syntactic-semantic patterns occur at similar type frequencies in these corpora, and furthermore, when a learning algorithm is applied to lexicons sampled from these sources, it consistently achieves the same learning outcomes in each. With appropriate care and pre-processing, modern and historical text corpora are effectively interchangeable with child-directed speech corpora for the purpose of estimating child lexical experience, opening a path for modeling language acquisition where child-directed corpora are not available.

                Author and article information

                Journal
                Glossa: a journal of general linguistics
                Ubiquity Press
                ISSN: 2397-1835
                08 November 2019
                Volume 4, Issue 1, Article 122
                Affiliations
                [1 ]University of Pennsylvania, Philadelphia, PA, US
                Article
                DOI: 10.5334/gjgl.926
                Copyright: © 2019 The Author(s)

                This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/.

                History
                : 23 February 2019
                : 10 September 2019
                Categories
                Squib

                General linguistics, Linguistics & Semiotics
                paradigm saturation, Proto-Germanic, Spanish, Latin, English, historical linguistics, corpus linguistics, child language acquisition
