35
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech

      research-article
      * ,
      PLoS ONE
      Public Library of Science

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72–82% (freely-read CDS) and 90–98% (rhythmically-regular CDS) stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across languages. The S-AMPH model reveals a crucial developmental role for stress feet (AMs ~2 Hz). Stress feet underpin different linguistic rhythm typologies, and speech rhythm underpins language acquisition by infants in all languages.

          Related collections

          Most cited references34

          • Record: found
          • Abstract: found
          • Article: not found

          Derivation of auditory filter shapes from notched-noise data.

          A well established method for estimating the shape of the auditory filter is based on the measurement of the threshold of a sinusoidal signal in a notched-noise masker, as a function of notch width. To measure the asymmetry of the filter, the notch has to be placed both symmetrically and asymmetrically about the signal frequency. In previous work several simplifying assumptions and approximations were made in deriving auditory filter shapes from the data. In this paper we describe modifications to the fitting procedure which allow more accurate derivations. These include: 1) taking into account changes in filter bandwidth with centre frequency when allowing for the effects of off-frequency listening; 2) correcting for the non-flat frequency response of the earphone; 3) correcting for the transmission characteristics of the outer and middle ear; 4) limiting the amount by which the centre frequency of the filter can shift in order to maximise the signal-to-masker ratio. In many cases, these modifications result in only small changes to the derived filter shape. However, at very high and very low centre frequencies and for hearing-impaired subjects the differences can be substantial. It is also shown that filter shapes derived from data where the notch is always placed symmetrically about the signal frequency can be seriously in error when the underlying filter is markedly asymmetric. New formulae are suggested describing the variation of the auditory filter with frequency and level. The implication of the results for the calculation of excitation patterns are discussed and a modified procedure is proposed. The appendix list FORTRAN computer programs for deriving auditory filter shapes from notched-noise data and for calculating excitation patterns. The first program can readily be modified so as to derive auditory filter shapes from data obtained with other types of maskers, such as rippled noise.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex.

            How natural speech is represented in the auditory cortex constitutes a major challenge for cognitive neuroscience. Although many single-unit and neuroimaging studies have yielded valuable insights about the processing of speech and matched complex sounds, the mechanisms underlying the analysis of speech dynamics in human auditory cortex remain largely unknown. Here, we show that the phase pattern of theta band (4-8 Hz) responses recorded from human auditory cortex with magnetoencephalography (MEG) reliably tracks and discriminates spoken sentences and that this discrimination ability is correlated with speech intelligibility. The findings suggest that an approximately 200 ms temporal window (period of theta oscillation) segments the incoming speech signal, resetting and sliding to track speech dynamics. This hypothesized mechanism for cortical speech analysis is based on the stimulus-induced modulation of inherent cortical rhythms and provides further evidence implicating the syllable as a computational primitive for the representation of spoken language.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Speech recognition with primarily temporal cues.

              Nearly perfect speech recognition was observed under conditions of greatly reduced spectral information. Temporal envelopes of speech were extracted from broad frequency bands and were used to modulate noises of the same bandwidths. This manipulation preserved temporal envelope cues in each band but restricted the listener to severely degraded information on the distribution of spectral energy. The identification of consonants, vowels, and words in simple sentences improved markedly as the number of bands increased; high speech recognition performance was obtained with only three bands of modulated noise. Thus, the presentation of a dynamic temporal pattern in only a few broad spectral regions is sufficient for the recognition of speech.
                Bookmark

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS One
                PLoS ONE
                plos
                plosone
                PLoS ONE
                Public Library of Science (San Francisco, CA USA )
                1932-6203
                7 December 2015
                2015
                : 10
                : 12
                : e0144411
                Affiliations
                [001]Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
                Birkbeck College, UNITED KINGDOM
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Conceived and designed the experiments: VL UG. Performed the experiments: VL. Analyzed the data: VL. Contributed reagents/materials/analysis tools: VL. Wrote the paper: VL UG.

                Article
                PONE-D-15-09717
                10.1371/journal.pone.0144411
                4671555
                26641472
                1eefb58f-be84-450b-83c5-39469dff5c4d
                © 2015 Leong, Goswami

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

                History
                : 10 March 2015
                : 18 November 2015
                Page count
                Figures: 10, Tables: 7, Pages: 37
                Funding
                This research was funded by a Harold Hyam Wingate Foundation Research Scholarship and a Lucy Cavendish College Junior Research Fellowship to VL, and by a grant from the Medical Research Council to UG (G0902375). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Custom metadata
                All original sound files are publicly available from the Figshare database: http://figshare.com/articles/SAMPH_CDS/1318572 DOI: 10.6084/m9.figshare.1318572.

                Uncategorized
                Uncategorized

                Comments

                Comment on this article