
      Cortical entrainment to continuous speech: functional roles and interpretations

Review article


          Abstract

Auditory cortical activity is entrained to the temporal envelope of speech, which corresponds to the syllabic rhythm of speech. Such entrained cortical activity can be measured from subjects naturally listening to sentences or spoken passages, providing a reliable neural marker of online speech processing. A central question that remains to be answered is whether entrained cortical activity is more closely related to speech perception or to non-speech-specific auditory encoding. Here, we review several hypotheses about the functional roles of cortical entrainment to speech, e.g., encoding acoustic features, parsing syllabic boundaries, and selecting sensory information in complex listening environments. It is likely that speech entrainment is not a homogeneous response and that these hypotheses apply separately to speech entrainment generated from different neural sources. The relationship between entrained activity and speech intelligibility is also discussed. A tentative conclusion is that theta-band entrainment (4–8 Hz) encodes speech features critical for intelligibility, while delta-band entrainment (1–4 Hz) is related to the perceived, non-speech-specific acoustic rhythm. To further understand the functional properties of speech entrainment, a splitter's approach will be needed to investigate (1) not just the temporal envelope but which specific acoustic features are encoded and (2) not just speech intelligibility but which specific psycholinguistic processes are encoded by entrained cortical activity. Similarly, the anatomical and spectro-temporal details of entrained activity need to be taken into account when investigating its functional properties.
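
As an illustration of how such entrainment is typically measured, the sketch below extracts the temporal envelope of a speech waveform and filters it into the delta (1–4 Hz) and theta (4–8 Hz) bands discussed above; the band-limited envelope is what cortical recordings are then compared against. This is an illustrative outline only, not code from any of the reviewed studies; the variable names (speech, fs) and the 10 Hz envelope cutoff are assumptions.

```python
# Illustrative sketch (not from the reviewed studies): extract the temporal
# envelope of a speech waveform and isolate the delta and theta bands that
# entrainment analyses typically relate to cortical activity.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, lo, hi, fs, order=4):
    """Zero-phase Butterworth band-pass filter."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def temporal_envelope(speech, fs, cutoff=10.0):
    """Broadband temporal envelope: magnitude of the analytic signal,
    low-pass filtered to keep only slow (syllabic-rate) fluctuations."""
    env = np.abs(hilbert(speech))
    b, a = butter(4, cutoff / (fs / 2), btype="low")
    return filtfilt(b, a, env)

# Hypothetical usage with a waveform `speech` sampled at `fs` Hz:
# env = temporal_envelope(speech, fs)
# delta_env = bandpass(env, 1.0, 4.0, fs)   # prosodic/phrasal rate
# theta_env = bandpass(env, 4.0, 8.0, fs)   # syllabic rate
```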

Most cited references (48)


          Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex.

          How natural speech is represented in the auditory cortex constitutes a major challenge for cognitive neuroscience. Although many single-unit and neuroimaging studies have yielded valuable insights about the processing of speech and matched complex sounds, the mechanisms underlying the analysis of speech dynamics in human auditory cortex remain largely unknown. Here, we show that the phase pattern of theta band (4-8 Hz) responses recorded from human auditory cortex with magnetoencephalography (MEG) reliably tracks and discriminates spoken sentences and that this discrimination ability is correlated with speech intelligibility. The findings suggest that an approximately 200 ms temporal window (period of theta oscillation) segments the incoming speech signal, resetting and sliding to track speech dynamics. This hypothesized mechanism for cortical speech analysis is based on the stimulus-induced modulation of inherent cortical rhythms and provides further evidence implicating the syllable as a computational primitive for the representation of spoken language.
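
A simplified sketch of the kind of analysis this abstract describes (not the authors' code): theta-band phase is extracted with a band-pass filter and Hilbert transform, and single trials are assigned to whichever sentence's mean phase pattern they match most closely. Array shapes, the phase-distance measure, and the use of all trials to build templates (rather than cross-validation) are simplifying assumptions.

```python
# Sketch: discriminate sentences from the across-trial theta-band phase pattern.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def theta_phase(x, fs, lo=4.0, hi=8.0):
    """Theta-band (4-8 Hz) instantaneous phase of one trial."""
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return np.angle(hilbert(filtfilt(b, a, x)))

def phase_pattern_accuracy(trials, labels, fs):
    """trials: (n_trials, n_times) auditory signals; labels: sentence identity."""
    labels = np.asarray(labels)
    phases = np.array([theta_phase(tr, fs) for tr in trials])
    sentences = np.unique(labels)
    # Template = circular mean phase over each sentence's trials.
    templates = {s: np.angle(np.mean(np.exp(1j * phases[labels == s]), axis=0))
                 for s in sentences}
    preds = []
    for ph in phases:  # assign each trial to the closest template
        dists = [np.mean(1.0 - np.cos(ph - templates[s])) for s in sentences]
        preds.append(sentences[int(np.argmin(dists))])
    # A real analysis would exclude the test trial from its own template.
    return float(np.mean(np.asarray(preds) == labels))
```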

            Speech recognition with primarily temporal cues.

            Nearly perfect speech recognition was observed under conditions of greatly reduced spectral information. Temporal envelopes of speech were extracted from broad frequency bands and were used to modulate noises of the same bandwidths. This manipulation preserved temporal envelope cues in each band but restricted the listener to severely degraded information on the distribution of spectral energy. The identification of consonants, vowels, and words in simple sentences improved markedly as the number of bands increased; high speech recognition performance was obtained with only three bands of modulated noise. Thus, the presentation of a dynamic temporal pattern in only a few broad spectral regions is sufficient for the recognition of speech.
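
The manipulation described here is, in essence, a noise vocoder. The sketch below reproduces the idea under stated assumptions rather than the original study's implementation: the band edges, filter order, and sampling-rate requirement (fs above 8 kHz for the default bands) are arbitrary illustrative choices.

```python
# Illustrative noise vocoder: band-limited speech envelopes modulate noise
# carriers of the same bandwidths, and the modulated noises are summed.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, lo, hi, fs, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def noise_vocode(speech, fs, edges=(100.0, 800.0, 1500.0, 4000.0)):
    """Vocode `speech` (float waveform, fs Hz) with len(edges)-1 bands."""
    rng = np.random.default_rng(0)
    out = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = bandpass(speech, lo, hi, fs)
        env = np.abs(hilbert(band))                  # temporal envelope of the band
        carrier = bandpass(rng.standard_normal(len(speech)), lo, hi, fs)
        out += env * carrier                         # envelope-modulated noise
    return out / np.max(np.abs(out))                 # normalise to avoid clipping
```

With only three bands (the default above), the spectral detail of the original speech is largely destroyed while its temporal envelope cues are preserved, which is the condition under which listeners in the study still recognised speech well.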

              Speech Rhythms and Multiplexed Oscillatory Sensory Coding in the Human Brain

Introduction

A large number of invasive and non-invasive neurophysiological studies provide converging evidence that cortical oscillations play an important role in gating information flow in the human brain, thereby supporting a variety of cognitive processes including attention, working memory, and decision-making [1]–[3]. These oscillations can be hierarchically organised. For example, the phase of theta (4–8 Hz) oscillations can modulate the amplitude of gamma (30–90 Hz) oscillations, and the phase of delta (1–2 Hz) oscillations can modulate the amplitude of theta oscillations [4]–[8]. Interestingly, speech comprises a remarkably similar hierarchy of rhythmic components representing prosody (delta band), syllables (theta band), and phonemes (gamma band) [9]–[12]. The similarity in the hierarchical organisation of cortical oscillations and the rhythmic components of speech suggests that cortical oscillations at different frequencies might sample auditory speech input at different rates. Cortical oscillations could therefore represent an ideal medium for multiplexed segmentation and coding of speech [9],[12]–[17]. The hierarchical coupling of oscillations (with fast oscillations nested in slow oscillations) could be used to multiplex complementary information over multiple time scales [18] (see also [19]), for example by separately encoding fast (e.g., phonemic) and slower (e.g., syllabic) information and their temporal relationships.

Previous studies have demonstrated amplitude and phase modulation in response to speech stimuli in the delta, theta, and gamma bands using electroencephalography (EEG)/magnetoencephalography (MEG) [13],[15],[20]–[25] and electrocorticography (ECOG) [26]–[29]. These findings support an emerging view that speech stimuli induce low-frequency phase patterns in auditory areas that code input information. Interestingly, these phase patterns seem to be under attentional control: in the well-known cocktail party situation, for example, they code mainly for the attended stimulus [26],[30],[31]. Thus, brain oscillations have become obvious candidates for segmenting and parsing continuous speech because they reflect rhythmic changes in excitability [12].

This attractive model leaves three important points largely unresolved. First, a comprehensive account of how rhythmic components in speech interact with brain oscillations is still missing, and it is uncertain whether the previously reported hemispheric asymmetry during speech perception is also evident in a lateralised alignment of brain oscillations to continuous speech. Behavioural, electrophysiological, and neuroimaging studies [13],[15],[20],[23],[32] suggest that there is a relatively long integration window (100–300 ms, corresponding to the theta band) in the right auditory cortex and a relatively short integration window (20–40 ms, corresponding to the gamma band) in the left auditory cortex [14], but it is unclear whether this differentiation is relevant for oscillatory tracking of speech. Second, it is unknown whether cortical brain oscillations are hierarchically coupled during perception of continuous speech. This is of particular interest because hierarchically coupled brain oscillations could represent hierarchically organised speech components (prosody, syllables, phonemes) at different temporal scales. Third, it is unclear how oscillatory speech tracking dynamically adapts to arrhythmic components in speech. If brain oscillations implement a universal mechanism for speech processing, they should also account for variations or breaks in speech rhythmicity, so that the phase of low-frequency oscillations aligns to (quasi-periodic) salient speech events for optimal processing.

Here, we addressed these three points using continuous speech and analysis based on information theory. Importantly, all three points were investigated for intelligible and unintelligible (backward played) speech. We analysed the frequency-specific dependencies between the speech envelope and brain activity, as well as the dependencies between cortical oscillations across different frequencies. We first hypothesised that a multi-scale hierarchy of oscillations in the listener's brain tracks the dynamics of the speaker's speech envelope, specifically with preferential theta band tracking in the right auditory cortex and gamma band tracking in the left auditory cortex. Second, we asked whether speech-entrained brain oscillations are hierarchically coupled and, if so, how that coupling is modulated by the stimulus. Third, we asked whether the phase of low-frequency brain oscillations (likely indicating rhythmic variations in neural excitability) in the auditory cortex coincides with and adapts to salient events in speech stimuli. We presented a 7-min-long continuous story binaurally to 22 participants while recording neural activity with MEG ("story" condition). As a control condition the same story was played backwards ("back" condition). We used mutual information (MI) to measure all dependencies (linear and nonlinear) between the speech signal and its encoding in brain oscillations [33],[34]. We did so in all brain voxels for frequencies from 1 to 60 Hz and for the important interactions (phase-phase, amplitude-amplitude, cross-frequency phase-amplitude, and cross-frequency amplitude-phase; see Figure 1 and Materials and Methods). This resulted in frequency-specific functional brain maps of dependencies between the speech envelope and brain activity. A similar analysis was performed to study dependencies between brain oscillations within cortical areas but across different frequency bands.

Figure 1. Mutual information analysis (doi:10.1371/journal.pbio.1001752.g001). The broadband amplitude envelope is computed for the speech signal. For each frequency band, speech envelope and MEG signals are bandpass filtered and activation time series are computed for each voxel in the brain. Phase and amplitude time series are computed from the Hilbert transform for speech and voxel time series and subjected to MI analysis. MI is computed between the speech signal and the time series for each voxel, leading to a tomographic map of MI. Group statistical analysis is performed on these maps across all 22 participants.

Our results reveal hierarchically coupled oscillations in speech-related brain areas and their alignment to quasi-rhythmic components in continuous speech (prosody, syllables, phonemes), with pronounced asymmetries between the left and right hemispheres. Edges in the speech envelope reset oscillatory low-frequency phase in left and right auditory cortices. Phase resets in cortical oscillations code features of the speech edges and help to align temporal windows of high neural excitability to optimise processing of important speech events. Importantly, we demonstrate that oscillatory speech tracking and hierarchical couplings are significantly reduced for backward-presented speech and so are not only stimulus driven.
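
As a rough illustration of the pipeline summarised in Figure 1 (an assumption-laden sketch, not the authors' code), the snippet below band-pass filters a speech-envelope time series and a voxel time series, extracts their phases with the Hilbert transform, and estimates phase-phase mutual information with a simple histogram estimator; the published analysis used its own MI estimator [33],[34] and voxel-wise group statistics.

```python
# Sketch: phase-phase mutual information between speech envelope and one voxel.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_phase(x, lo, hi, fs, order=3):
    """Instantaneous phase of x in the band [lo, hi] Hz."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return np.angle(hilbert(filtfilt(b, a, x)))

def phase_mi(ph_x, ph_y, n_bins=8):
    """Histogram estimate of MI (in bits) between two phase time series."""
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    pxy, _, _ = np.histogram2d(ph_x, ph_y, bins=[edges, edges])
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

# Hypothetical usage for theta-band (3-7 Hz) phase-phase dependence:
# mi = phase_mi(band_phase(speech_env, 3, 7, fs), band_phase(voxel, 3, 7, fs))
```
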
Results

Oscillatory Speech Tracking Relies on Two Mechanisms

We first asked whether there is phase-locking between rhythmic changes in the speech envelope and corresponding oscillatory brain activity. Whereas most previous studies quantify phase-locking to stimulus onset across repeated presentations of the same stimulus, here we studied phase-locking over time directly between the speech envelope and brain oscillations. To do this, we compared the phase coupling between speech and oscillatory brain activity (in 1 Hz steps between 1 and 60 Hz) in two conditions: story and back. Figure 2 summarizes the results. First, MI revealed a significantly stronger phase coupling between the speech envelope and brain oscillations in the story compared to the back condition in the left and right auditory cortex in the delta (1–3 Hz) and theta (3–7 Hz) frequency bands (group statistics, p < 0.05).

Time-locked to these onsets we extracted trials from −500 ms to 1,000 ms.

PLV analysis. PLVs [74] were computed in three ways. First, as the phase-locking of auditory theta activity across trials, PLV = (1/n)|Σ exp(i·ph)|, where n is the number of trials and ph the phase of the auditory theta signal. Second, as the phase-locking of the phase difference between the auditory theta signal and the theta speech envelope, PLVsp = (1/n)|Σ exp(i·(ph − phs))|, where phs is the theta phase of the speech envelope. Third, as the phase-locking between left and right auditory theta activity, (1/n)|Σ exp(i·(phl − phr))|, where phl and phr are the phases of the left and right auditory theta signals, respectively. Time-resolved PLV data were averaged in three time windows (−200 ms to 0 ms, 100–300 ms, 400–600 ms) and subjected to an ANOVA with factors time window and PLV measure. Both factors and their interaction were highly significant (time window: F = 39.77, p < 0.001; PLV measure: F = 50.11, p < 0.001; interaction: F = 14.86, p < 0.001).

Speech sampling. For each voxel, the instantaneous amplitude A and phase ph for each speech trial were computed (Figure 6). For each trial, the cross-correlation of either cos(ph) or A with the speech envelope was computed over the time range 0–500 ms following onset with a maximum lag of 150 ms. The maximum correlation across lags was averaged across trials. As a control, the same computation was repeated with a random shuffling of trial order for the speech data (to destroy the correspondence between trials for speech and brain data).

Cross-frequency analysis. We performed two separate analyses to investigate the spatio-spectral distribution of cross-frequency coupling (Figure 7). First, we computed cross-frequency coupling between theta phase and 40 Hz gamma amplitude in all brain voxels. Second, we computed the full cross-frequency coupling matrix separately for the left and right auditory cortex. The first analysis was motivated by Figure 2C, which demonstrates coupling between speech theta phase and auditory 40 Hz amplitude dynamics, and by Figure 4, which shows theta-phase-to-gamma-amplitude coupling in the auditory cortex. Cross-frequency coupling was analysed by computing MI as in Figure 2C (but without using the speech signal). For each brain voxel, MI between theta phase and gamma amplitude was computed for the two 500 ms windows preceding and following speech onset across all 254 trials. t-values for the contrast of post-onset versus pre-onset were computed across trials. The computation was performed for the story and back conditions. As in Figure 2, individual maps were subjected to a dependent-samples t-test with randomisation-based FDR correction. Group t-maps are displayed with thresholds corresponding to p < 0.05 (FDR corrected). The second analysis was performed only in the left and right auditory cortex. Here, we computed MI as before but for all combinations of phase (range 1–10 Hz) and amplitude (range 4–80 Hz). A group t-statistic was computed for the difference between the story condition and surrogate data (the surrogate data were the same as the story condition but each amplitude signal was matched with the phase signal from a random trial). For each frequency-frequency pair we computed a bootstrap confidence level by randomly drawing 22 participants with replacement in each of 500 bootstrap iterations and computing the 95th percentile. The lateralisation analysis in Figure 7C follows the same approach as for Figure 7B and compares cross-frequency coupling for the story condition between the left and right auditory cortex.

Supporting Information

Figure S1. (A) Mutual information group statistics for surrogate data. Group statistical map of phase-phase MI dependencies in the theta frequency band. This figure corresponds to Figure 2B, but here the back condition has been replaced with a surrogate condition consisting of the MEG data from the story condition and the reversed speech envelope from the story condition, to estimate dependencies that could be expected by chance. (B) Phase-locking group statistics. This figure corresponds to Figure 2, but instead of MI, PLV has been used to quantify the dependence between the phase of the low-frequency speech envelope and brain activity in the delta band. (C) Same as (B) but for the theta frequency band.

Figure S2. Bar plot of individual lateralisation indices. For each participant, the lateralisation index for theta-phase lateralisation (red) and theta-gamma lateralisation (blue) in Heschl's gyrus (left panel) and the superior temporal gyrus (STG, right panel) is shown. Each pair of red/blue bars corresponds to an individual.

Figure S3. Bar plot of mutual information in the auditory cortex. For each panel, the mean and SEM are shown for the left and right auditory cortex for all conditions. An asterisk indicates relevant significant differences (t-test with p < 0.05). The control condition is computed from surrogate data where brain activity from the story condition is used together with the speech envelope from the back condition. (A) Bar plot for delta phase. (B) Bar plot for theta phase. (C) Bar plot for mutual information between theta phase in speech and gamma amplitude in the auditory cortex. (D) Bar plot for mutual information between theta phase and gamma amplitude in the auditory cortex. Here, the control condition was obtained from mutual information with the gamma time series reversed.

Figure S4. Group statistics of cross-frequency coupling. (A) Statistical map of the difference between the story and back conditions for mutual information between delta phase and theta amplitude. (B) Statistical map of the lateralisation of mutual information between delta phase and theta amplitude for the story condition. (C) Statistical map of the difference between the story and back conditions for mutual information between theta phase and gamma amplitude. This map corresponds to Figure 4A but is computed using a different method for quantifying cross-frequency coupling [76].

Figure S5. Phase coding of speech amplitude. The phase of theta oscillations at 100 ms after speech onset in the left (black) and right (red) auditory cortex codes the maximum amplitude of the speech envelope in the first 200 ms following onset. The shaded area signifies the 95% confidence interval around the median obtained from bootstrap analysis.
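
The across-trial PLV definitions quoted in the PLV analysis paragraph above can be written compactly as follows. This is a minimal sketch with assumed array shapes, not the authors' code.

```python
# Phase-locking values across trials, per time point.
import numpy as np

def plv(phase):
    """phase: (n_trials, n_times) array. Returns (1/n)|sum_trials exp(i*ph)|."""
    return np.abs(np.mean(np.exp(1j * phase), axis=0))

def plv_diff(phase_a, phase_b):
    """Phase locking of the phase difference between two signals."""
    return np.abs(np.mean(np.exp(1j * (phase_a - phase_b)), axis=0))

# Hypothetical usage, mirroring the three measures described above:
# plv_auditory = plv(theta_phase)                      # across-trial locking
# plv_speech   = plv_diff(theta_phase, speech_phase)   # auditory-to-speech
# plv_lr       = plv_diff(left_phase, right_phase)     # left-right auditory
```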

                Author and article information

Contributors
Nai Ding, Jonathan Z. Simon
Journal
Frontiers in Human Neuroscience (Front. Hum. Neurosci.)
Publisher: Frontiers Media S.A.
ISSN: 1662-5161
Published: 28 May 2014
Volume 8, Article 311
Affiliations
1. Department of Psychology, New York University, New York, NY, USA
2. Department of Electrical and Computer Engineering, University of Maryland College Park, College Park, MD, USA
3. Department of Biology, University of Maryland College Park, College Park, MD, USA
4. Institute for Systems Research, University of Maryland College Park, College Park, MD, USA
                Author notes

                Edited by: Sonja A. E. Kotz, Max Planck Institute for Human Cognitive and Brain Sciences, Germany

                Reviewed by: István Winkler, University of Szeged, Hungary; Jonas Obleser, Max Planck Institute for Human Cognitive and Brain Sciences, Germany

*Correspondence: Nai Ding, Department of Psychology, New York University, New York, NY 10012, USA, e-mail: gahding@gmail.com; Jonathan Z. Simon, Department of Electrical and Computer Engineering, University of Maryland College Park, College Park, MD 20742, USA, e-mail: jzsimon@umd.edu

                This article was submitted to the journal Frontiers in Human Neuroscience.

Article
DOI: 10.3389/fnhum.2014.00311
PMCID: PMC4036061
PMID: 24904354
                Copyright © 2014 Ding and Simon.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

History
Received: 10 March 2014
Accepted: 27 April 2014
                Page count
Figures: 1, Tables: 1, Equations: 0, References: 73, Pages: 7
Categories
Neuroscience, Review Article

Keywords
auditory cortex, entrainment of rhythms, speech intelligibility, speech perception in noise, speech envelope, cocktail party problem
