      Is Open Access

      Studying rhythm processing in speech through the lens of auditory-motor synchronization




          Continuous speech is organized into a hierarchy of rhythms. Accurate processing of this rhythmic hierarchy through the interactions of auditory and motor systems is fundamental to speech perception and production. In this mini-review, we aim to evaluate the implementation of behavioral auditory-motor synchronization paradigms when studying rhythm processing in speech. First, we present an overview of the classic finger-tapping paradigm and its application in revealing differences in auditory-motor synchronization between the typical and clinical populations. Next, we highlight key findings on rhythm hierarchy processing in speech and non-speech stimuli from finger-tapping studies. Following this, we discuss the potential caveats of the finger-tapping paradigm and propose the speech-speech synchronization (SSS) task as a promising tool for future studies. Overall, we seek to raise interest in developing new methods to shed light on the neural mechanisms of speech processing.

          Related collections

          Most cited references: 66


          Sensorimotor synchronization: a review of recent research (2006-2012).

          Sensorimotor synchronization (SMS) is the coordination of rhythmic movement with an external rhythm, ranging from finger tapping in time with a metronome to musical ensemble performance. An earlier review (Repp, 2005) covered tapping studies; two additional reviews (Repp, 2006a, b) focused on music performance and on rate limits of SMS, respectively. The present article supplements and extends these earlier reviews by surveying more recent research in what appears to be a burgeoning field. The article comprises four parts, dealing with (1) conventional tapping studies, (2) other forms of moving in synchrony with external rhythms (including dance and nonhuman animals' synchronization abilities), (3) interpersonal synchronization (including musical ensemble performance), and (4) the neuroscience of SMS. It is evident that much new knowledge about SMS has been acquired in the last 7 years.

            Sensorimotor integration in speech processing: computational basis and neural organization.

            Sensorimotor integration is an active domain of speech research and is characterized by two main ideas, that the auditory system is critically involved in speech production and that the motor system is critically involved in speech perception. Despite the complementarity of these ideas, there is little crosstalk between these literatures. We propose an integrative model of the speech-related "dorsal stream" in which sensorimotor interaction primarily supports speech production, in the form of a state feedback control architecture. A critical component of this control system is forward sensory prediction, which affords a natural mechanism for limited motor influence on perception, as recent perceptual research has suggested. Evidence shows that this influence is modulatory but not necessary for speech perception. The neuroanatomy of the proposed circuit is discussed as well as some probable clinical correlates including conduction aphasia, stuttering, and aspects of schizophrenia.

              Speech Rhythms and Multiplexed Oscillatory Sensory Coding in the Human Brain

              Introduction

              A large number of invasive and non-invasive neurophysiological studies provide converging evidence that cortical oscillations play an important role in gating information flow in the human brain, thereby supporting a variety of cognitive processes including attention, working memory, and decision-making [1]–[3]. These oscillations can be hierarchically organised. For example, the phase of (4–8 Hz) theta oscillations can modulate the amplitude of (30–90 Hz) gamma oscillations; the phase of (1–2 Hz) delta oscillations can modulate the amplitude of theta oscillations [4]–[8]. Interestingly, speech comprises a remarkably similar hierarchy of rhythmic components representing prosody (delta band), syllables (theta band), and phonemes (gamma band) [9]–[12]. The similarity in the hierarchical organisation of cortical oscillations and the rhythmic components of speech suggests that cortical oscillations at different frequencies might sample auditory speech input at different rates. Cortical oscillations could therefore represent an ideal medium for multiplexed segmentation and coding of speech [9],[12]–[17]. The hierarchical coupling of oscillations (with fast oscillations nested in slow oscillations) could be used to multiplex complementary information over multiple time scales [18] (see also [19]), for example by separately encoding fast (e.g., phonemic) and slower (e.g., syllabic) information and their temporal relationships.

              Previous studies have demonstrated amplitude and phase modulation in response to speech stimuli in the delta, theta, and gamma bands using electroencephalography (EEG)/magnetoencephalography (MEG) [13],[15],[20]–[25] and electrocorticography (ECoG) [26]–[29]. These findings support an emerging view that speech stimuli induce low-frequency phase patterns in auditory areas that code input information. Interestingly, these phase patterns seem to be under attentional control. For example, in the well-known cocktail party situation, they code mainly for the attended stimulus [26],[30],[31]. Thus, brain oscillations have become obvious candidates for segmenting and parsing continuous speech because they reflect rhythmic changes in excitability [12].

              This attractive model leaves three important points largely unresolved. First, a comprehensive account of how rhythmic components in speech interact with brain oscillations is still missing, and it is uncertain whether the previously reported hemispheric asymmetry during speech perception is also evident in a lateralized alignment of brain oscillations to continuous speech. Behavioural, electrophysiological, and neuroimaging studies [13],[15],[20],[23],[32] suggest that there is a relatively long integration window (100–300 ms, corresponding to the theta band) in the right auditory cortex and a relatively short integration window (20–40 ms, corresponding to the gamma band) in the left auditory cortex [14], but it is unclear whether this differentiation is relevant for oscillatory tracking of speech. Second, it is unknown whether cortical brain oscillations are hierarchically coupled during perception of continuous speech. This is of particular interest because hierarchically coupled brain oscillations could represent hierarchically organised speech components (prosody, syllables, phonemes) at different temporal scales. Third, it is unclear how oscillatory speech tracking dynamically adapts to arrhythmic components in speech. If brain oscillations implement a universal mechanism for speech processing, they should also account for variations or breaks in speech rhythmicity, so that the phase of low-frequency oscillations aligns to (quasi-periodic) salient speech events for optimal processing.

              Here, we addressed these three points using continuous speech and analysis based on information theory. Importantly, all three points were investigated for intelligible and unintelligible (backward played) speech.
              We analysed the frequency-specific dependencies between the speech envelope and brain activity. We also analysed the dependencies between cortical oscillations across different frequencies. We first hypothesised that a multi-scale hierarchy of oscillations in the listener's brain tracks the dynamics of the speaker's speech envelope—specifically, preferential theta band tracking in the right auditory cortex and gamma band tracking in the left auditory cortex. Second, we asked whether speech-entrained brain oscillations are hierarchically coupled and, if so, how that coupling is modulated by the stimulus. Third, we asked whether the phase of low-frequency brain oscillations (likely indicating rhythmic variations in neural excitability) in the auditory cortex coincides with and adapts to salient events in speech stimuli.

              We presented a 7-min long continuous story binaurally to 22 participants while recording neural activity with MEG ("story" condition). As a control condition the same story was played backwards ("back" condition). We used mutual information (MI) to measure all dependencies (linear and nonlinear) between the speech signal and its encoding in brain oscillations [33],[34]. We did so in all brain voxels for frequencies from 1 to 60 Hz and for important interactions (phase–phase, amplitude–amplitude, cross-frequency phase–amplitude, and cross-frequency amplitude–phase; see Figure 1 and Materials and Methods). This resulted in frequency-specific functional brain maps of dependencies between the speech envelope and brain activity. Similar analysis was performed to study dependencies between brain oscillations within cortical areas but across different frequency bands.

              Figure 1. Mutual information analysis. The broadband amplitude envelope is computed for the speech signal. For each frequency band, speech envelope and MEG signals are bandpass filtered and activation time series are computed for each voxel in the brain. Phase and amplitude time series are computed from the Hilbert transform for speech and voxel time series and subjected to MI analysis. MI is computed between speech signal and time series for each voxel, leading to a tomographic map of MI. Group statistical analysis is performed on these maps across all 22 participants.

              Our results reveal hierarchically coupled oscillations in speech-related brain areas and their alignment to quasi-rhythmic components in continuous speech (prosody, syllables, phonemes), with pronounced asymmetries between left and right hemispheres. Edges in the speech envelope reset oscillatory low-frequency phase in left and right auditory cortices. Phase resets in cortical oscillations code features of the speech edges and help to align temporal windows of high neural excitability to optimise processing of important speech events. Importantly, we demonstrate that oscillatory speech tracking and hierarchical couplings significantly reduce for backward-presented speech and so are not only stimulus driven.

              Results

              Oscillatory Speech Tracking Relies on Two Mechanisms

              We first asked whether there is phase-locking between rhythmic changes in the speech envelope and corresponding oscillatory brain activity. Whereas most previous studies quantify phase-locking to stimulus onset across repeated presentations of the same stimulus, here we studied phase-locking over time directly between speech envelope and brain oscillations. To do this, we compared the phase coupling between the speech and oscillatory brain activity (in 1 Hz steps between 1 and 60 Hz) in two conditions: story and back. Figure 2 summarizes the results. First, MI revealed a significantly stronger phase coupling between the speech envelope and brain oscillations in the story compared to back conditions in the left and right auditory cortex in delta (1–3 Hz) and theta (3–7 Hz) frequency bands (group statistics, p<0.05).
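              The pipeline summarised in Figure 1 (bandpass filtering, Hilbert phase extraction, MI between the speech-envelope phase and a voxel's phase) can be sketched as below. This is a minimal illustration on simulated data with a simple histogram-based MI estimator; the function and variable names are our own assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, lo, hi, fs, order=4):
    # Zero-phase Butterworth band-pass filter.
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def phase_mi(ph_x, ph_y, n_bins=8):
    # Histogram-based mutual information (in bits) between two phase series.
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    pxy, _, _ = np.histogram2d(ph_x, ph_y, bins=[edges, edges])
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

# Simulated 60 s "speech envelope" and one "voxel" tracking it in the theta band.
fs = 200
rng = np.random.default_rng(0)
t = np.arange(0, 60, 1 / fs)
envelope = np.sin(2 * np.pi * 5 * t) + 0.5 * rng.standard_normal(t.size)
voxel = np.sin(2 * np.pi * 5 * t + 0.3) + 0.5 * rng.standard_normal(t.size)

# Theta band (3-7 Hz): filter, take Hilbert phase, compute phase-phase MI.
ph_env = np.angle(hilbert(bandpass(envelope, 3, 7, fs)))
ph_vox = np.angle(hilbert(bandpass(voxel, 3, 7, fs)))
mi_theta = phase_mi(ph_env, ph_vox)  # high for a tracking voxel, near 0 otherwise
```

              In the study this kind of computation would run per voxel and per 1 Hz frequency step, followed by group statistics; a surrogate (e.g., shuffled) pairing provides the chance level.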
              Time-locked to these onsets we extracted trials from −500 ms to 1,000 ms.

              PLV analysis. PLVs [74] were computed in three ways. First, as phase-locking of auditory theta activity across trials (PLV = 1/n |∑ exp(i·ph)|, where n is the number of trials and ph the phase of the auditory theta signal). Second, as the phase-locking of the phase difference between the auditory theta signal and the theta speech envelope (PLVsp = 1/n |∑ exp(i·(ph−phs))|, where n is the number of trials, ph the phase of the auditory theta signal, and phs the theta phase of the speech envelope). Third, as the phase-locking between left and right auditory theta activity (PLVlr = 1/n |∑ exp(i·(phl−phr))|, where n is the number of trials and phl and phr the phase of the left and right auditory theta signal, respectively). Time-resolved PLV data were averaged in three time windows (−200 ms to 0 ms, 100–300 ms, 400–600 ms) and subjected to ANOVA with factors time window and PLV measure. Both factors and their interaction were highly significant (time window: F = 39.77, p<0.001; PLV measure: F = 50.11, p<0.001; interaction: F = 14.86, p<0.001).

              Speech sampling. For each voxel the instantaneous amplitude A and phase ph for each speech trial was computed (Figure 6). For each trial the cross-correlation of either cos(ph) or A with the speech envelope was computed over the time range 0–500 ms following onset with a maximum lag of 150 ms. The maximum correlation across lags was averaged across trials. As a control, the same computation was repeated with a random shuffling of trial order for the speech data (to destroy the correspondence between trials for speech and brain data).

              Cross-frequency analysis. We performed two separate analyses to investigate the spatio-spectral distribution of cross-frequency coupling (Figure 7). First, we computed cross-frequency coupling between theta phase and 40 Hz gamma amplitude in all brain voxels.
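              The three phase-locking measures defined in the PLV analysis above can be sketched with NumPy as follows; the trial counts and simulated phase time series are illustrative only, not the recorded data.

```python
import numpy as np

def plv(phases):
    # PLV = (1/n) |sum over trials of exp(i*phase)|, one value per time point.
    # `phases` has shape (n_trials, n_times).
    return np.abs(np.mean(np.exp(1j * phases), axis=0))

rng = np.random.default_rng(0)
n_trials, n_times = 254, 300  # e.g., trials x samples around speech onset
base = np.linspace(0, 4 * np.pi, n_times)  # underlying theta phase course

# Simulated phases: left auditory theta, speech-envelope theta, right auditory theta.
ph_aud_l = base + 0.8 * rng.standard_normal((n_trials, n_times))
ph_speech = base + 0.8 * rng.standard_normal((n_trials, n_times))
ph_aud_r = ph_aud_l + 0.2 * rng.standard_normal((n_trials, n_times))

plv_onset = plv(ph_aud_l)               # 1) locking of auditory theta across trials
plv_speech = plv(ph_aud_l - ph_speech)  # 2) locking of the brain-speech phase difference
plv_lr = plv(ph_aud_l - ph_aud_r)       # 3) left-right auditory phase locking
```

              In the paper, such time-resolved PLVs were then averaged within the three analysis windows and entered into the ANOVA.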
              Second, we computed the full cross-frequency coupling matrix separately for the left and right auditory cortex. The first analysis was motivated by Figure 2C, which demonstrates coupling between speech theta phase and auditory 40 Hz amplitude dynamics, and by Figure 4, which shows theta phase to gamma amplitude coupling in the auditory cortex. Analysis of cross-frequency coupling was performed by computing MI as in Figure 2C (but without using the speech signal). For each brain voxel, MI between theta phase and gamma amplitude was computed for the two 500 ms windows preceding and following speech onset across all 254 trials. t-values of the contrast post-onset versus pre-onset were computed across trials. The computation was performed for the story and back conditions. As in Figure 2, individual maps were subjected to a dependent-samples t-test with randomisation-based FDR correction. Group t-maps are displayed with thresholds corresponding to p<0.05 (FDR corrected).

              The second analysis was performed only in the left and right auditory cortex. Here, we computed MI as before but now for all combinations of phase (range 1–10 Hz) and amplitude (range 4–80 Hz). Group t-statistics were computed for the difference between the story condition and surrogate data (surrogate data were the same as the story condition but each amplitude signal was matched with the phase signal from a random trial). For each frequency–frequency pair we computed a bootstrap confidence level by randomly drawing 22 participants with replacement in each of 500 bootstrap iterations and computing the 95th percentile. The lateralisation analysis in Figure 7C follows the same approach as for Figure 7B and compares cross-frequency coupling for the story condition between the left and right auditory cortex.

              Supporting Information

              Figure S1. (A) Mutual information group statistics for surrogate data. Group statistical map of phase–phase MI dependencies in the theta frequency band.
              This figure corresponds to Figure 2B, but here the back condition has been replaced with a surrogate condition consisting of the MEG data from the story condition and the reversed speech envelope from the story condition, to estimate dependencies that could be expected by chance. (B) Phase-locking group statistics. This figure corresponds to Figure 2, but instead of MI, PLV has been used to quantify the dependence between the phase of the low-frequency speech envelope and brain activity in the delta band. (C) Same as (B) but for the theta frequency band. (PDF)

              Figure S2. Bar plot of individual lateralisation indices. For each participant the lateralisation index for theta-phase lateralisation (red) and theta-gamma lateralisation (blue) in Heschl's gyrus (left panel) and superior temporal gyrus (STG, right panel) is shown. Each pair of red/blue bars corresponds to an individual. (PDF)

              Figure S3. Bar plot of mutual information in the auditory cortex. For each panel, mean and SEM are shown for the left and right auditory cortex for all conditions. An asterisk indicates relevant significant differences (t-test with p<0.05). The control condition is computed from surrogate data where brain activity from the story condition is used together with the speech envelope from the back condition. (A) Bar plot for delta phase. (B) Bar plot for theta phase. (C) Bar plot for mutual information between theta phase in speech and gamma amplitude in the auditory cortex. (D) Bar plot for mutual information between theta phase and gamma amplitude in the auditory cortex. Here, the control condition was obtained from mutual information with the gamma time series reversed. (PDF)

              Figure S4. Group statistics of cross-frequency coupling. (A) Statistical map of the difference between story and back conditions for mutual information between delta phase and theta amplitude. (B) Statistical map of lateralisation of mutual information between delta phase and theta amplitude for the story condition. (C) Statistical map of the difference between story and back conditions for mutual information between theta phase and gamma amplitude. This map corresponds to Figure 4A but is computed using a different method for quantifying cross-frequency coupling [76]. (PDF)

              Figure S5. Phase coding of speech amplitude. The phase of theta oscillations at 100 ms after speech onset in the left (black) and right (red) auditory cortex codes the maximum amplitude of the speech envelope in the first 200 ms following onset. The area signifies the 95% confidence interval around the median obtained from bootstrap analysis. (PDF)
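              The theta-phase to gamma-amplitude coupling at the heart of the cross-frequency analysis described above can be sketched with the same kind of binned MI estimator, as below. The simulated signal and all names are illustrative assumptions; the authors' estimator, windowing, and statistics differ in detail.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, lo, hi, fs, order=4):
    # Zero-phase Butterworth band-pass filter.
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def phase_amp_mi(phase, amp, n_bins=8):
    # Binned MI (bits) between a phase series and an amplitude series;
    # amplitude bins are quantiles so every bin is populated.
    ph_edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    amp_edges = np.quantile(amp, np.linspace(0, 1, n_bins + 1))
    pxy, _, _ = np.histogram2d(phase, amp, bins=[ph_edges, amp_edges])
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

# Simulate theta-nested gamma: 40 Hz amplitude waxes and wanes with 5 Hz phase.
fs = 500
rng = np.random.default_rng(1)
t = np.arange(0, 60, 1 / fs)
theta = 2 * np.pi * 5 * t
x = (np.sin(theta)
     + (1 + np.cos(theta)) * np.sin(2 * np.pi * 40 * t)
     + 0.5 * rng.standard_normal(t.size))

theta_phase = np.angle(hilbert(bandpass(x, 3, 7, fs)))
gamma_amp = np.abs(hilbert(bandpass(x, 35, 45, fs)))
mi_pac = phase_amp_mi(theta_phase, gamma_amp)
```

              Repeating this for every phase frequency (1–10 Hz) against every amplitude frequency (4–80 Hz) yields the kind of cross-frequency coupling matrix described for the auditory cortex, with shuffled phase-amplitude pairings as the surrogate baseline.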

                Author and article information

                Frontiers in Neuroscience (Front. Neurosci.)
                Frontiers Media S.A.
                02 March 2023; Volume 17, Article 1146298
                1. School of Psychology, Beijing Sport University, Beijing, China
                2. Laboratory of Sports Stress and Adaptation of General Administration of Sport, Beijing, China
                3. Center for the Cognitive Science of Language, Beijing Language and Culture University, Beijing, China
                Author notes

                Edited by: Dan Zhang, Tsinghua University, China

                Reviewed by: Jiawei Li, Freie Universität Berlin, Germany

                *Correspondence: Lingxi Lu, lingxilu@blcu.edu.cn

                This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience

                Copyright © 2023 Luo and Lu.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                Received: 17 January 2023; Accepted: 20 February 2023
                Page count
                Figures: 0, Tables: 0, Equations: 0, References: 66, Pages: 6, Words: 5092
                Funded by: National Natural Science Foundation of China (doi: 10.13039/501100001809), Award ID: 32100866
                This work was supported by the National Natural Science Foundation of China (32100866), the International Chinese Language Education Research Program of the Center for Language Education and Cooperation (22YH22B), and the Research Project “The Construction of the Advanced Disciplines in Universities in Beijing”.
                Mini Review

                auditory-motor synchronization, rhythm, speech, finger-tapping task, speech-to-speech synchronization task

