
      Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference

      research-article
      Scientific Reports
      Nature Publishing Group


          Abstract

          Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment. Conventional segmentation techniques primarily segment speech using a fixed frame size for computational simplicity. However, this approach is insufficient for capturing the quasi-regular structure of speech, which causes substantial recognition failure in noisy environments. How does the brain handle quasi-regularly structured speech and maintain high recognition performance under such diverse conditions? Recent neurophysiological studies have suggested that the phase of neuronal oscillations in the auditory cortex contributes to accurate speech recognition by guiding speech segmentation into smaller units at different timescales. A phase-locked relationship between neuronal oscillations and the speech envelope has recently been observed, which suggests that the speech envelope provides a foundation for multi-timescale speech segmental information. In this study, we quantitatively investigated the role of the speech envelope as a potential temporal reference for segmenting speech using its instantaneous phase information. We evaluated the proposed approach in terms of the achieved information gain and recognition performance in various noisy environments. The results indicate that the proposed segmentation scheme not only extracts more information from speech but also provides greater robustness in a recognition test.
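          The envelope-phase idea described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's method: the band limits, filter order, and the choice of phase landmark (phase wraps, one per envelope cycle) are assumptions for the sake of a runnable example.

          ```python
          import numpy as np
          from scipy.signal import butter, hilbert, sosfiltfilt

          def envelope_phase_boundaries(speech, fs, band=(2.0, 10.0)):
              """Place segment boundaries at phase landmarks of the slow envelope.

              Hypothetical sketch: band limits and the landmark choice are
              illustrative assumptions, not the paper's exact parameters.
              """
              env = np.abs(hilbert(speech))                 # broadband envelope
              sos = butter(2, band, btype="band", fs=fs, output="sos")
              env_slow = sosfiltfilt(sos, env)              # syllabic-rate envelope
              phase = np.angle(hilbert(env_slow))           # instantaneous phase
              # One boundary per envelope cycle, where the phase wraps from
              # +pi to -pi, instead of a fixed frame size
              return np.where(np.diff(phase) < -np.pi)[0]

          fs = 16000
          t = np.arange(fs) / fs
          # Toy "speech": a 300 Hz carrier amplitude-modulated at a 4 Hz syllabic rate
          toy = (1.0 + np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 300 * t)
          cuts = envelope_phase_boundaries(toy, fs)
          # Boundaries fall roughly once per 4 Hz envelope cycle
          ```

          In contrast to fixed-size framing, the number and position of segments here adapt to the envelope's quasi-regular rhythm.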

          Most cited references (14)


          Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex.

          How natural speech is represented in the auditory cortex constitutes a major challenge for cognitive neuroscience. Although many single-unit and neuroimaging studies have yielded valuable insights about the processing of speech and matched complex sounds, the mechanisms underlying the analysis of speech dynamics in human auditory cortex remain largely unknown. Here, we show that the phase pattern of theta band (4-8 Hz) responses recorded from human auditory cortex with magnetoencephalography (MEG) reliably tracks and discriminates spoken sentences and that this discrimination ability is correlated with speech intelligibility. The findings suggest that an approximately 200 ms temporal window (period of theta oscillation) segments the incoming speech signal, resetting and sliding to track speech dynamics. This hypothesized mechanism for cortical speech analysis is based on the stimulus-induced modulation of inherent cortical rhythms and provides further evidence implicating the syllable as a computational primitive for the representation of spoken language.
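            The phase-pattern discrimination described above can be illustrated with a small sketch. The band extraction and the dissimilarity measure below are generic assumptions, not the study's actual MEG analysis pipeline.

            ```python
            import numpy as np
            from scipy.signal import butter, hilbert, sosfiltfilt

            def theta_phase(x, fs, band=(4.0, 8.0)):
                """Instantaneous phase of the theta-band component of x."""
                sos = butter(2, band, btype="band", fs=fs, output="sos")
                return np.angle(hilbert(sosfiltfilt(sos, x)))

            def phase_dissimilarity(p1, p2):
                """0 when phase patterns match, 1 when consistently opposite."""
                return 0.5 * (1.0 - np.mean(np.cos(p1 - p2)))

            fs = 250
            t = np.arange(2 * fs) / fs
            rng = np.random.default_rng(0)
            # Two noisy repetitions of the same 6 Hz "response" vs a phase-shifted one
            rep_a = np.sin(2 * np.pi * 6 * t) + 0.3 * rng.standard_normal(t.size)
            rep_b = np.sin(2 * np.pi * 6 * t) + 0.3 * rng.standard_normal(t.size)
            other = np.sin(2 * np.pi * 6 * t + np.pi) + 0.3 * rng.standard_normal(t.size)

            d_same = phase_dissimilarity(theta_phase(rep_a, fs), theta_phase(rep_b, fs))
            d_diff = phase_dissimilarity(theta_phase(rep_a, fs), theta_phase(other, fs))
            # Repetitions of the same stimulus align in theta phase: d_same < d_diff
            ```

            The same logic, applied across sentences rather than toy sinusoids, is what lets theta-band phase patterns discriminate which sentence was heard.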

            Attentional Routes to Conscious Perception

            The relationships between spatial attention and conscious perception are currently the object of intense debate. Recent evidence of double dissociations between attention and consciousness casts doubt on the time-honored concept of attention as a gateway to consciousness. Here we review evidence from behavioral, neurophysiologic, neuropsychological, and neuroimaging experiments, showing that distinct sorts of spatial attention can have different effects on visual conscious perception. While endogenous, or top-down, attention has a weak influence on the subsequent conscious perception of near-threshold stimuli, exogenous, or bottom-up, forms of spatial attention appear instead to be a necessary, although not sufficient, step in the development of reportable visual experiences. Fronto-parietal networks important for spatial attention, with distinctive inter-hemispheric differences, constitute plausible neural substrates for the interactions between exogenous spatial attention and conscious perception.

              Phase-Locked Responses to Speech in Human Auditory Cortex are Enhanced During Comprehension

              A growing body of evidence shows that ongoing oscillations in auditory cortex modulate their phase to match the rhythm of temporally regular acoustic stimuli, increasing sensitivity to relevant environmental cues and improving detection accuracy. In the current study, we test the hypothesis that nonsensory information provided by linguistic content enhances phase-locked responses to intelligible speech in the human brain. Sixteen adults listened to meaningful sentences while we recorded neural activity using magnetoencephalography. Stimuli were processed using a noise-vocoding technique to vary intelligibility while keeping the temporal acoustic envelope consistent. We show that the acoustic envelopes of sentences contain most power between 4 and 7 Hz and that it is in this frequency band that phase locking between neural activity and envelopes is strongest. Bilateral oscillatory neural activity phase-locked to unintelligible speech, but this cerebro-acoustic phase locking was enhanced when speech was intelligible. This enhanced phase locking was left lateralized and localized to left temporal cortex. Together, our results demonstrate that entrainment to connected speech depends not only on acoustic characteristics but also on listeners' ability to extract linguistic information. This suggests a biological framework for speech comprehension in which acoustic and linguistic cues reciprocally aid in stimulus prediction.
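              The noise-vocoding manipulation mentioned above, which varies intelligibility while preserving the temporal envelope, can be sketched as a minimal channel vocoder. The band count and log-spaced edges below are illustrative assumptions, not the study's stimulus parameters.

              ```python
              import numpy as np
              from scipy.signal import butter, hilbert, sosfiltfilt

              def noise_vocode(speech, fs, n_bands=4, fmin=100.0, fmax=4000.0):
                  """Minimal noise vocoder: within each band, replace the fine
                  structure with noise while keeping that band's envelope."""
                  rng = np.random.default_rng(0)
                  edges = np.geomspace(fmin, fmax, n_bands + 1)  # log-spaced edges
                  out = np.zeros_like(speech)
                  for lo, hi in zip(edges[:-1], edges[1:]):
                      sos = butter(4, (lo, hi), btype="band", fs=fs, output="sos")
                      band = sosfiltfilt(sos, speech)
                      env = np.abs(hilbert(band))                # band envelope
                      carrier = sosfiltfilt(sos, rng.standard_normal(speech.size))
                      out += env * carrier                       # envelope-modulated noise
                  return out
              ```

              Reducing the number of bands degrades intelligibility further, while the slow envelope, and hence the 4-7 Hz modulation content that drives phase locking, is largely preserved.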

                Author and article information

                Journal
                Sci Rep (Scientific Reports)
                Nature Publishing Group
                ISSN: 2045-2322
                Published: 23 November 2016
                Volume: 6
                Article number: 37647
                Affiliations
                [1 ]Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST) , Daejeon, 34141, Republic of Korea
                Article
                srep37647
                DOI: 10.1038/srep37647
                PMC: 5120313
                PMID: 27876875
                227393d7-5279-47bd-94cc-d3bb13ba7baf
                Copyright © 2016, The Author(s)

                This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

                History
                Received: 28 June 2016
                Accepted: 28 October 2016
                Categories
                Article

