
Towards reconstructing intelligible speech from the human auditory cortex


      Abstract

      Auditory stimulus reconstruction is a technique that finds the best approximation of the acoustic stimulus from the population of evoked neural activity. Reconstructing speech from the human auditory cortex creates the possibility of a speech neuroprosthetic to establish direct communication with the brain and has been shown to be possible in both overt and covert conditions. However, the low quality of the reconstructed speech has severely limited the utility of this method for brain-computer interface (BCI) applications. To advance the state of the art in speech neuroprosthesis, we combined the recent advances in deep learning with the latest innovations in speech synthesis technologies to reconstruct closed-set intelligible speech from the human auditory cortex. We investigated the dependence of reconstruction accuracy on linear and nonlinear (deep neural network) regression methods and the acoustic representation that is used as the target of reconstruction, including the auditory spectrogram and speech synthesis parameters. In addition, we compared the reconstruction accuracy from low and high neural frequency ranges. Our results show that a deep neural network model that directly estimates the parameters of a speech synthesizer from all neural frequencies achieves the highest subjective and objective scores on a digit recognition task, improving the intelligibility by 65% over the baseline method, which used linear regression to reconstruct the auditory spectrogram. These results demonstrate the efficacy of deep learning and speech synthesis algorithms for designing the next generation of speech BCI systems, which not only can restore communication for paralyzed patients but also have the potential to transform human-computer interaction technologies.
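
The baseline named in the abstract, linear regression from neural activity to the auditory spectrogram, can be illustrated with a minimal sketch. This is not the authors' pipeline: the data below are synthetic, and the function names, array shapes, number of lags, response delay and ridge penalty are all assumptions chosen for illustration.

```python
import numpy as np


def lagged_design(neural, n_lags):
    """Pair each time step with the neural responses at that step and the next n_lags - 1 steps.

    neural: (T, n_electrodes) array. Returns (T, n_electrodes * n_lags);
    rows near the end are zero-padded where future samples do not exist.
    """
    T, n_elec = neural.shape
    X = np.zeros((T, n_elec * n_lags))
    for lag in range(n_lags):
        # Activity at time t + lag can help predict the stimulus at time t,
        # because the cortical response follows the sound.
        X[: T - lag, lag * n_elec:(lag + 1) * n_elec] = neural[lag:]
    return X


def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X'X + alpha * I)^-1 X'Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def mean_correlation(Y_true, Y_pred):
    """Average Pearson correlation across spectrogram frequency bins."""
    r = [np.corrcoef(Y_true[:, k], Y_pred[:, k])[0, 1]
         for k in range(Y_true.shape[1])]
    return float(np.mean(r))


# Synthetic stand-in data (assumption: no real recordings are used here).
rng = np.random.default_rng(0)
T, n_freq, n_elec, delay, n_lags = 2000, 8, 16, 2, 5
spectrogram = rng.standard_normal((T, n_freq))
mixing = rng.standard_normal((n_freq, n_elec))

# Simulated electrodes respond to the stimulus `delay` steps earlier, plus noise.
neural = np.zeros((T, n_elec))
neural[delay:] = spectrogram[:-delay] @ mixing
neural += 0.1 * rng.standard_normal((T, n_elec))

# Fit on the first part of the data, evaluate on the held-out remainder.
split = 1500
W = fit_ridge(lagged_design(neural[:split], n_lags), spectrogram[:split], alpha=1.0)
prediction = lagged_design(neural[split:], n_lags) @ W
score = mean_correlation(spectrogram[split:], prediction)
print(f"mean correlation on held-out data: {score:.3f}")
```

In the comparison the abstract describes, the nonlinear variant replaces this linear map with a deep neural network and the spectrogram target with speech synthesizer parameters; neither is shown here.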


            Author and article information

            Affiliations
            [1] ISNI 0000 0004 1936 8729, GRID grid.21729.3f, Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, United States
            [2] ISNI 0000 0004 1936 8729, GRID grid.21729.3f, Department of Electrical Engineering, Columbia University, New York, NY, United States
            [3] ISNI 0000 0001 2284 9943, GRID grid.257060.6, Hofstra Northwell School of Medicine, Manhasset, NY, United States
            [4] ISNI 0000 0000 9566 0634, GRID grid.250903.d, The Feinstein Institute for Medical Research, Manhasset, NY, United States
            Contributors
            nima@ee.columbia.edu
            Journal
            Sci Rep
            Scientific Reports
            Nature Publishing Group UK (London )
            2045-2322
            29 January 2019
            2019
            Volume: 9
            6351601 37359 10.1038/s41598-018-37359-z
            © The Author(s) 2019

            Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

            Funding
            Funded by: FundRef https://doi.org/10.13039/100000055, U.S. Department of Health & Human Services | NIH | National Institute on Deafness and Other Communication Disorders (NIDCD);
            Award ID: DC014279
            Categories
            Article