      Is Open Access

      Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition

      Preprint


          Abstract

          Deep convolutional neural networks are being actively investigated in a wide range of speech and audio processing applications, including speech recognition, audio event detection and computational paralinguistics, owing to their ability to reduce factors of variation in signals, such as speaker and environment information, for speech recognition. However, studies have suggested favoring a certain type of convolutional operation when building a deep convolutional neural network for speech applications, although there have been promising results using different types of convolutional operations. In this work, we study four types of convolutional operations on different input features for speech emotion recognition in order to derive a comprehensive understanding. Since affective behavioral information has been shown to reflect temporally varying mental states and convolutional operations are applied locally in time, all deep neural networks share a deep recurrent sub-network architecture for further temporal modeling. We present a detailed quantitative module-wise performance analysis to gain insights into information flows within the proposed architectures. In particular, we demonstrate the interplay of affective information and other, irrelevant information during the progression from one module to another. Finally, we show that all of our deep neural networks provide state-of-the-art performance on the eNTERFACE'05 corpus.


                Author and article information

                Journal
                2017-06-07
                Article
                1706.02901
                5d44ee46-8200-4b18-ae06-718ba44d3f59

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                Custom metadata
                Submitted to IEEE transaction
                cs.LG cs.CL cs.MM cs.SD

                Theoretical computer science, Artificial intelligence, Graphics & Multimedia design
