      Is Open Access

      Jointly Aligning and Predicting Continuous Emotion Annotations

      Preprint

          Abstract

          Time-continuous dimensional descriptions of emotions (e.g., arousal, valence) allow researchers to characterize short-term changes and to capture long-term trends in emotion expression. However, continuous emotion labels are generally not synchronized with the input speech signal because of the reaction-time delay inherent in human evaluations. To address this challenge, we introduce a new convolutional neural network, the multi-delay sinc network, that simultaneously aligns and predicts labels in an end-to-end manner. The proposed network is a stack of convolutional layers followed by an aligner network that aligns the speech signal with the emotion labels. The aligner is implemented using a new convolutional layer that we introduce, the delayed sinc layer: a time-shifted low-pass (sinc) filter that uses a gradient-based algorithm to learn a single delay. Multiple delayed sinc layers can be used to compensate for a non-stationary delay that is a function of the acoustic space. We test the efficacy of this system on two common emotion datasets, RECOLA and SEWA, and show that this approach obtains state-of-the-art speech-only results by learning time-varying delays while predicting dimensional descriptors of emotions.
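
          The idea of a time-shifted low-pass (sinc) filter can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the cutoff frequency, the kernel half-width, and the Hamming window are all assumptions chosen for illustration, and the delay here is set by hand rather than learned. The sketch only shows that convolving a signal with a sinc kernel whose argument is shifted by `delay` moves the signal by that many samples.

```python
import numpy as np


def delayed_sinc_kernel(delay, cutoff=0.1, half_width=64):
    """Windowed low-pass sinc kernel whose peak sits `delay` samples after center.

    All parameter names and defaults are illustrative assumptions.
    `cutoff` is a normalized frequency in cycles/sample (0 < cutoff <= 0.5).
    """
    n = np.arange(-half_width, half_width + 1, dtype=float)
    # np.sinc(x) = sin(pi*x) / (pi*x); shifting its argument shifts the filter in time.
    kernel = 2.0 * cutoff * np.sinc(2.0 * cutoff * (n - delay))
    # Taper the truncated kernel with a Hamming window (centered at n = 0).
    kernel *= np.hamming(n.size)
    # Normalize to unit DC gain so a constant signal passes unchanged.
    return kernel / kernel.sum()


def apply_delay(signal, delay, **kernel_kwargs):
    """Low-pass filter `signal` and shift it by `delay` samples."""
    kernel = delayed_sinc_kernel(delay, **kernel_kwargs)
    return np.convolve(signal, kernel, mode="same")


# A unit impulse at sample 100, delayed by 10 samples.
impulse = np.zeros(300)
impulse[100] = 1.0
shifted = apply_delay(impulse, delay=10)
peak_index = int(np.argmax(shifted))
```

          Because the kernel is a smooth function of `delay`, the delay can in principle be treated as a trainable parameter in an automatic-differentiation framework, which is the property the abstract relies on when it describes a gradient-based algorithm. That training step is not shown here.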


                Author and article information

                Journal: IEEE Transactions on Affective Computing
                Date: 05 July 2019
                Type: Article
                DOI: 10.1109/TAFFC.2019.2917047
                arXiv: 1907.03050
                ID: 24fe3012-85a7-4002-9934-105f8927f15f
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                arXiv categories: cs.LG cs.HC eess.AS stat.ML
                Keywords: Machine learning, Artificial intelligence, Electrical engineering, Human-computer interaction
