      MES-P: an Emotional Tonal Speech Dataset in Mandarin Chinese with Distal and Proximal Labels

Open-access preprint


          Abstract

Emotion shapes all aspects of our interpersonal and intellectual experience, so its automatic analysis has many applications, e.g., in human-machine interfaces. In this paper, we propose an emotional tonal speech dataset, the Mandarin Chinese Emotional Speech Dataset - Portrayed (MES-P), with both distal and proximal labels. In contrast with state-of-the-art emotional speech datasets, which focus only on perceived emotions, MES-P includes not only perceived emotions with proximal labels but also intended emotions with distal labels. This makes it possible to study human emotional intelligence, i.e., people's ability to express emotions and their skill at understanding them, by explicitly accounting for the differences between intended and perceived emotions in speech signals, and it enables studies of the emotional misunderstandings that often occur in real life. Furthermore, MES-P captures a defining feature of tonal languages, namely tonal variation, and provides recorded emotional speech samples whose tonal variations match the tonal distribution of real-life Mandarin Chinese. The dataset also features emotion intensity variations, including both moderate and intense versions of recordings for joy, anger, and sadness in addition to neutral speech. The collected speech samples are rated in valence-arousal (VA) space as continuous coordinate locations, yielding an emotional distribution pattern in the 2D VA space. The consistency between the speakers' emotional intentions and the listeners' perceptions is studied using Cohen's Kappa coefficients. Finally, we carry out extensive baseline experiments on MES-P for automatic emotion recognition and compare the results with human emotional intelligence.
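
To make the agreement analysis concrete, here is a minimal sketch, assuming per-utterance categorical labels and scikit-learn's cohen_kappa_score; the label values below are illustrative and are not drawn from MES-P itself.

```python
# Sketch: chance-corrected agreement between intended (distal) and
# perceived (proximal) emotion labels, as the abstract describes.
from sklearn.metrics import cohen_kappa_score

# Distal labels: the emotion each speaker intended to portray (illustrative).
intended = ["joy", "anger", "sadness", "neutral", "joy", "anger"]
# Proximal labels: the emotion listeners actually perceived (illustrative).
perceived = ["joy", "anger", "neutral", "neutral", "sadness", "anger"]

# Cohen's Kappa corrects raw agreement for chance: 1.0 is perfect
# agreement, 0.0 is chance-level, negative values are worse than chance.
kappa = cohen_kappa_score(intended, perceived)
print(f"Cohen's Kappa (intended vs. perceived): {kappa:.3f}")
```

Kappa is preferable to raw percent agreement here because listeners who simply guess the majority class would otherwise look deceptively consistent with the speakers' intentions.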


                Author and article information

Article type: Preprint
Date: 29 August 2018
arXiv ID: 1808.10095
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Custom metadata: Submission to IEEE Access
Subject: cs.SD

Graphics & Multimedia design
