      Is Open Access

      Time-Frequency Feature Representation Using Multi-Resolution Texture Analysis and Acoustic Activity Detector for Real-Life Speech Emotion Recognition

      research-article


          Abstract

          The classification of emotional speech is commonly considered in speech-related research on human-computer interaction (HCI). This paper presents a novel feature extraction method based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis for the characterization and classification of different emotions in a speech signal. The motivation is that emotions have different intensity values in different frequency bands. In terms of human visual perception, the texture properties of the emotional speech spectrogram at multiple resolutions should be a good feature set for emotion classification in speech. Furthermore, multi-resolution texture analysis can discriminate between emotions more clearly than uniform-resolution texture analysis. To provide high accuracy of emotional discrimination, especially in real-life conditions, an acoustic activity detection (AAD) algorithm is applied in the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper makes use of two corpora of naturally occurring dialogs recorded in real-life call centers. Compared with the traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and state-of-the-art features, the MRTII features also improve the correct classification rates of the proposed systems across databases in different languages. Experimental results show that the proposed MRTII-based feature information, inspired by human visual perception of the spectrogram image, can provide significant classification performance for real-life emotion recognition in speech.
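
          The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of multi-resolution texture features on a spectrogram-like image. It assumes 3-tap Laws-style kernels, 2x2 block averaging as a scaled Haar approximation band, and mean absolute filter response as the texture energy. All function names are hypothetical, and none of these choices is confirmed as the configuration used in the article.

```python
# Illustrative sketch only: the kernels, the Haar-style averaging and the
# energy measure are assumptions, not the article's documented method.

L3 = [1, 2, 1]     # "level" kernel (local average)
E3 = [-1, 0, 1]    # "edge" kernel
S3 = [-1, 2, -1]   # "spot" kernel


def outer(u, v):
    """Outer product of two 1-D kernels, giving a 2-D mask."""
    return [[a * b for b in v] for a in u]


def filter_valid(img, mask):
    """Slide the mask over img (no padding) and return the responses."""
    mh, mw = len(mask), len(mask[0])
    rows, cols = len(img) - mh + 1, len(img[0]) - mw + 1
    return [[sum(mask[i][j] * img[r + i][c + j]
                 for i in range(mh) for j in range(mw))
             for c in range(cols)]
            for r in range(rows)]


def texture_energy(img, mask):
    """Mean absolute filter response over the whole image."""
    values = [abs(v) for row in filter_valid(img, mask) for v in row]
    return sum(values) / len(values)


def haar_approx(img):
    """Halve the resolution by averaging each 2x2 block."""
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
             for c in range(0, len(img[0]) - 1, 2)]
            for r in range(0, len(img) - 1, 2)]


def multires_texture_features(img, levels=2):
    """Texture energies for each mask at each resolution level."""
    masks = [("L3E3", outer(L3, E3)),   # responds to changes along columns
             ("E3L3", outer(E3, L3)),   # responds to changes along rows
             ("S3S3", outer(S3, S3))]   # responds to isolated spots
    features = {}
    current = img
    for level in range(levels + 1):
        if len(current) < 3 or len(current[0]) < 3:
            raise ValueError("image too small for this many levels")
        for name, mask in masks:
            features["level%d_%s" % (level, name)] = texture_energy(current, mask)
        current = haar_approx(current)
    return features


# Toy "spectrogram": intensity increases steadily along the column axis.
ramp = [[float(c) for c in range(16)] for _ in range(16)]
for key, value in sorted(multires_texture_features(ramp, levels=2).items()):
    print(key, value)
```

          In a real pipeline the input would be a magnitude spectrogram of the speech frames that pass activity detection, and the resulting feature vector would be fed to a classifier. Those stages are omitted here.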


                Author and article information

                Journal
                Sensors (Basel, Switzerland)
                Publisher: MDPI
                ISSN: 1424-8220
                Published: 14 January 2015
                Volume: 15
                Issue: 1
                Pages: 1458-1478
                Affiliations
                Department of Information Technology & Communication, Shih Chien University, 200 University Road, Neimen, Kaohsiung 84550, Taiwan; E-Mail: kunching@mail.kh.usc.edu.tw; Tel.: +886-76-678-888-5723; Fax: +886-76-678-888-4332
                Author notes

                Academic Editor: Vittorio M.N. Passaro

                Article
                Publisher ID: sensors-15-01458
                DOI: 10.3390/s150101458
                PMCID: 4327087
                PMID: 25594590
                f875bee0-6640-405c-91fc-eb3c1910c27e
                © 2015 by the authors; licensee MDPI, Basel, Switzerland.

                This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

                History
                Received: 16 September 2014
                Accepted: 01 December 2014
                Categories
                Article

                Biomedical engineering
                Keywords: multi-resolution, discrete wavelet transform, time-frequency texture, acoustic activity detection, spectrogram, Laws' masks
