
      Speech emotion recognition using machine learning techniques: Feature extraction and comparison of convolutional neural network and random forest

      research-article
      PLOS ONE
      Public Library of Science


          Abstract

          Speech is a direct and rich way of transmitting information and emotions from one point to another. In this study, we aimed to classify different emotions in speech using various audio features and machine learning models. We extracted several types of audio features: Mel-frequency cepstral coefficients (MFCCs), chromagram, Mel-scale spectrogram, spectral contrast, Tonnetz representation, and zero-crossing rate. We used a limited speech emotion recognition (SER) dataset and augmented it with additional audio recordings. In contrast to many previous studies, we combined all audio files before conducting our analysis. We compared the performance of two models: a one-dimensional convolutional neural network (conv1D) and a random forest (RF), with RF-based feature selection. Our results showed that RF with feature selection achieved higher average accuracy (69%) than conv1D, with the highest precision for fear (72%) and the highest recall for calm (84%). Our study demonstrates the effectiveness of RF with feature selection for speech emotion classification using a limited dataset. We found that, for both algorithms, anger is misclassified mostly as happy, disgust as sad and neutral, and fear as sad. This could be due to the similarity of some acoustic features among these emotions, such as pitch, intensity, and tempo.


          Most cited references (19)


          Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences


            The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English

            The RAVDESS is a validated multimodal database of emotional speech and song. The database is gender balanced consisting of 24 professional actors, vocalizing lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity, with an additional neutral expression. All conditions are available in face-and-voice, face-only, and voice-only formats. The set of 7356 recordings were each rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity and test-retest intrarater reliability were reported. Corrected accuracy and composite "goodness" measures are presented to assist researchers in the selection of stimuli. All recordings are made freely available under a Creative Commons license and can be downloaded at https://doi.org/10.5281/zenodo.1188976.
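            Each RAVDESS recording's conditions are encoded in its file name as seven two-digit fields (modality-channel-emotion-intensity-statement-repetition-actor), with odd-numbered actors male and even-numbered female per the database's documented convention. A minimal parser, written here as an illustration rather than taken from either article:

```python
# Emotion codes as documented for RAVDESS.
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def parse_ravdess(filename: str) -> dict:
    """Map a RAVDESS file name to its labelled recording conditions."""
    fields = filename.removesuffix(".wav").split("-")
    actor = int(fields[6])
    return {
        "emotion": EMOTIONS[fields[2]],
        "intensity": "strong" if fields[3] == "02" else "normal",
        "actor": actor,
        "gender": "female" if actor % 2 == 0 else "male",  # odd actors are male
    }

print(parse_ravdess("03-01-05-01-02-01-12.wav"))
# → {'emotion': 'angry', 'intensity': 'normal', 'actor': 12, 'gender': 'female'}
```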

              Content analysis for audio classification and segmentation


                Author and article information

                Contributors
                Role: Conceptualization, Data curation
                Role: Data curation
                Role: Editor
                Journal
                PLOS ONE (PLoS One), Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                Published: 21 November 2023; 18(11): e0291500
                Affiliations
                [1 ] Independent Researcher, Mashhad, Iran
                [2 ] Ghana Data Center, Accra, Ghana
                Valahia University of Targoviste (Universitatea Valahia din Targoviste), Romania
                Author notes

                Competing Interests: The authors declare no competing interests.

                Author information
                https://orcid.org/0009-0003-3467-6295
                Article
                PONE-D-23-20208
                DOI: 10.1371/journal.pone.0291500
                PMCID: PMC10662716
                PMID: 37988352
                9b11a4e3-2bea-473d-9a83-7ffe8af43b31
                © 2023 Rezapour Mashhadi, Osei-Bonsu

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 30 June 2023
                Accepted: 31 August 2023
                Page count
                Figures: 3, Tables: 2, Pages: 13
                Funding
                The authors received no specific funding for this work.
                Categories
                Research Article
                Biology and Life Sciences > Psychology > Emotions
                Social Sciences > Psychology > Emotions
                Engineering and Technology > Signal Processing > Audio Signal Processing
                Engineering and Technology > Signal Processing > Speech Signal Processing
                Social Sciences > Linguistics > Speech
                Physical Sciences > Physics > Acoustics > Acoustic Signals
                Biology and Life Sciences > Psychology > Emotions > Fear
                Social Sciences > Psychology > Emotions > Fear
                Biology and Life Sciences > Neuroscience > Cognitive Science > Cognition > Memory > Memory Recall
                Biology and Life Sciences > Neuroscience > Learning and Memory > Memory > Memory Recall
                Engineering and Technology > Signal Processing > Signal Filtering
                Custom metadata
                All data types used in this study are publicly available, as described in the data section.

