Speaker Attentive Speech Emotion Recognition

Abstract

The Speech Emotion Recognition (SER) task has seen significant improvements in recent years with the advent of Deep Neural Networks (DNNs). However, even the most successful methods still struggle when adaptation to specific speakers and scenarios is needed, inevitably leading to poorer performance than humans. In this paper, we present novel work based on the idea of teaching the emotion recognition network about speaker identity. Our system combines two ACRNN classifiers, dedicated to speaker recognition and emotion recognition respectively. The former informs the latter through a Self Speaker Attention (SSA) mechanism that is shown to considerably help the network focus on the emotional information in the speech signal. Experiments on the social attitudes database Att-HACK and the IEMOCAP corpus demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance in terms of unweighted average recall.
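
The abstract reports results in terms of unweighted average recall (UAR). As a minimal sketch, assuming the usual definition of UAR as the mean of per-class recalls, the metric could be computed as below. The function name, labels, and data are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls: each class counts equally, whatever its size."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly recognised samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (illustration only).
y_true = ["neutral", "neutral", "neutral", "neutral", "angry", "angry"]
y_pred = ["neutral", "neutral", "neutral", "neutral", "neutral", "angry"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(accuracy, 3))                         # 0.833
print(unweighted_average_recall(y_true, y_pred))  # 0.75
```

In this toy example the recall is 1.0 for "neutral" and 0.5 for "angry", so UAR is 0.75 while plain accuracy is about 0.833. The gap shows why UAR is a common choice for class-imbalanced emotion corpora: errors on a small class are not masked by a large one.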

Author and article information

Publication date (created): 15 April 2021

Article

arXiv ID: 2104.07288

SO-VID: 1579773f-5a5d-499f-ac0a-1ff094cfe2f9

License: http://creativecommons.org/licenses/by-nc-sa/4.0/

Custom metadata

Categories: eess.AS, cs.LG, cs.SD

ScienceOpen disciplines: Artificial intelligence, Graphics & Multimedia design, Electrical engineering

