Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue

Abstract

Large Language Models (LLMs) have demonstrated strong abilities in tasks such as chatting, reasoning, and question answering. However, standard LLMs may ignore crucial paralinguistic information, such as sentiment, emotion, and speaking style, which is essential for natural, human-like spoken conversation, especially when such information is conveyed by acoustic cues. We therefore propose the Paralinguistics-enhanced Generative Pretrained Transformer (ParalinGPT), an LLM that uses both the text and speech modalities to better model the linguistic content and paralinguistic attributes of spoken responses. The model takes the conversational context (text, speech embeddings, and paralinguistic attributes) as input prompts within a serialized multitasking, multi-modal framework. Specifically, the framework serializes the tasks in the following order, with autoregressive conditioning: current paralinguistic attribute prediction, response paralinguistic attribute prediction, and response text generation. We use the Switchboard-1 corpus as our spoken dialogue dataset, with its sentiment labels serving as the paralinguistic attribute. Experimental results indicate that the proposed serialized multitasking method outperforms typical sequence classification techniques on both current and response sentiment classification. Furthermore, leveraging conversational context and speech embeddings significantly improves both response text generation and sentiment prediction. The proposed framework achieves relative improvements of 6.7%, 12.0%, and 3.5% in current sentiment accuracy, response sentiment accuracy, and response text BLEU score, respectively.
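The task ordering described in the abstract can be illustrated with a small text-only sketch. This is not the authors' implementation: the marker strings (`<current_sentiment>`, `<response_sentiment>`, `<response_text>`), the `Turn` class, the `serialize` function, and the sentiment label names are all hypothetical, and the speech embeddings the abstract mentions are omitted entirely. The sketch only shows how one dialogue example might be flattened so that an autoregressive model sees the current sentiment before the response sentiment, and both before the response text.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical marker strings; the abstract does not specify the prompt format.
CUR_SENT = "<current_sentiment>"
RESP_SENT = "<response_sentiment>"
RESP_TEXT = "<response_text>"


@dataclass
class Turn:
    text: str
    sentiment: str  # illustrative label, e.g. "positive", "neutral", "negative"


def serialize(context: List[Turn], response: Turn) -> str:
    """Flatten one example in the order the abstract lists:
    context text, current sentiment, response sentiment, response text."""
    if not context:
        raise ValueError("context must contain at least the current turn")
    history = " ".join(turn.text for turn in context)
    current = context[-1]  # the most recent turn is treated as the "current" one
    return " ".join([
        history,
        CUR_SENT, current.sentiment,
        RESP_SENT, response.sentiment,
        RESP_TEXT, response.text,
    ])


if __name__ == "__main__":
    context = [
        Turn("how was the trip", "neutral"),
        Turn("it was wonderful we loved it", "positive"),
    ]
    response = Turn("that sounds great", "positive")
    print(serialize(context, response))
```

Because each later field follows the earlier ones in a single sequence, a left-to-right language model trained on such strings would condition its response text on the two sentiment predictions, which is the autoregressive conditioning the abstract refers to.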

Author and article information

Publication date (created): 23 December 2023

Article

arXiv ID: 2312.15316

SO-VID: 8b694291-c5d7-4d7c-a49c-71d12d6261c2

License: http://creativecommons.org/licenses/by-sa/4.0/

Custom metadata

Comments: Accepted by ICASSP 2024

Categories: cs.CL, eess.AS

ScienceOpen disciplines: Theoretical computer science, Electrical engineering
