End-to-End Neural Transformer Based Spoken Language Understanding

Abstract

Spoken language understanding (SLU) refers to the process of inferring semantic information from audio signals. While neural transformers consistently deliver the best performance among state-of-the-art neural architectures in the field of natural language processing (NLP), their merits in the closely related field of SLU have not been investigated. In this paper, we introduce an end-to-end neural transformer-based SLU model that can predict the variable-length domain, intent, and slot vectors embedded in an audio signal with no intermediate token prediction architecture. This new architecture leverages the self-attention mechanism, by which the audio signal is transformed to various subspaces, allowing the model to extract the semantic context implied by an utterance. Our end-to-end transformer SLU predicts the domains, intents, and slots in the Fluent Speech Commands dataset with accuracy equal to 98.1%, 99.6%, and 99.6%, respectively, and outperforms SLU models that leverage a combination of recurrent and convolutional neural networks by 1.4%, while our model is 25% smaller than these architectures. Additionally, due to the independent subspace projections in the self-attention layer, the model is highly parallelizable, which makes it a good candidate for on-device SLU.
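The subspace projections mentioned in the abstract can be illustrated with a minimal multi-head self-attention sketch in NumPy. This is not the authors' implementation. The function name `multi_head_self_attention`, the random weights, the frame count, the feature size, the number of heads, and the final mean pooling are all assumptions chosen for illustration. The sketch only shows how each head projects the same sequence of audio frames into its own subspace and attends over it independently, which is the property that makes the layer parallelizable.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(frames, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over a sequence of audio frames.

    frames: (num_frames, d_model) array of acoustic features (hypothetical input).
    w_q, w_k, w_v, w_o: (d_model, d_model) projection matrices.
    Returns the encoded frames and the per-head attention weights.
    """
    num_frames, d_model = frames.shape
    d_head = d_model // num_heads

    def split_heads(x):
        # (num_frames, d_model) -> (num_heads, num_frames, d_head)
        return x.reshape(num_frames, num_heads, d_head).transpose(1, 0, 2)

    # Each head sees its own slice of the projections, i.e. its own subspace.
    q = split_heads(frames @ w_q)
    k = split_heads(frames @ w_k)
    v = split_heads(frames @ w_v)

    # Heads are independent, so this batched product can run in parallel.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)   # (num_heads, num_frames, num_frames)
    context = weights @ v                # (num_heads, num_frames, d_head)

    # Concatenate the heads back into (num_frames, d_model) and mix them.
    merged = context.transpose(1, 0, 2).reshape(num_frames, d_model)
    return merged @ w_o, weights


# Illustrative sizes only; none of these values come from the paper.
rng = np.random.default_rng(0)
num_frames, d_model, num_heads = 50, 64, 4
frames = rng.standard_normal((num_frames, d_model))
w_q, w_k, w_v, w_o = (
    rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(4)
)

encoded, weights = multi_head_self_attention(frames, w_q, w_k, w_v, w_o, num_heads)

# Assumed pooling step: average over time to get one utterance-level vector
# that domain, intent, and slot classifiers could consume.
utterance_vector = encoded.mean(axis=0)
```

In a full model, learned projection matrices would replace the random ones here, and the pooled vector would feed the classification heads. How the paper actually maps the encoder output to domain, intent, and slot predictions is not stated in the abstract.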

Author and article information

Publication date Created: 12 August 2020

Article

arXiv ID: 2008.10984

SO-VID: 3c625ccf-c37f-4f46-a168-4c1a9bd90ad2

License:

http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Comments Interspeech 2020

Categories cs.CL cs.SD eess.AS

ScienceOpen disciplines: Theoretical computer science, Electrical engineering, Graphics & Multimedia design
