
An Online Attention-based Model for Speech Recognition

Preprint


Abstract

Attention-based end-to-end (E2E) speech recognition models such as Listen, Attend, and Spell (LAS) can achieve better results than traditional hybrid automatic speech recognition (ASR) models on LVCSR tasks. LAS combines the acoustic, pronunciation, and language model components of a traditional ASR system into a single neural network. However, such architectures are difficult to use for streaming speech recognition because of their bidirectional listener architecture and attention mechanism. In this work, we propose a latency-controlled bidirectional long short-term memory (LC-BLSTM) listener to reduce the delay of the listener's forward computation. On the attention side, we propose adaptive monotonic chunk-wise attention (AMoChA) to make LAS work online. We explore how each part performs when used alone and obtain results comparable to or better than the LAS baseline. By combining the two methods, we successfully stream the LAS baseline with only 3.5% relative degradation in character error rate (CER) on our Mandarin corpus. We believe our methods can have the same effect on other languages.
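
On the listener side, a rough sketch of the latency-controlled BLSTM idea is shown below. This is not the paper's implementation; the PyTorch layer, chunk length, and right-context width are illustrative assumptions. The forward LSTM streams its hidden state across chunks, while the backward LSTM is restarted for every chunk and sees only a fixed number of future frames, so the delay is bounded.

    # Minimal LC-BLSTM layer sketch (assumed PyTorch API; sizes are illustrative).
    import torch
    import torch.nn as nn

    class LCBLSTMLayer(nn.Module):
        def __init__(self, input_size, hidden_size, chunk_size=20, right_context=10):
            super().__init__()
            self.fwd = nn.LSTM(input_size, hidden_size, batch_first=True)
            self.bwd = nn.LSTM(input_size, hidden_size, batch_first=True)
            self.chunk_size = chunk_size
            self.right_context = right_context

        def forward(self, x):
            # x: (batch, time, feat); returns (batch, time, 2 * hidden_size).
            T = x.size(1)
            outputs, state = [], None
            for start in range(0, T, self.chunk_size):
                end = min(start + self.chunk_size, T)
                # Forward direction: carry hidden state across chunk boundaries.
                fwd_out, state = self.fwd(x[:, start:end], state)
                # Backward direction: run right-to-left over the chunk plus a
                # limited right context, then keep only the current chunk.
                ctx_end = min(end + self.right_context, T)
                rev = torch.flip(x[:, start:ctx_end], dims=[1])
                bwd_out, _ = self.bwd(rev)
                bwd_out = torch.flip(bwd_out, dims=[1])[:, : end - start]
                outputs.append(torch.cat([fwd_out, bwd_out], dim=-1))
            return torch.cat(outputs, dim=1)

Under this scheme the per-chunk delay is bounded by chunk_size + right_context frames rather than by the full utterance length, as a standard BLSTM would require.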
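
On the attention side, AMoChA builds on hard monotonic chunk-wise attention (MoChA). The inference-time sketch below uses simplified dot-product energies and a fixed chunk width w; in AMoChA the chunk size would instead be predicted adaptively, a detail this sketch omits.

    # Minimal MoChA-style decoding step sketch (simplified energies; w is a
    # fixed stand-in for AMoChA's adaptively predicted chunk size).
    import torch

    def mocha_decode_step(query, enc, prev_pos, w=4, threshold=0.5):
        """query: (dim,), enc: (T, dim). Returns (context, new_pos)."""
        T = enc.size(0)
        for t in range(prev_pos, T):
            # Monotonic selection probability for frame t.
            p_select = torch.sigmoid(enc[t] @ query)
            if p_select > threshold:
                # Soft attention over the w frames ending at the selected frame.
                lo = max(0, t - w + 1)
                chunk = enc[lo : t + 1]
                scores = torch.softmax(chunk @ query, dim=0)
                return scores @ chunk, t
        # No frame selected: fall back to the chunk ending at the last frame.
        chunk = enc[max(0, T - w):]
        scores = torch.softmax(chunk @ query, dim=0)
        return scores @ chunk, T - 1

Because the attention endpoint only moves forward and each output step looks at most w frames back from it, the decoder can emit characters as encoder frames arrive instead of waiting for the whole utterance.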


Author and article information

Date: 13 November 2018
Article type: Preprint
arXiv ID: 1811.05247
Record ID: 7ef6fa4d-b05a-4ecc-9305-d6bfe6f1f280
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
arXiv categories: cs.CL, cs.LG, cs.SD, eess.AS
Subjects: Theoretical computer science, Artificial intelligence, Electrical engineering, Graphics & Multimedia design
