Semi-tied Units for Efficient Gating in LSTM and Highway Networks

Abstract

Gating is a key technique used by long short-term memory (LSTM) models for integrating information from multiple sources, and has recently also been applied to other models such as the highway network. Although gating is powerful, it is expensive in terms of both computation and storage, as each gating unit uses a separate full weight matrix. This issue can be severe since several gates are often used together, for example in an LSTM cell. This paper proposes a semi-tied unit (STU) approach to solve this efficiency issue, which uses one shared weight matrix to replace those in all the units in the same layer. The approach is termed "semi-tied" since extra parameters are used to separately scale each of the shared output values. These extra scaling factors are associated with the network activation functions and result in the use of parameterised sigmoid, hyperbolic tangent, and rectified linear unit functions. Speech recognition experiments using British English multi-genre broadcast data showed that using STUs can reduce the calculation and storage cost by a factor of three for highway networks and four for LSTMs, while giving similar word error rates to the original models.
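The idea of sharing one weight matrix and scaling its outputs separately per unit can be illustrated with a minimal sketch of a highway layer. This is not the authors' implementation: the class name `SemiTiedHighwayLayer`, the unit names, the elementwise form `scale * z + bias`, and the choice to keep a separate bias per unit are all assumptions made for illustration, since the abstract does not spell out the exact parameterisation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, x):
    # Plain matrix-vector product on nested lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class SemiTiedHighwayLayer:
    """Hypothetical highway layer whose three units share one weight matrix."""

    UNITS = ("hidden", "transform", "carry")

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        std = 1.0 / math.sqrt(dim)
        # One shared full matrix instead of one matrix per unit.
        self.W = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(dim)]
        # Assumed per-unit scaling vectors and biases (the "semi-tied" part).
        self.scale = {u: [1.0] * dim for u in self.UNITS}
        self.bias = {u: [0.0] * dim for u in self.UNITS}

    def _pre_activation(self, unit, z):
        return [a * zi + b
                for a, zi, b in zip(self.scale[unit], z, self.bias[unit])]

    def forward(self, x):
        z = matvec(self.W, x)  # the expensive product is computed once
        h = [max(0.0, v) for v in self._pre_activation("hidden", z)]    # scaled ReLU
        t = [sigmoid(v) for v in self._pre_activation("transform", z)]  # scaled sigmoid
        c = [sigmoid(v) for v in self._pre_activation("carry", z)]      # scaled sigmoid
        return [hi * ti + xi * ci for hi, ti, xi, ci in zip(h, t, x, c)]


def untied_params(dim, units=3):
    # Each unit owns a full dim x dim matrix plus a bias vector.
    return units * (dim * dim + dim)


def semi_tied_params(dim, units=3):
    # One shared matrix, plus a scale vector and a bias vector per unit.
    return dim * dim + 2 * units * dim
```

Under these assumptions, a layer of width 1000 needs 3,003,000 parameters with untied units and 1,006,000 with semi-tied units, a ratio close to the factor of three reported for highway networks. Calling the same counting functions with `units=4` gives the corresponding estimate for the four units of an LSTM cell, ignoring that an LSTM's matrices also cover the recurrent input.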

Author and article information

Publication date (created): 18 June 2018

Article

arXiv ID: 1806.06513

SO-VID: d5792f62-770f-494e-965d-a174708cfb8d

License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Comments: To appear in Proc. INTERSPEECH 2018, September 2-6, 2018, Hyderabad, India

Categories: cs.CL, cs.LG, eess.AS, stat.ML

ScienceOpen disciplines: Theoretical computer science, Machine learning, Artificial intelligence, Electrical engineering
