Machine Learning on Sequential Data Using a Recurrent Weighted Average

Abstract

Recurrent neural networks (RNNs) are a type of statistical model designed to handle sequential data. The model reads a sequence one symbol at a time. Each symbol is processed based on information collected from the previous symbols. With existing RNN architectures, each symbol is processed using only information from the previous processing step. To overcome this limitation, we propose a new kind of RNN model that computes a recurrent weighted average (RWA) over every past processing step. Because the RWA can be computed as a running average, the computational overhead scales like that of any other RNN. The approach essentially reformulates the attention mechanism into a stand-alone model. In our assessment, an RWA model trains faster and generalizes better than a standard LSTM model on the variable copy problem, the adding problem, classification of artificial grammar, classification of sequences by length, and classification of MNIST handwritten digits (where the pixels are read sequentially, one at a time).
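The claim that a weighted average over every past step can be maintained as a running average can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not give the equations, so the exponential (attention-like) weights, the scalar `values` and `scores`, and both function names are assumptions made for illustration. In a trained model the values and scores would presumably be learned functions of the input and the previous state.

```python
import math

def weighted_average_full(values, scores):
    """Weighted average over all steps, recomputed from scratch.

    Assumption: weights are exp(score), as in softmax-style attention.
    """
    weights = [math.exp(s) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def weighted_average_running(values, scores):
    """Same average at every step, carrying only two accumulators.

    Each step costs O(1): the numerator and denominator are updated
    in place, so no past step has to be revisited.
    """
    numerator, denominator = 0.0, 0.0
    outputs = []
    for v, s in zip(values, scores):
        w = math.exp(s)
        numerator += w * v       # running sum of weight * value
        denominator += w         # running sum of weights
        outputs.append(numerator / denominator)
    return outputs

# Hypothetical toy sequence: one scalar value and one score per step.
values = [1.0, 3.0, 2.0, 5.0]
scores = [0.0, 1.0, -1.0, 0.5]
running = weighted_average_running(values, scores)
```

At every step `t`, `running[t]` matches `weighted_average_full(values[:t + 1], scores[:t + 1])` up to floating-point error, which is why the per-step cost need not grow with sequence length. For long sequences a practical implementation would likely need to guard the exponentials against overflow, which this sketch omits.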

Most cited references 2

Show and tell: A neural image caption generator

Oriol Vinyals, Alexander Toshev, Samy Bengio … (2015)

Cited 545 times

Is Open Access

Convolutional LSTM Networks for Subcellular Localization of Proteins

Søren Sønderby, Casper Sønderby, Henrik Nielsen … (2015)

Machine learning is widely used to analyze biological sequence data. Non-sequential models such as SVMs or feed-forward neural networks are often used, although they have no natural way of handling sequences of varying length. Recurrent neural networks such as the long short-term memory (LSTM) model, on the other hand, are designed to handle sequences. In this study we demonstrate that LSTM networks predict the subcellular location of proteins given only the protein sequence with high accuracy (0.902), outperforming current state-of-the-art algorithms. We further improve the performance by introducing convolutional filters and experiment with an attention mechanism that lets the LSTM focus on specific parts of the protein. Lastly, we introduce new visualizations of both the convolutional filters and the attention mechanisms and show how they can be used to extract biologically relevant knowledge from the LSTM networks.

Cited 8 times

Preprint

Author and article information

Publication date (created): 2017-03-03

Article

arXiv ID: 1703.01253

SO-VID: e4476ad6-591d-4110-9ae5-62c6e0b555c3

License:

http://creativecommons.org/licenses/by/4.0/

Custom metadata

Categories: stat.ML, cs.LG

ScienceOpen disciplines: Machine learning, Artificial intelligence
