Unsupervised Learning of Video Representations using LSTMs

Abstract

We use multilayer Long Short-Term Memory (LSTM) networks to learn representations of video sequences. Our model uses an encoder LSTM to map an input sequence into a fixed-length representation. This representation is decoded using single or multiple decoder LSTMs to perform different tasks, such as reconstructing the input sequence or predicting the future sequence. We experiment with two kinds of input sequences: patches of image pixels and high-level representations ("percepts") of video frames extracted using a pretrained convolutional net. We explore different design choices, such as whether the decoder LSTMs should condition on the generated output. We analyze the outputs of the model qualitatively to see how well the model can extrapolate the learned video representation into the future and into the past. We try to visualize and interpret the learned features. We stress-test the model by running it on longer time scales and on out-of-domain data. We further evaluate the representations by fine-tuning them for a supervised learning problem: human action recognition on the UCF-101 and HMDB-51 datasets. We show that the representations help improve classification accuracy, especially when there are only a few training examples. Even models pretrained on unrelated datasets (300 hours of YouTube videos) can help action recognition performance.
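The encoder/decoder arrangement described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the weights are random and untrained, the layers are single rather than multilayer, and every name here (`LSTMCell`, `encode`, `decode`, `W_recon`, `W_future`, the dimensions) is a hypothetical choice made for illustration. It only shows the data flow: one encoder produces a fixed-length state, and two decoders read that state, one to reconstruct the input and one to predict future frames, with a flag for whether a decoder conditions on its own generated output.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """A standard LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_dim, hidden_dim, rng):
        # One weight matrix covering the input, forget, output and candidate gates.
        self.W = rng.normal(0.0, 0.1, (input_dim + hidden_dim, 4 * hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        z = np.concatenate([x, h]) @ self.W + self.b
        i, f, o, g = np.split(z, 4)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        return h_new, c_new


def encode(cell, frames):
    """Run the encoder over all frames; the final (h, c) is the representation."""
    h = np.zeros(cell.hidden_dim)
    c = np.zeros(cell.hidden_dim)
    for x in frames:
        h, c = cell.step(x, h, c)
    return h, c  # same size regardless of how many frames were read


def decode(cell, W_out, state, n_steps, conditioned):
    """Unroll a decoder from the encoder state for n_steps output frames."""
    h, c = state
    frame_dim = W_out.shape[1]
    x = np.zeros(frame_dim)
    outputs = []
    for _ in range(n_steps):
        h, c = cell.step(x, h, c)
        y = h @ W_out  # linear readout from hidden state to a frame
        outputs.append(y)
        # A conditioned decoder is fed its own last output; otherwise it sees zeros.
        x = y if conditioned else np.zeros(frame_dim)
    return np.stack(outputs)


rng = np.random.default_rng(0)
frame_dim, hidden_dim = 16, 32  # assumed sizes, not taken from the paper

encoder = LSTMCell(frame_dim, hidden_dim, rng)
recon_decoder = LSTMCell(frame_dim, hidden_dim, rng)
future_decoder = LSTMCell(frame_dim, hidden_dim, rng)
W_recon = rng.normal(0.0, 0.1, (hidden_dim, frame_dim))
W_future = rng.normal(0.0, 0.1, (hidden_dim, frame_dim))

video = rng.normal(size=(10, frame_dim))  # 10 synthetic "frames"
state = encode(encoder, video)
reconstruction = decode(recon_decoder, W_recon, state, n_steps=10, conditioned=False)
prediction = decode(future_decoder, W_future, state, n_steps=5, conditioned=True)
```

In a trained model, the reconstruction and prediction outputs would be compared against the input frames and the true future frames respectively, and the resulting loss would be backpropagated through both decoders into the shared encoder. That training step is omitted here.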

Author and article information

Journal

Publication date Created: 2015-02-16

Publication date Updated: 2016-01-03

Article

arXiv ID: 1502.04681

SO-VID: 16a7dc86-9bd7-460a-86dc-855da8bb7390

License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Comments: Added link to code on GitHub

Categories: cs.LG, cs.CV, cs.NE

ScienceOpen disciplines: Computer vision & Pattern recognition, Neural & Evolutionary computing, Artificial intelligence
