Is Open Access

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

Preprint

Author(s): Joao Carreira, Andrew Zisserman

Publication date (created): 2017-05-22

Abstract

The paucity of videos in current action classification datasets (UCF-101 and HMDB-51) has made it difficult to identify good video architectures, as most methods obtain similar performance on existing small-scale benchmarks. This paper re-evaluates state-of-the-art architectures in light of the new Kinetics Human Action Video dataset. Kinetics has two orders of magnitude more data, with 400 human action classes and over 400 clips per class, and is collected from realistic, challenging YouTube videos. We provide an analysis on how current architectures fare on the task of action classification on this dataset and how much performance improves on the smaller benchmark datasets after pre-training on Kinetics. We also introduce a new Two-Stream Inflated 3D ConvNet (I3D) that is based on 2D ConvNet inflation: filters and pooling kernels of very deep image classification ConvNets are expanded into 3D, making it possible to learn seamless spatio-temporal feature extractors from video while leveraging successful ImageNet architecture designs and even their parameters. We show that, after pre-training on Kinetics, I3D models considerably improve upon the state-of-the-art in action classification, reaching 80.7% on HMDB-51 and 98.0% on UCF-101.
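The inflation idea in the abstract can be illustrated with a minimal sketch. It assumes the common bootstrapping scheme in which a 2D filter is repeated along a new time axis and divided by the temporal depth; the abstract itself only says that filters are "expanded into 3D", so this particular initialization is an assumption here. The function names `inflate_2d_kernel`, `response_2d`, and `response_3d` are hypothetical, and the example uses a single filter on a single patch rather than a full network.

```python
import numpy as np


def inflate_2d_kernel(kernel_2d: np.ndarray, time_depth: int) -> np.ndarray:
    """Expand a (kh, kw) 2D filter into a (time_depth, kh, kw) 3D filter.

    Assumed scheme: copy the 2D weights along a new time axis and divide
    by time_depth, so that a clip made of identical frames produces the
    same response as the original 2D filter on one frame.
    """
    if time_depth < 1:
        raise ValueError("time_depth must be >= 1")
    repeated = np.repeat(kernel_2d[np.newaxis, :, :], time_depth, axis=0)
    return repeated / time_depth


def response_2d(patch: np.ndarray, kernel_2d: np.ndarray) -> float:
    """Response of a 2D filter at one location (elementwise product, summed)."""
    return float(np.sum(patch * kernel_2d))


def response_3d(clip: np.ndarray, kernel_3d: np.ndarray) -> float:
    """Response of a 3D filter at one spatio-temporal location."""
    return float(np.sum(clip * kernel_3d))


# Illustrative data only: a random 3x3 filter and a random 3x3 image patch.
rng = np.random.default_rng(0)
kernel_2d = rng.standard_normal((3, 3))
patch = rng.standard_normal((3, 3))

# A "static" clip: the same patch repeated for 4 frames.
static_clip = np.repeat(patch[np.newaxis, :, :], 4, axis=0)
kernel_3d = inflate_2d_kernel(kernel_2d, time_depth=4)

# On a static clip the inflated filter should reproduce the 2D response.
same = np.isclose(response_2d(patch, kernel_2d),
                  response_3d(static_clip, kernel_3d))
```

Dividing by the temporal depth keeps activations on the same scale as in the image model, which is what would let pretrained ImageNet parameters serve as a starting point for the 3D filters.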

Author and article information

arXiv ID: 1705.07750

SO-VID: 7d025fb1-a552-4bf1-85ef-9ffe870a723d

License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Comments: To appear at CVPR 2017

Categories: cs.CV, cs.LG

ScienceOpen disciplines: Computer vision & Pattern recognition, Artificial intelligence
