A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera

Abstract

We present a deep learning-based multitask framework for joint 3D human pose estimation and action recognition from simple RGB cameras. The approach proceeds in two stages. In the first stage, a real-time 2D pose detector determines the precise pixel locations of the main keypoints of the human body, and a two-stream deep neural network is designed and trained to map the detected 2D keypoints to 3D poses. In the second stage, the Efficient Neural Architecture Search (ENAS) algorithm is used to find an optimal network architecture, which models the spatio-temporal evolution of the estimated 3D poses via an image-based intermediate representation and performs action recognition. Experiments on the Human3.6M, MSR Action3D and SBU Kinect Interaction datasets verify the effectiveness of the proposed method on the targeted tasks. Moreover, we show that the method requires a low computational budget for training and inference. In particular, the experimental results show that a 3D pose estimation and human action recognition approach built on a monocular RGB sensor can reach the performance of approaches based on RGB-depth sensors. This opens up many opportunities for leveraging RGB cameras, which are much cheaper than depth cameras and extensively deployed in private and public places, to build intelligent recognition systems.
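
To make the "image-based intermediate representation" concrete, the following is a minimal sketch of one way a sequence of estimated 3D poses could be turned into an image for a convolutional network. The abstract does not specify the encoding, so everything here is an assumption for illustration: the function name `poses_to_image`, the global min-max normalization, and the layout (joints as rows, frames as columns, x/y/z as the three color channels) are hypothetical and not taken from the article.

```python
import numpy as np


def poses_to_image(poses: np.ndarray) -> np.ndarray:
    """Encode a pose sequence of shape (frames, joints, 3) as a uint8 image.

    Hypothetical encoding (not the article's): each joint coordinate
    (x, y, z) becomes one RGB pixel, rows index joints, columns index frames.
    """
    if poses.ndim != 3 or poses.shape[2] != 3:
        raise ValueError("expected an array of shape (frames, joints, 3)")

    lo = poses.min()
    hi = poses.max()
    scale = hi - lo
    frames, joints, _ = poses.shape
    if scale == 0:
        # Degenerate sequence with identical coordinates: return a black image.
        return np.zeros((joints, frames, 3), dtype=np.uint8)

    # Min-max normalize all coordinates to [0, 1], then quantize to [0, 255].
    normalized = (poses - lo) / scale
    image = np.round(normalized * 255.0).astype(np.uint8)

    # Reorder to (joints, frames, channels) so time runs along the width.
    return image.transpose(1, 0, 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sequence = rng.normal(size=(30, 17, 3))  # 30 frames, 17 joints (assumed)
    print(poses_to_image(sequence).shape)
```

With an encoding of this kind, an action clip becomes a small fixed-layout color image, so an image classifier (here, one whose architecture is found by ENAS) can be trained on it directly.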

Author and article information

Journal

Journal ID (nlm-ta): Sensors (Basel)

Journal ID (iso-abbrev): Sensors (Basel)

Journal ID (publisher-id): sensors

Title: Sensors (Basel, Switzerland)

Publisher: MDPI

ISSN (Electronic): 1424-8220

Publication date (Electronic): 25 March 2020

Publication date Collection: April 2020

Volume: 20

Issue: 7

Electronic Location Identifier: 1825

Affiliations

[1] Cerema Research Center, 31400 Toulouse, France; hieuhuy01@gmail.com (H.H.P.); louahdi.khoudour@cerema.fr (L.K.)

[2] Informatics Research Institute of Toulouse (IRIT), Université de Toulouse, CNRS, 31062 Toulouse, France; alain.crouzil@irit.fr

[3] Vingroup Big Data Institute (VinBDI), Hanoi 10000, Vietnam

[4] Clay AIR, Software Solution, 33000 Bordeaux, France; psalmane@clayair.io

[5] School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK

[6] Zebra Technologies Corp., London SE1 9LQ, UK

[7] Department of Computer Science and Engineering, University Carlos III de Madrid, 28270 Colmenarejo, Spain

[8] Aparnix, Santiago 7550076, Chile; pablozegers@gmail.com

Author notes

[*] Correspondence: sergio.velastin@ieee.org

Author information

Huy Hieu Pham https://orcid.org/0000-0003-4851-2518

Houssam Salmane https://orcid.org/0000-0002-0919-7482

Alain Crouzil https://orcid.org/0000-0001-7040-2978

Sergio A. Velastin https://orcid.org/0000-0001-6775-1737

Pablo Zegers https://orcid.org/0000-0003-3697-2525

Article

Publisher ID: sensors-20-01825

DOI: 10.3390/s20071825

PMC ID: 7180926

PubMed ID: 32218350

SO-VID: 4e8d2134-841e-4990-bc97-618fc8f144af

License:

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Related collections

Computer Vision, Deep Learning, Deep Reinforcement Learning, IoT
