T-C3D: Temporal Convolutional 3D Network for Real-Time Action Recognition

Kun Liu; Wu Liu; Chuang Gan; Mingkui Tan; Huadong Ma

doi:10.1609/aaai.v32i1.12333

Authors

Kun Liu, Beijing University of Posts and Telecommunications
Wu Liu, Beijing University of Posts and Telecommunications
Chuang Gan, Tsinghua University
Mingkui Tan, South China University of Technology
Huadong Ma, Beijing University of Posts and Telecommunications

Abstract

Video-based action recognition with deep neural networks has shown remarkable progress. However, most existing approaches are too computationally expensive because of their complex network architectures. To address this problem, we propose a new real-time action recognition architecture, called Temporal Convolutional 3D Network (T-C3D), which learns video action representations in a hierarchical multi-granularity manner. Specifically, we combine a residual 3D convolutional neural network, which captures complementary information on the appearance of a single frame and the motion between consecutive frames, with a new temporal encoding method that explores the temporal dynamics of the whole video. Heavy computation is thus avoided at inference time, which enables real-time processing. On two challenging benchmark datasets, UCF101 and HMDB51, our method outperforms state-of-the-art real-time methods by over 5.4% in accuracy and is 2 times faster at inference (969 frames per second), while achieving recognition performance comparable to that of the state-of-the-art methods. The source code for the complete system and the pre-trained models are publicly available at https://github.com/tc3d.
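
The two-level idea in the abstract (a clip-level encoder for short runs of consecutive frames, then a video-level temporal encoding over several clips) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the abstract does not specify how clips are sampled or how they are aggregated, so the sketch uses evenly spaced clips and an element-wise mean, and `clip_encoder` is a hypothetical stand-in for the residual 3D CNN rather than the authors' model.

```python
from statistics import fmean
from typing import Callable, List, Sequence

Clip = List[int]  # a clip is represented by its frame indices


def sample_clips(num_frames: int, num_segments: int, clip_len: int) -> List[Clip]:
    """Split the video into equal segments and take one centered clip per segment.

    Assumption: evenly spaced sampling; the abstract does not state the scheme.
    """
    if num_frames < clip_len:
        raise ValueError("video is shorter than one clip")
    seg_len = num_frames / num_segments
    clips = []
    for s in range(num_segments):
        center = int((s + 0.5) * seg_len)
        # Clamp so the clip stays inside the video.
        start = min(max(center - clip_len // 2, 0), num_frames - clip_len)
        clips.append(list(range(start, start + clip_len)))
    return clips


def aggregate(clip_features: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean over clip features (assumed aggregation function)."""
    return [fmean(column) for column in zip(*clip_features)]


def video_representation(
    num_frames: int,
    clip_encoder: Callable[[Clip], Sequence[float]],  # hypothetical 3D CNN stand-in
    num_segments: int = 3,
    clip_len: int = 8,
) -> List[float]:
    """Encode each sampled clip, then fuse the clip features into one video vector."""
    clips = sample_clips(num_frames, num_segments, clip_len)
    return aggregate([clip_encoder(clip) for clip in clips])


if __name__ == "__main__":
    # Toy encoder: returns the first frame index and the clip length as "features".
    toy_encoder = lambda clip: [float(clip[0]), float(len(clip))]
    print(video_representation(30, toy_encoder, num_segments=3, clip_len=4))
```

Under these assumptions only `num_segments` short clips pass through the encoder per video, instead of every frame, which is one way a design of this kind can keep inference cost low.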
