Graph Attention Based Proposal 3D ConvNets for Action Detection
DOI: https://doi.org/10.1609/aaai.v34i04.5893

Abstract
Recent advances in 3D Convolutional Neural Networks (3D CNNs) have shown promising performance for untrimmed video action detection, employing the popular detection framework that relies heavily on temporal action proposals as the input to the action detector and localization regressor. In practice, the proposals usually exhibit strong intra and inter relations, mainly stemming from the temporal and spatial variations of the actions in a video. However, most existing 3D CNNs ignore these relations and thus suffer from redundant proposals that degrade detection performance and efficiency. To address this problem, we propose graph attention based proposal 3D ConvNets (AGCN-P-3DCNNs) for video action detection. Specifically, the proposed graph attention is composed of an intra attention based GCN and an inter attention based GCN. Intra attention learns the long-range dependencies inside each action proposal and updates the node matrix of the intra attention based GCN, while inter attention learns the dependencies between different action proposals and serves as the adjacency matrix of the inter attention based GCN. We then fuse the intra and inter attention to model the intra long-range dependencies and inter dependencies simultaneously. Another contribution is a simple and effective framewise classifier, which enhances the feature representation capability of the backbone model. Experiments on two proposal 3D ConvNets based models (P-C3D and P-ResNet) and two popular action detection benchmarks (THUMOS 2014 and ActivityNet v1.3) demonstrate the state-of-the-art performance achieved by our method. In particular, P-C3D embedded with our module achieves a 3.7% improvement in average mAP on the THUMOS 2014 dataset compared to the original model.
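The abstract only outlines the mechanism at a high level. The PyTorch sketch below is a minimal illustration, under our own assumptions about tensor shapes and layer names (e.g., `AttentionGCNFusion`, `intra_qk`, `inter_proj` are hypothetical), of the general idea: intra attention updates the node matrix from snippet features within each proposal, inter attention produces a proposal-to-proposal adjacency matrix, and a graph convolution fuses the two. It is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionGCNFusion(nn.Module):
    """Sketch of fusing intra- and inter-attention GCNs over proposal features.

    Assumed input: x of shape (N, T, C) = (num proposals, snippets per proposal,
    feature channels). Layer names and shapes are illustrative assumptions.
    """

    def __init__(self, channels):
        super().__init__()
        # Intra attention: self-attention over snippets inside one proposal,
        # used to update the node (feature) matrix of the intra GCN.
        self.intra_qk = nn.Linear(channels, channels, bias=False)
        # Inter attention: pairwise similarity between proposal descriptors,
        # used as the adjacency matrix of the inter GCN.
        self.inter_proj = nn.Linear(channels, channels, bias=False)
        # Shared GCN weight applied before aggregation over the proposal graph.
        self.gcn_weight = nn.Linear(channels, channels, bias=False)

    def forward(self, x):
        n, t, c = x.shape

        # Intra attention: long-range dependencies among snippets of a proposal.
        q = self.intra_qk(x)                                        # (N, T, C)
        intra_adj = F.softmax(q @ x.transpose(1, 2) / c ** 0.5, dim=-1)  # (N, T, T)
        nodes = intra_adj @ x                                       # updated nodes (N, T, C)

        # Inter attention: dependencies between different proposals.
        pooled = nodes.mean(dim=1)                                  # (N, C) proposal descriptors
        p = self.inter_proj(pooled)
        inter_adj = F.softmax(p @ pooled.t() / c ** 0.5, dim=-1)    # (N, N) adjacency

        # Fusion: one GCN layer, A (X W), over intra-updated node features.
        h = self.gcn_weight(nodes)                                  # X W, (N, T, C)
        fused = torch.einsum('nm,mtc->ntc', inter_adj, h)           # aggregate over proposals
        return F.relu(fused + x)                                    # residual connection


# Example usage: 16 proposals, 32 temporal snippets, 512-channel features.
if __name__ == "__main__":
    x = torch.randn(16, 32, 512)
    out = AttentionGCNFusion(512)(x)
    print(out.shape)  # torch.Size([16, 32, 512])
```

The residual connection and the mean pooling used to form proposal descriptors are design choices made for this sketch only; the paper's module may differ in how proposal-level features and the fusion are computed.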