TA2N: Two-Stage Action Alignment Network for Few-Shot Action Recognition

Shuyuan Li; Huabin Liu; Rui Qian; Yuxi Li; John See; Mengjuan Fei; Xiaoyuan Yu; Weiyao Lin

doi:10.1609/aaai.v36i2.20029

Authors

Shuyuan Li Shanghai Jiao Tong University
Huabin Liu Shanghai Jiao Tong University
Rui Qian Shanghai Jiao Tong University
Yuxi Li Shanghai Jiao Tong University
John See Heriot-Watt University Malaysia
Mengjuan Fei Huawei Cloud
Xiaoyuan Yu Huawei Cloud
Weiyao Lin Shanghai Jiao Tong University

DOI:

https://doi.org/10.1609/aaai.v36i2.20029

Keywords:

Computer Vision (CV)

Abstract

Few-shot action recognition aims to recognize novel action classes (query) using just a few samples (support). The majority of current approaches follow the metric learning paradigm, which learns to compare the similarity between videos. Recently, it has been observed that directly measuring this similarity is not ideal, since different action instances may show distinctive temporal distributions, resulting in severe misalignment between query and support videos. In this paper, we tackle this problem from two distinct aspects: action duration misalignment and action evolution misalignment. We address them sequentially through a Two-stage Action Alignment Network (TA2N). The first stage locates the action by learning a temporal affine transform, which warps each video feature to its action duration while dismissing action-irrelevant features (e.g. background). The second stage then coordinates the query feature to match the spatial-temporal action evolution of the support by performing temporal rearrangement and spatial offset prediction. Extensive experiments on benchmark datasets show the potential of the proposed method in achieving state-of-the-art performance for few-shot action recognition.
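
The temporal affine transform of the first stage can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `temporal_affine_warp`, the hand-set `scale` and `shift` values, the normalized time axis in [-1, 1], the clipping at the sequence boundaries, and the use of linear interpolation are all assumptions made for illustration. In TA2N the transform is learned, whereas here the parameters are fixed by hand.

```python
import numpy as np


def temporal_affine_warp(features, scale, shift):
    """Resample a (T, D) sequence of frame features along the time axis.

    Hypothetical helper: output position t (normalized to [-1, 1]) reads
    from input position scale * t + shift, using linear interpolation.
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]

    # Normalized output time grid, one position per output frame.
    grid = np.linspace(-1.0, 1.0, num_frames)

    # Affine map to source positions; clipping is an assumed boundary rule.
    source = np.clip(scale * grid + shift, -1.0, 1.0)

    # Convert normalized positions to fractional frame indices.
    index = (source + 1.0) * 0.5 * (num_frames - 1)
    lower = np.floor(index).astype(int)
    upper = np.minimum(lower + 1, num_frames - 1)
    weight = (index - lower)[:, None]

    # Linear interpolation between the two neighbouring frames.
    return (1.0 - weight) * features[lower] + weight * features[upper]


# Toy input: 5 frames with a 1-dimensional feature equal to the frame index.
frames = np.arange(5, dtype=float).reshape(5, 1)
warped = temporal_affine_warp(frames, scale=0.5, shift=0.0)
```

With `scale=0.5` and `shift=0.0`, the five output frames cover only the middle of the input (frame positions 1 to 3), which mimics cropping a video feature to the action duration and discarding the frames before and after it.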
