Dual-Prior Augmented Decoding Network for Long Tail Distribution in HOI Detection

Authors

  • Jiayi Gao, Beijing University of Posts and Telecommunications
  • Kongming Liang, Beijing University of Posts and Telecommunications
  • Tao Wei, Li Auto
  • Wei Chen, Li Auto
  • Zhanyu Ma, Beijing University of Posts and Telecommunications
  • Jun Guo, Beijing University of Posts and Telecommunications

DOI:

https://doi.org/10.1609/aaai.v38i3.27949

Keywords:

CV: Object Detection & Categorization, CV: Language and Vision

Abstract

Human-object interaction (HOI) detection aims to localize human-object pairs and recognize their interactions. Hampered by the long-tailed distribution of the data, existing HOI detection methods often have difficulty recognizing the tail categories. Many approaches try to improve HOI recognition by utilizing external knowledge (e.g., pre-trained visual-language models). However, these approaches mainly utilize external knowledge at the HOI combination level and achieve limited improvement in the tail categories. In this paper, we propose a dual-prior augmented decoding network that decomposes the HOI task into two sub-tasks: human-object pair detection and interaction recognition. For each sub-task, we leverage external knowledge to enhance the model's ability at a finer granularity. Specifically, we acquire the prior candidates from an external classifier and embed them to assist the subsequent decoding process. Thus, the long-tail problem is mitigated in a coarse-to-fine manner with the corresponding external knowledge. Our approach outperforms existing state-of-the-art models in various settings and significantly boosts the performance on the tail HOI categories. The source code is available at https://github.com/PRIS-CV/DP-ADN.

Published

2024-03-24

How to Cite

Gao, J., Liang, K., Wei, T., Chen, W., Ma, Z., & Guo, J. (2024). Dual-Prior Augmented Decoding Network for Long Tail Distribution in HOI Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 38(3), 1806-1814. https://doi.org/10.1609/aaai.v38i3.27949

Issue

Vol. 38 No. 3 (2024)

Section

AAAI Technical Track on Computer Vision II