Orchestrating the Symphony of Prompt Distribution Learning for Human-Object Interaction Detection

Authors

  • Mingda Jia Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University
  • Liming Zhao Alibaba Group
  • Ge Li Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University
  • Yun Zheng Alibaba Group

DOI:

https://doi.org/10.1609/aaai.v39i4.32412

Abstract

Human-object interaction (HOI) detectors with the popular query-transformer architecture have achieved promising performance. However, accurately identifying uncommon visual patterns and distinguishing between ambiguous HOIs remain difficult for them. We observe that these difficulties may arise from the limited capacity of traditional detector queries to represent diverse intra-category patterns and inter-category dependencies. To address this, we introduce the Interaction Prompt Distribution Learning (InterProDa) approach. InterProDa learns multiple sets of soft prompts and estimates category distributions from these prompts. It then incorporates the category distributions into the HOI queries, making them capable of representing near-infinite intra-category dynamics and universal cross-category relationships. Our InterProDa detector demonstrates competitive performance on the HICO-DET and V-COCO benchmarks. Additionally, our method can be integrated into most transformer-based HOI detectors, significantly enhancing their performance with minimal additional parameters.
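
As a rough illustration of the idea described in the abstract (not the authors' implementation), the sketch below estimates a per-category distribution from several soft-prompt embeddings and uses a sample from it to condition a query vector. The diagonal Gaussian, the additive fusion, and every function name here are assumptions made for illustration only; the abstract does not specify any of them.

```python
import random
from statistics import fmean, pvariance


def estimate_category_distribution(prompt_embeddings):
    """Fit a diagonal Gaussian to K prompt embeddings of one category.

    prompt_embeddings: list of K equal-length vectors, one per soft prompt
    (hypothetical stand-ins for encoded prompts).
    Returns (mean, variance), each a list with one entry per dimension.
    """
    dims = list(zip(*prompt_embeddings))      # transpose: one tuple per dimension
    mean = [fmean(d) for d in dims]
    var = [pvariance(d) for d in dims]        # population variance, always >= 0
    return mean, var


def sample_from_distribution(mean, var, rng):
    """Draw one vector from the diagonal Gaussian (assumed distribution family)."""
    return [rng.gauss(m, v ** 0.5) for m, v in zip(mean, var)]


def condition_query(query, category_sample):
    """Fuse a query with a category sample by addition (an assumed fusion rule)."""
    return [q + s for q, s in zip(query, category_sample)]


rng = random.Random(0)
# Eight hypothetical prompt embeddings of dimension 4 for a single HOI category.
prompts = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
mean, var = estimate_category_distribution(prompts)
sample = sample_from_distribution(mean, var, rng)
conditioned = condition_query([0.0] * 4, sample)
```

Because each call to `sample_from_distribution` draws a fresh vector, one category can yield many distinct conditioned queries, which is one way to read the abstract's claim about intra-category diversity.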

Published

2025-04-11

How to Cite

Jia, M., Zhao, L., Li, G., & Zheng, Y. (2025). Orchestrating the Symphony of Prompt Distribution Learning for Human-Object Interaction Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 39(4), 3940-3948. https://doi.org/10.1609/aaai.v39i4.32412

Section

AAAI Technical Track on Computer Vision III