Exploring Self- and Cross-Triplet Correlations for Human-Object Interaction Detection

Weibo Jiang; Weihong Ren; Jiandong Tian; Liangqiong Qu; Zhiyong Wang; Honghai Liu

doi:10.1609/aaai.v38i3.28031

Authors

Weibo Jiang State Key Laboratory of Robotics and System, School of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen
Weihong Ren State Key Laboratory of Robotics and System, School of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen
Jiandong Tian State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences
Liangqiong Qu Department of Statistics and Actuarial Science, The University of Hong Kong
Zhiyong Wang State Key Laboratory of Robotics and System, School of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen
Honghai Liu State Key Laboratory of Robotics and System, School of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen

DOI:

https://doi.org/10.1609/aaai.v38i3.28031

Keywords:

CV: Scene Analysis & Understanding, CV: Video Understanding & Activity Analysis

Abstract

Human-Object Interaction (HOI) detection plays a vital role in scene understanding; it aims to predict HOI triplets in the form of <human, object, action>. Existing methods mainly extract multi-modal features (e.g., appearance, object semantics, human pose) and then fuse them to directly predict HOI triplets. However, most of these methods focus on self-triplet aggregation and ignore potential cross-triplet dependencies, which leads to ambiguous action predictions. In this work, we propose to explore Self- and Cross-Triplet Correlations (SCTC) for HOI detection. Specifically, to aggregate self-triplet correlation, we regard each triplet proposal as a graph in which Human and Object are nodes and Action is the edge. We also explore cross-triplet dependencies by jointly considering instance-level, semantic-level, and layout-level relations. In addition, we leverage the CLIP model to help SCTC obtain interaction-aware features through knowledge distillation, which provides useful action cues for HOI detection. Extensive experiments on the HICO-DET and V-COCO datasets verify the effectiveness of the proposed SCTC.
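
To make the graph view of a triplet proposal concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the names `TripletGraph`, `build_proposals`, and `aggregate_self` are hypothetical, the features are toy vectors, and the averaging step is only an assumed stand-in for whatever learned aggregation SCTC actually uses.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass
class TripletGraph:
    """One triplet proposal: two node features and one edge feature (assumed layout)."""
    human: List[float]   # Human node feature
    obj: List[float]     # Object node feature
    action: List[float]  # Action edge feature


def build_proposals(humans, objects):
    """Pair every human with every object.

    The edge feature is initialised as the element-wise mean of the two
    node features; this initialisation is an illustrative assumption.
    """
    return [
        TripletGraph(h, o, [(a + b) / 2 for a, b in zip(h, o)])
        for h, o in product(humans, objects)
    ]


def aggregate_self(t):
    """One toy message-passing step inside a single triplet.

    The Action edge absorbs both node features by simple averaging,
    standing in for a learned self-triplet aggregation.
    """
    edge = [(e + h + o) / 3 for e, h, o in zip(t.action, t.human, t.obj)]
    return TripletGraph(t.human, t.obj, edge)


# Toy usage: one detected human, two detected objects -> two proposals.
proposals = build_proposals([[1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]])
updated = [aggregate_self(p) for p in proposals]
```

Cross-triplet dependencies would then relate the entries of `proposals` to one another (for example, proposals that share the same human), which this sketch deliberately leaves out.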
