Improving Human-Object Interaction Detection via Phrase Learning and Label Composition

Authors

  • Zhimin Li, Huazhong University of Science and Technology
  • Cheng Zou, Megvii
  • Yu Zhao, Megvii
  • Boxun Li, Megvii
  • Sheng Zhong, Huazhong University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v36i2.20041

Keywords:

Computer Vision (CV)

Abstract

Human-Object Interaction (HOI) detection is a fundamental task in high-level human-centric scene understanding. We propose PhraseHOI, which contains an HOI branch and a novel phrase branch, to leverage language priors and improve relation expression. Specifically, the phrase branch is supervised by semantic embeddings, whose ground truths are automatically converted from the original HOI annotations without extra human effort. Meanwhile, a novel label composition method is proposed to deal with the long-tailed problem in HOI; it composes novel phrase labels from semantic neighbors. Further, to optimize the phrase branch, a loss composed of a distilling loss and a balanced triplet loss is proposed. Extensive experiments demonstrate the effectiveness of the proposed PhraseHOI, which achieves significant improvement over the baseline and surpasses previous state-of-the-art methods on the Full and NonRare settings of the challenging HICO-DET benchmark.
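
The abstract names the ingredients of the phrase-branch objective but gives no formulas, so the following is only a rough sketch of how such an objective could be assembled, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the choice of summing word vectors to turn a (verb, object) label into a phrase embedding, the use of cosine distance for the distilling term, the weight `alpha`, and the margin. In particular, a plain triplet margin loss stands in for the paper's balanced triplet loss, whose balancing scheme the abstract does not describe.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def phrase_target(verb_vec, object_vec):
    """Hypothetical label conversion: build a phrase embedding target for an
    HOI label (verb, object) as the normalized sum of its word vectors."""
    return l2_normalize(verb_vec + object_vec)


def distill_loss(pred, target):
    """Assumed distilling term: mean cosine distance between the predicted
    phrase embeddings and their semantic-embedding targets."""
    cos = np.sum(l2_normalize(pred) * l2_normalize(target), axis=-1)
    return float(np.mean(1.0 - cos))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Plain triplet margin loss on cosine distance. This is a stand-in for
    the balanced triplet loss, which is not specified in the abstract."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    d_pos = 1.0 - np.sum(a * p, axis=-1)
    d_neg = 1.0 - np.sum(a * n, axis=-1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


def phrase_loss(pred, target, negative, alpha=1.0, margin=0.2):
    """Combined objective: distilling term plus a weighted triplet term.
    The weight alpha and the margin are illustrative values only."""
    return distill_loss(pred, target) + alpha * triplet_loss(
        pred, target, negative, margin
    )
```

With toy two-dimensional vectors, a prediction that matches its target and is orthogonal to the negative incurs essentially zero loss, while a prediction that coincides with the negative is penalized by both terms.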

Published

2022-06-28

How to Cite

Li, Z., Zou, C., Zhao, Y., Li, B., & Zhong, S. (2022). Improving Human-Object Interaction Detection via Phrase Learning and Label Composition. Proceedings of the AAAI Conference on Artificial Intelligence, 36(2), 1509-1517. https://doi.org/10.1609/aaai.v36i2.20041

Section

AAAI Technical Track on Computer Vision II