Action-and-object Aware Alignment for Partially Relevant Video Retrieval

Chuanshen Chen; Kai Zhou; Zhiquan Wen; Zeng You; Yirui Li; Tianhang Xiang; Mingkui Tan

doi:10.1609/aaai.v40i4.37271

Authors

Chuanshen Chen, South China University of Technology; Peng Cheng Laboratory
Kai Zhou, South China University of Technology
Zhiquan Wen, South China University of Technology
Zeng You, South China University of Technology; Peng Cheng Laboratory
Yirui Li, South China University of Technology
Tianhang Xiang, South China University of Technology
Mingkui Tan, South China University of Technology; Peng Cheng Laboratory

DOI:

https://doi.org/10.1609/aaai.v40i4.37271

Abstract

Partially Relevant Video Retrieval (PRVR) aims to retrieve untrimmed videos containing relevant moments for a given text query. This task is extremely challenging, as untrimmed videos often include numerous actions and objects unrelated to the query. However, existing methods usually struggle with fine-grained action-object modeling, limiting their retrieval performance. To tackle this challenge, we introduce Action-and-object Aware Alignment for Partially Relevant Video Retrieval (A3PRVR), a dual-branch framework designed to enhance retrieval by improving the modeling of action-object relationships. Specifically, we propose a Query-specific Deformable Temporal Attention (Q-DTA) module to effectively capture action-relevant object information in video features, while filtering out irrelevant content. Additionally, we propose an action-and-object aware alignment module to enable fine-grained textual understanding and video-text alignment. It uses action- and object-aware contrastive losses to enhance the model's sensitivity to action-object distinctions in the text query. Compared to state-of-the-art methods, A3PRVR achieves an average relative gain of 6.5% in SumR across the Charades-STA, ActivityNet-Caption, and TVR datasets.
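
The abstract does not give the form of the action- and object-aware contrastive losses. As a rough illustration of what a text-to-video contrastive objective generally looks like, the sketch below implements a generic InfoNCE loss in plain Python. It is not the authors' formulation: the function names `cosine` and `info_nce`, the temperature value, and the toy embeddings are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(text_embs, video_embs, temperature=0.07):
    """Mean text-to-video InfoNCE loss.

    Assumes text_embs[i] is the positive match for video_embs[i];
    every other video in the batch acts as a negative.
    """
    losses = []
    for i, text in enumerate(text_embs):
        logits = [cosine(text, video) / temperature for video in video_embs]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy 2-D embeddings (made up): e.g. action-phrase features vs. video features.
action_text = [[1.0, 0.0], [0.0, 1.0]]
video_feats = [[0.9, 0.1], [0.2, 0.8]]

aligned = info_nce(action_text, video_feats)         # correct pairing
shuffled = info_nce(action_text, video_feats[::-1])  # pairing swapped
```

With correctly paired embeddings the loss is close to zero, and it grows when the pairing is swapped. Under the assumption that a method like the one described applies such a loss separately to action-level and object-level text features, each loss would push the matching video closer to the corresponding part of the query.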
