HARK: Hierarchical Agentic Retrieval with Keyframing for Video Understanding (Student Abstract)
DOI:
https://doi.org/10.1609/aaai.v40i48.42237
Abstract
Current video understanding models struggle with temporal reasoning and with preserving fine detail at a manageable computational cost. We propose HARK, a hierarchical memory system that segments videos into action and scene units, combined with question-aware agentic keyframe selection. Our method achieves 70.3% overall accuracy on the VideoMME short-video benchmark.
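The abstract gives no implementation details, so the following is only a minimal illustrative sketch of how a two-level scene/action memory with question-aware keyframe selection could be organized. Every name here (`SceneUnit`, `ActionUnit`, `overlap`, `select_keyframes`), the text captions, and the word-overlap relevance score are assumptions made for illustration, not the authors' method.

```python
from dataclasses import dataclass, field


@dataclass
class ActionUnit:
    start: float   # segment start, in seconds
    end: float     # segment end, in seconds
    caption: str   # hypothetical text description of the action


@dataclass
class SceneUnit:
    caption: str   # hypothetical text description of the scene
    actions: list[ActionUnit] = field(default_factory=list)


def overlap(question: str, caption: str) -> int:
    """Toy relevance score (an assumption): count of shared lowercase words."""
    return len(set(question.lower().split()) & set(caption.lower().split()))


def select_keyframes(scenes, question, top_k=2):
    """Pick the most relevant scene, then rank its actions against the question.

    Returns the midpoint timestamp of each selected action segment.
    """
    best_scene = max(scenes, key=lambda s: overlap(question, s.caption))
    ranked = sorted(
        best_scene.actions,
        key=lambda a: overlap(question, a.caption),
        reverse=True,
    )
    return [(a.start + a.end) / 2 for a in ranked[:top_k]]


# Hypothetical two-level memory for a short clip.
memory = [
    SceneUnit("kitchen cooking", [
        ActionUnit(0.0, 4.0, "person chops onion"),
        ActionUnit(4.0, 10.0, "person stirs pot"),
    ]),
    SceneUnit("living room", [
        ActionUnit(10.0, 20.0, "person watches tv"),
    ]),
]

timestamps = select_keyframes(
    memory, "which pot does the person use in the kitchen", top_k=1
)
print(timestamps)
```

Searching the coarse scene level before the finer action level keeps the number of candidate frames small, which is one plausible way to balance detail against compute; a real system would presumably replace the word-overlap score with a learned or model-driven relevance judgment.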
Published
2026-03-14
How to Cite
Li, J., Qiao, Y., & Huang, S. (2026). HARK: Hierarchical Agentic Retrieval with Keyframing for Video Understanding (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41266–41268. https://doi.org/10.1609/aaai.v40i48.42237
Section
AAAI Student Abstract and Poster Program