VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning

Siran Chen; Boyu Chen; Yuxiao Luo; Chenyun Yu; Yi Ouyang; Lei Cheng; Chengxiang Zhuo; Zang Li; Yali Wang

doi:10.1609/aaai.v40i3.37152

Authors

Siran Chen: University of Chinese Academy of Sciences; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; Tencent
Boyu Chen: University of Chinese Academy of Sciences; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; Tencent
Yuxiao Luo: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Chenyun Yu: Shenzhen Campus of Sun Yat-sen University
Yi Ouyang: Tencent
Lei Cheng: Tencent
Chengxiang Zhuo: Tencent
Zang Li: Tencent
Yali Wang: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; Shanghai Artificial Intelligence Laboratory

Abstract

Large language model (LLM) agents have emerged as a promising way to enhance recommendation systems via user simulation. However, existing studies predominantly rely on prompt-based simulation with frozen LLMs, which often yields suboptimal item modeling and user preference learning and ultimately constrains recommendation performance. To address these challenges, we introduce VRAgent-R1, a novel agent-based paradigm that incorporates human-like intelligence into user simulation. Specifically, VRAgent-R1 comprises two distinct agents, the Item Perception (IP) Agent and the User Simulation (US) Agent, designed for interactive user-item modeling. First, the IP Agent emulates human-like progressive thinking based on multimodal large language models (MLLMs), effectively capturing hidden recommendation semantics in videos. With the more comprehensive multimodal content understanding provided by the IP Agent, the video recommendation system can supply higher-quality candidate items. Subsequently, the US Agent refines the recommended video sets through in-depth chain-of-thought (CoT) reasoning and achieves better alignment with real user preferences through reinforcement learning. Experimental results on the large-scale video recommendation benchmark MicroLens-100k demonstrate the effectiveness of the proposed VRAgent-R1 method; for example, the IP Agent achieves a 6.0% improvement in NDCG@10, while the US Agent shows approximately 45.0% higher accuracy in user decision simulation compared to state-of-the-art baselines.
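
For readers unfamiliar with the NDCG@10 metric cited in the abstract, the following is a minimal sketch of how such a score is commonly computed for a single user. It assumes binary relevance (an item either was or was not interacted with) and a ranked list without duplicate items; the abstract does not state the paper's exact evaluation protocol, so this is illustrative only. The function names and the video IDs are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions (ranks start at 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(ranked_items, relevant_items, k=10):
    """NDCG@k under an assumed binary relevance: gain is 1 for a relevant item."""
    gains = [1.0 if item in relevant_items else 0.0 for item in ranked_items]
    # Ideal ranking: all relevant items placed first, truncated to k positions.
    ideal = [1.0] * min(len(relevant_items), k)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example: one held-out video, ranked second by the recommender.
ranked = ["v7", "v2", "v9", "v4"]
relevant = {"v2"}
score = ndcg_at_k(ranked, relevant, k=10)
```

In practice the per-user scores are averaged over all test users; a relative improvement such as the one reported would then be measured on that average.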
