Reinforcement Learning with a Disentangled Universal Value Function for Item Recommendation

Kai Wang; Zhene Zou; Qilin Deng; Jianrong Tao; Runze Wu; Changjie Fan; Liang Chen; Peng Cui

doi:10.1609/aaai.v35i5.16569

Authors

Kai Wang, NetEase Fuxi AI Lab
Zhene Zou, NetEase Fuxi AI Lab
Qilin Deng, NetEase Fuxi AI Lab
Jianrong Tao, NetEase Fuxi AI Lab
Runze Wu, NetEase Fuxi AI Lab
Changjie Fan, NetEase Fuxi AI Lab
Liang Chen, Sun Yat-sen University
Peng Cui, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v35i5.16569

Keywords:

Recommender Systems & Collaborative Filtering

Abstract

In recent years, applying reinforcement learning (RL) to recommendation systems (RS) has attracted great interest and raised many challenges. In this paper, we summarize three key practical challenges of large-scale RL-based recommender systems: massive state and action spaces, a high-variance environment, and the unspecific reward setting in recommendation. All these problems remain largely unexplored in the existing literature and make the application of RL challenging. We develop a model-based reinforcement learning framework called GoalRec. Inspired by the ideas of world models (model-based), value function estimation (model-free), and goal-based RL, we propose a novel disentangled universal value function designed for item recommendation. It generalizes to the various goals that a recommender may have and disentangles the stochastic environmental dynamics from the high-variance reward signals. As part of the value function, a high-capacity, reward-independent world model is trained, free from the sparse and high-variance reward signals, to simulate complex environmental dynamics under a given goal. Because it builds on the predicted environmental dynamics, the disentangled universal value function depends on the user's future trajectory rather than on a monolithic state and a scalar reward. We demonstrate the superiority of GoalRec over previous approaches with respect to the three practical challenges above in a series of simulations and a real application.
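
The separation described in the abstract, a reward-independent prediction of the user's future trajectory combined with a goal that scores it, can be illustrated with a toy sketch. This is not GoalRec's implementation. The item table, the three trajectory statistics (clicks, purchases, retention), the linear goal weighting, and the names `world_model`, `value`, and `recommend` are all assumptions introduced here for illustration; a real world model would be a learned, high-capacity network.

```python
from typing import Tuple

Trajectory = Tuple[float, float, float]  # (clicks, purchases, retention), assumed statistics

# Hypothetical catalogue: hand-picked future-trajectory statistics per item.
ITEMS = {
    "news": (0.9, 0.1, 0.3),
    "gadget": (0.4, 0.8, 0.2),
    "video": (0.6, 0.2, 0.9),
}


def world_model(user_affinity: float, item: str) -> Trajectory:
    """Stand-in for a learned world model.

    It predicts future trajectory statistics from the state (here a single
    affinity number) and the action (the recommended item). No reward
    signal is involved in this prediction.
    """
    clicks, purchases, retention = ITEMS[item]
    return (user_affinity * clicks, user_affinity * purchases, user_affinity * retention)


def value(user_affinity: float, item: str, goal: Trajectory) -> float:
    """Score the predicted trajectory under a goal (assumed: a linear weighting)."""
    predicted = world_model(user_affinity, item)
    return sum(w * m for w, m in zip(goal, predicted))


def recommend(user_affinity: float, goal: Trajectory) -> str:
    """Pick the item whose predicted trajectory scores highest under the goal."""
    return max(ITEMS, key=lambda item: value(user_affinity, item, goal))


# The same world model serves different goals without retraining.
print(recommend(1.0, (1.0, 0.0, 0.0)))  # click-oriented goal: news
print(recommend(1.0, (0.0, 1.0, 0.0)))  # purchase-oriented goal: gadget
print(recommend(1.0, (0.0, 0.0, 1.0)))  # retention-oriented goal: video
```

In this sketch, changing the goal only changes how the predicted trajectory is scored. The prediction itself is untouched, which is the sense in which the dynamics and the reward are kept apart.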
