A General Offline Reinforcement Learning Framework for Interactive Recommendation

Authors

  • Teng Xiao, Machine Intelligence Lab (MiLAB), AI Division, School of Engineering, Westlake University
  • Donglin Wang, Machine Intelligence Lab (MiLAB), AI Division, School of Engineering, Westlake University

Keywords

Web Search & Information Retrieval, Learning Preferences or Rankings

Abstract

This paper studies the problem of learning interactive recommender systems from logged feedback, without any exploration in online environments. We address the problem by proposing a general offline reinforcement learning framework for recommendation, which maximizes cumulative user rewards without online exploration. Specifically, we first introduce a probabilistic generative model for interactive recommendation, and then propose an effective inference algorithm for discrete and stochastic policy learning based on logged feedback. To make offline learning more effective, we propose five approaches to minimize the distribution mismatch between the logging policy and the recommendation policy: support constraints, supervised regularization, policy constraints, dual constraints, and reward extrapolation. We conduct extensive experiments on two public real-world datasets, demonstrating that the proposed methods achieve superior performance over existing supervised learning and reinforcement learning methods for recommendation.
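
To make the idea of a policy constraint concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a small discrete item set, hypothetical per-item value estimates (`q_values`), and an estimated logging policy (`logging_probs`), and it penalizes the recommendation policy's expected value by its KL divergence from the logging policy. All function names and the weight `alpha` are illustrative assumptions.

```python
import math


def softmax(logits):
    """Turn unnormalized scores into a probability distribution over items."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same items."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def constrained_objective(policy_logits, q_values, logging_probs, alpha=1.0):
    """Expected value under the policy minus alpha * KL(policy || logging).

    Hypothetical illustration: a larger alpha keeps the learned policy
    closer to the policy that generated the logged feedback.
    """
    pi = softmax(policy_logits)
    expected_value = sum(p * q for p, q in zip(pi, q_values))
    penalty = kl_divergence(pi, logging_probs)
    return expected_value - alpha * penalty


if __name__ == "__main__":
    q_values = [1.0, 2.0, 3.0]
    logging_probs = [1 / 3, 1 / 3, 1 / 3]
    # A policy identical to the logging policy pays no penalty.
    print(constrained_objective([0.0, 0.0, 0.0], q_values, logging_probs))
    # A policy concentrated on one item is penalized for drifting away.
    print(constrained_objective([0.0, 0.0, 5.0], q_values, logging_probs))
```

In this sketch, a policy that matches the logging policy incurs zero penalty, while one that concentrates on items rarely chosen in the log is discounted in proportion to `alpha`.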

Published

2021-05-18

How to Cite

Xiao, T., & Wang, D. (2021). A General Offline Reinforcement Learning Framework for Interactive Recommendation. Proceedings of the AAAI Conference on Artificial Intelligence, 35(5), 4512-4520. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/16579

Section

AAAI Technical Track on Data Mining and Knowledge Management