Transferable Contextual Bandit for Cross-Domain Recommendation

Authors

  • Bo Liu, The Hong Kong University of Science and Technology
  • Ying Wei, The Hong Kong University of Science and Technology
  • Yu Zhang, The Hong Kong University of Science and Technology
  • Zhixian Yan, Cheetah Mobile USA
  • Qiang Yang, The Hong Kong University of Science and Technology

Keywords

Transfer Learning, Multi-Armed Bandit, Recommender System, Reinforcement Learning

Abstract

Traditional recommendation systems (RecSys) suffer from two problems: the exploitation-exploration dilemma and the cold-start problem. One solution to the exploitation-exploration dilemma is the contextual bandit policy, which adaptively exploits and explores user interests and thereby achieves higher rewards in the long run. In cold-start situations, however, a contextual bandit policy may explore more than necessary, which hurts short-term rewards. Cross-domain RecSys methods adopt transfer learning to leverage prior knowledge from a source RecSys domain to jump-start the cold-start target RecSys. To solve the two problems together, in this paper we propose the first applicable transferable contextual bandit (TCB) policy for cross-domain recommendation. TCB not only benefits exploitation but also accelerates exploration in the target RecSys; TCB's exploration, in turn, helps to learn how to transfer between different domains. TCB is a general algorithm for both homogeneous and heterogeneous domains. We provide both a theoretical regret analysis and empirical experiments. The empirical results show that TCB outperforms state-of-the-art algorithms over time.
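To make the exploit/explore mechanism in the abstract concrete, below is a minimal sketch of a standard linear contextual bandit (LinUCB-style) policy: it scores each arm by an estimated reward (exploitation) plus an uncertainty bonus (exploration). This is a generic single-domain baseline written for illustration, not the authors' TCB algorithm; all class and parameter names (`LinUCB`, `alpha`, `select`, `update`) are ours.

```python
import numpy as np

class LinUCB:
    """Minimal LinUCB-style contextual bandit sketch (single domain).

    Not the paper's TCB policy; just the kind of contextual bandit
    it builds on. One ridge-regression model per arm.
    """

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha  # width of the exploration bonus
        self.A = [np.eye(dim) for _ in range(n_arms)]     # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]   # per-arm reward vectors

    def select(self, contexts):
        """contexts: one feature vector per arm; returns the chosen arm index."""
        scores = []
        for A, b, x in zip(self.A, self.b, contexts):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                              # exploit: estimated reward weights
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)    # explore: uncertainty bonus
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Incorporate the observed reward for the pulled arm."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

A cold-start target domain corresponds to all `A` matrices being near-identity, so the uncertainty bonus dominates and the policy over-explores; transferring source-domain knowledge, as TCB does, shrinks that uncertainty from the start.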

Published

2018-04-29

How to Cite

Liu, B., Wei, Y., Zhang, Y., Yan, Z., & Yang, Q. (2018). Transferable Contextual Bandit for Cross-Domain Recommendation. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/11699