Efficient Ordered Combinatorial Semi-Bandits for Whole-Page Recommendation

Authors

  • Yingfei Wang, Princeton University
  • Hua Ouyang, Apple Inc.
  • Chu Wang, Nokia Bell Labs
  • Jianhui Chen, Yahoo Research
  • Tsvetan Asamov, Princeton University
  • Yi Chang, Huawei Research America

DOI:

https://doi.org/10.1609/aaai.v31i1.10939

Keywords:

multi-armed bandits, Thompson sampling, whole-page recommendation, combinatorial optimization, semi-bandits

Abstract

The Multi-Armed Bandit (MAB) framework has been successfully applied in many web applications. However, many complex real-world applications that involve multiple content recommendations do not fit the traditional MAB setting. To address this issue, we consider an ordered combinatorial semi-bandit problem in which the learner recommends S actions from a base set of K actions and displays the results in S (out of M) different positions. The aim is to maximize the cumulative reward with respect to the best possible subset and positions in hindsight. By adapting a minimum-cost maximum-flow network, we derive a practical algorithm based on Thompson sampling for the (contextual) combinatorial problem, thereby resolving the problem of computational intractability. The method can work with whole-page recommendation and any probabilistic model; to illustrate its effectiveness, we focus on Gaussian process optimization and on a contextual setting where click-through rate is predicted using logistic regression. We demonstrate the algorithms’ performance on synthetic Gaussian process problems and on large-scale news article recommendation datasets from the Yahoo! Front Page Today Module.
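
To make the selection step concrete, here is a minimal Python sketch of one Thompson sampling round in which S actions are placed into S of M positions by solving a small minimum-cost flow. It is an illustration under stated assumptions, not the paper's implementation: it assumes an independent Beta-Bernoulli posterior per (action, position) pair in place of the Gaussian process and logistic regression models named in the abstract, and the network layout (source to actions, actions to positions, positions to sink, all with unit capacity) is one plausible construction. The names min_cost_flow, select_layout, thompson_round, and update are hypothetical.

```python
import random

def min_cost_flow(n, edges, source, sink, flow_target):
    """Successive shortest paths (Bellman-Ford); edges are (u, v, capacity, cost)."""
    # Each adjacency entry: [to, residual capacity, cost, index of reverse edge].
    graph = [[] for _ in range(n)]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
    sent = 0
    while sent < flow_target:
        dist = [float("inf")] * n
        prev = [None] * n
        dist[source] = 0
        for _ in range(n - 1):
            updated = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for i, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, i)
                        updated = True
            if not updated:
                break
        if prev[sink] is None:
            break  # no augmenting path left
        v = sink
        while v != source:  # push one unit of flow along the shortest path
            u, i = prev[v]
            graph[u][i][1] -= 1
            graph[v][graph[u][i][3]][1] += 1
            v = u
        sent += 1
    return graph

def select_layout(theta, S):
    """theta[k][m]: sampled reward of action k at position m.
    Returns S (action, position) pairs with distinct actions and positions."""
    K, M = len(theta), len(theta[0])
    source, sink = K + M, K + M + 1
    edges = [(source, k, 1, 0) for k in range(K)]
    # Costs are negated rewards, scaled to integers so comparisons are exact.
    edges += [(k, K + m, 1, -round(theta[k][m] * 10**6))
              for k in range(K) for m in range(M)]
    edges += [(K + m, sink, 1, 0) for m in range(M)]
    graph = min_cost_flow(K + M + 2, edges, source, sink, S)
    # A saturated action-to-position edge means that pair was chosen.
    return [(k, v - K) for k in range(K)
            for v, cap, _, _ in graph[k] if K <= v < K + M and cap == 0]

def thompson_round(alpha, beta, S, rng):
    """Sample one reward per pair from its Beta posterior, then pick the layout."""
    theta = [[rng.betavariate(a, b) for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(alpha, beta)]
    return select_layout(theta, S)

def update(alpha, beta, layout, clicks):
    """Semi-bandit feedback: one 0/1 click observed per displayed pair."""
    for (k, m), c in zip(layout, clicks):
        alpha[k][m] += c
        beta[k][m] += 1 - c
```

A round would call thompson_round with the current alpha and beta tables (for example, all ones as a uniform prior), display the returned layout, and pass the observed clicks to update. Solving the flow costs polynomial time in K and M, whereas enumerating every ordered subset grows combinatorially, which is the intractability the abstract refers to.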

Published

2017-02-13

How to Cite

Wang, Y., Ouyang, H., Wang, C., Chen, J., Asamov, T., & Chang, Y. (2017). Efficient Ordered Combinatorial Semi-Bandits for Whole-Page Recommendation. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.10939