Factorization Bandits for Interactive Recommendation
DOI: https://doi.org/10.1609/aaai.v31i1.10936

Keywords: Contextual bandits, latent feature learning, online recommendations, regret analysis

Abstract
We perform online interactive recommendation via a factorization-based bandit algorithm. Low-rank matrix completion is performed over an incrementally constructed user-item preference matrix, and an upper confidence bound based item selection strategy is developed to balance the exploration/exploitation trade-off during online learning. Observable contextual features and dependencies among users (e.g., social influence) are leveraged to improve the algorithm's convergence rate and to address the cold-start problem in recommendation. A high-probability sublinear upper regret bound is proved for the developed algorithm, in which considerable regret reduction is achieved on both the user and item sides. Extensive experiments on both simulated and large-scale real-world datasets confirm the advantages of the proposed algorithm over several state-of-the-art factorization-based and bandit-based collaborative filtering methods.
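To make the selection rule concrete, below is a minimal sketch of a UCB-style item-selection step over learned latent factors, not the paper's exact procedure: each candidate item's score combines its estimated payoff with a confidence width derived from the inverse covariance of the user's factor estimate. All names (p_u, A_u, Q, alpha) and the ridge-regression update are illustrative assumptions.

    import numpy as np

    def select_item(p_u, A_u_inv, Q, alpha=0.5):
        """Choose the item with the highest UCB score for one user.

        p_u     -- (d,) current latent-factor estimate for the user
        A_u_inv -- (d, d) inverse covariance of that estimate (uncertainty)
        Q       -- (n_items, d) current item latent-factor matrix
        alpha   -- exploration weight; the value here is illustrative
        """
        means = Q @ p_u                                  # estimated payoffs (exploit)
        widths = np.sqrt(np.einsum("id,dk,ik->i", Q, A_u_inv, Q))  # per-item confidence
        return int(np.argmax(means + alpha * widths))    # optimism under uncertainty

    def update_user(A_u, b_u, q_chosen, reward):
        """Ridge-regression style update after observing the reward."""
        A_u += np.outer(q_chosen, q_chosen)
        b_u += reward * q_chosen
        p_u = np.linalg.solve(A_u, b_u)                  # refreshed user factor
        return p_u

In the setting described above, the item factors Q are themselves re-estimated online via low-rank matrix completion over the growing preference matrix, so both the payoff estimate and the confidence width sharpen as feedback accumulates.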