Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

Yu-Heng Hung; Ping-Chun Hsieh; Xi Liu; P. R. Kumar

doi:10.1609/aaai.v35i9.16961

Authors

Yu-Heng Hung, National Chiao Tung University; National Yang Ming Chiao Tung University
Ping-Chun Hsieh, National Chiao Tung University; National Yang Ming Chiao Tung University
Xi Liu, Texas A&M University
P. R. Kumar, Texas A&M University

DOI:

https://doi.org/10.1609/aaai.v35i9.16961

Keywords:

Online Learning & Bandits

Abstract

Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms to handle the explore-exploit trade-off in linear bandit problems as well as generalized linear bandit problems. We develop novel index policies that we prove achieve order-optimality, and show that they achieve empirical performance competitive with state-of-the-art benchmark methods in extensive experiments. The new policies achieve this with low computation time per pull for linear bandits, thereby yielding both favorable regret and computational efficiency.
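
As a rough illustration of what an index policy for linear bandits can look like, the sketch below scores each arm by a ridge-regression reward estimate plus a bias term that favors less-explored directions. The abstract does not state the index, so the specific form used here (a bonus of 0.5 * alpha(t) * x^T V^{-1} x with alpha(t) = sqrt(t)), the class name `BiasedIndexPolicy`, the `simulate` helper, and the toy arms and parameter vector are all assumptions made for illustration, not the authors' stated algorithm. The rank-one Sherman-Morrison update is one way to keep the per-pull cost low.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m, v):
    return [dot(row, v) for row in m]


class BiasedIndexPolicy:
    """Hypothetical index policy: ridge estimate plus a reward-bias bonus."""

    def __init__(self, dim, lam=1.0):
        # v_inv tracks (lam * I + sum of x x^T)^{-1}; b tracks sum of reward * x.
        self.v_inv = [[1.0 / lam if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim
        self.t = 1

    def select(self, arms):
        theta_hat = mat_vec(self.v_inv, self.b)  # ridge estimate
        alpha = math.sqrt(self.t)                # assumed bias schedule

        def index(x):
            # Estimated reward plus a bonus that shrinks as x is explored.
            return dot(x, theta_hat) + 0.5 * alpha * dot(x, mat_vec(self.v_inv, x))

        return max(range(len(arms)), key=lambda i: index(arms[i]))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of the inverse design matrix.
        vx = mat_vec(self.v_inv, x)
        denom = 1.0 + dot(x, vx)
        for i in range(len(x)):
            for j in range(len(x)):
                self.v_inv[i][j] -= vx[i] * vx[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]
        self.t += 1


def simulate(rounds=500, seed=0):
    """Run the policy on a toy 3-arm problem and return pull counts per arm."""
    rng = random.Random(seed)
    theta = [0.8, -0.2]                            # toy unknown parameter
    arms = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]    # toy arm features
    policy = BiasedIndexPolicy(dim=2)
    pulls = [0] * len(arms)
    for _ in range(rounds):
        i = policy.select(arms)
        reward = dot(arms[i], theta) + rng.gauss(0.0, 0.1)
        policy.update(arms[i], reward)
        pulls[i] += 1
    return pulls


if __name__ == "__main__":
    print(simulate())
```

Each pull costs one matrix-vector product per arm plus a rank-one update, with no matrix inversion or sampling step.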

Published

2021-05-18

How to Cite

Hung, Y.-H., Hsieh, P.-C., Liu, X., & Kumar, P. R. (2021). Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits. Proceedings of the AAAI Conference on Artificial Intelligence, 35(9), 7874-7882. https://doi.org/10.1609/aaai.v35i9.16961

Issue

Vol. 35 No. 9: AAAI-21 Technical Tracks 9

Section

AAAI Technical Track on Machine Learning II
