Regret Bounds for Batched Bandits

Hossein Esfandiari; Amin Karbasi; Abbas Mehrabian; Vahab Mirrokni

doi:10.1609/aaai.v35i8.16901

Authors

Hossein Esfandiari, Google Research, New York
Amin Karbasi, Yale University
Abbas Mehrabian, McGill University
Vahab Mirrokni, Google Research, New York

Keywords:

Online Learning & Bandits, Learning Theory

Abstract

We present simple algorithms for batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve and extend the best known regret bounds of Gao, Han, Ren, and Zhou (NeurIPS 2019), for any number of batches. In particular, our algorithms in both settings achieve the optimal expected regrets by using only a logarithmic number of batches. We also study the batched adversarial multi-armed bandit problem for the first time and provide the optimal regret, up to logarithmic factors, of any algorithm with predetermined batch sizes.

Published

2021-05-18

How to Cite

Esfandiari, H., Karbasi, A., Mehrabian, A., & Mirrokni, V. (2021). Regret Bounds for Batched Bandits. Proceedings of the AAAI Conference on Artificial Intelligence, 35(8), 7340-7348. https://doi.org/10.1609/aaai.v35i8.16901

Issue

Vol. 35 No. 8: AAAI-21 Technical Tracks 8

Section

AAAI Technical Track on Machine Learning I
