On Interruptible Pure Exploration in Multi-Armed Bandits

Authors

  • Alexander Shleyfman, Technion – Israel Institute of Technology
  • Antonín Komenda, Czech Technical University in Prague
  • Carmel Domshlak, Technion – Israel Institute of Technology

DOI:

https://doi.org/10.1609/aaai.v29i1.9688

Keywords:

Multi-Armed Bandit, Monte-Carlo, Search

Abstract

Interruptible pure exploration in multi-armed bandits (MABs) is a key component of Monte-Carlo tree search algorithms for sequential decision problems. We introduce Discriminative Bucketing (DB), a novel family of strategies for pure exploration in MABs, which allows for adapting recent advances in non-interruptible strategies to the interruptible setting, while guaranteeing exponential-rate performance improvement over time. Our experimental evaluation demonstrates that the corresponding instances of DB compete favorably both with the currently popular strategies UCB1 and Epsilon-Greedy and with conservative uniform sampling.
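To make the setting concrete, the following is a minimal sketch of interruptible pure exploration with a uniform (round-robin) sampling baseline, one of the strategies the paper compares against. The arm means, budget, and function names here are illustrative assumptions; this is not the paper's DB strategy, whose details are not given in the abstract.

```python
import random

def pull(p, rng):
    """Simulate one pull of a Bernoulli arm with success probability p."""
    return 1.0 if rng.random() < p else 0.0

def uniform_pure_exploration(means, budget, seed=0):
    """Baseline interruptible strategy: sample arms round-robin and, at any
    interruption point, recommend the arm with the highest empirical mean.
    (Illustrative baseline only; not the paper's Discriminative Bucketing.)"""
    rng = random.Random(seed)
    k = len(means)
    sums = [0.0] * k
    counts = [0] * k
    for t in range(budget):
        arm = t % k  # round-robin = uniform allocation of the sampling budget
        sums[arm] += pull(means[arm], rng)
        counts[arm] += 1
    # Recommendation at interruption: empirically best arm so far
    return max(range(k), key=lambda a: sums[a] / counts[a] if counts[a] else 0.0)

# Hypothetical 3-armed instance: the recommended index should converge to 2
best = uniform_pure_exploration([0.2, 0.5, 0.8], budget=3000, seed=1)
```

The "interruptible" property is that the recommendation rule (best empirical mean) is valid after any number of pulls, not only when the budget is exhausted; DB-style strategies aim to keep this property while improving on uniform allocation.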

Published

2015-03-04

How to Cite

Shleyfman, A., Komenda, A., & Domshlak, C. (2015). On Interruptible Pure Exploration in Multi-Armed Bandits. Proceedings of the AAAI Conference on Artificial Intelligence, 29(1). https://doi.org/10.1609/aaai.v29i1.9688

Section

AAAI Technical Track: Reasoning under Uncertainty