DBA: Dynamic Multi-Armed Bandit Algorithm

Sadegh Nobari

doi:10.1609/aaai.v33i01.33019869

Authors

Sadegh Nobari Rakuten, Inc.

DOI:

https://doi.org/10.1609/aaai.v33i01.33019869

Abstract

We introduce Dynamic Bandit Algorithm (DBA), a practical solution to improve the shortcoming of the pervasively employed reinforcement learning algorithm called Multi-Arm Bandit, aka Bandit. Bandit makes real-time decisions based on the prior observations. However, Bandit is heavily biased to the priors that it cannot quickly adapt itself to a trend that is interchanging. As a result, Bandit cannot, quickly enough, make profitable decisions when the trend is changing. Unlike Bandit, DBA focuses on quickly adapting itself to detect these trends early enough. Furthermore, DBA remains as almost as light as Bandit in terms of computations. Therefore, DBA can be easily deployed in production as a light process similar to The Bandit. We demonstrate how critical and beneficial is the main focus of DBA, i.e. the ability to quickly finding the most profitable option in real-time, over its stateof-the-art competitors. Our experiments are augmented with a visualization mechanism that explains the profitability of the decisions made by each algorithm in each step by animations. Finally we observe that DBA can substantially outperform the original Bandit by close to 3 times for a set Key Performance Indicator (KPI) in a case of having 3 arms.

DBA: Dynamic Multi-Armed Bandit Algorithm

Authors

DOI:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information