Responsible Bandit Learning via Privacy-Protected Mean-Volatility Utility

Authors

  • Shanshan Zhao Mathematics Discipline, Shandong University
  • Wenhai Cui Mathematics Discipline, Shandong University
  • Bei Jiang Mathematics Discipline, University of Alberta
  • Linglong Kong Mathematics Discipline, University of Alberta
  • Xiaodong Yan Mathematics Discipline, Shandong University; Shandong National Center for Applied Mathematics

DOI:

https://doi.org/10.1609/aaai.v38i19.30182

Keywords:

General

Abstract

To ensure user safety by protecting privacy, traditional privacy-preserving bandit algorithms that aim to maximize the mean reward have been widely studied in scenarios such as online ride-hailing, advertising recommendation, and personalized healthcare. However, classical bandit learning is irresponsible in such practical applications, as it fails to account for risk in online decision-making and ignores external system information. This paper first proposes the privacy-protected mean-volatility utility as the objective of bandit learning and proves its responsibility: by taking risk into account, it aims to maximize the probability of achieving the utility. Theoretically, the proposed responsible bandit learning is expected to achieve the fastest convergence rate among current bandit algorithms and yields greater statistical power than the classical normality-based test. Finally, simulation studies provide supporting evidence for the theoretical results and demonstrate stronger performance under stricter privacy budgets.
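The objective described in the abstract, a mean-reward term penalized by volatility and then privatized before arm selection, can be illustrated with a minimal sketch. This is not the paper's algorithm: the risk weight `lam`, the variance-based volatility estimate, and the Laplace-mechanism sensitivity bound below are illustrative assumptions for bounded rewards, not the calibration derived in the paper.

```python
import numpy as np

def private_mean_volatility_utility(rewards, lam=1.0, epsilon=1.0,
                                    reward_range=1.0, rng=None):
    """Noisy mean-volatility utility for one arm (illustrative sketch).

    Utility is taken as mean(rewards) - lam * var(rewards), with Laplace
    noise added for an epsilon differential-privacy budget. The crude
    sensitivity bound assumes rewards lie in [0, reward_range]; the paper
    derives its own privacy calibration.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(rewards)
    utility = np.mean(rewards) - lam * np.var(rewards)
    # Rough sensitivity of the utility to changing one of n bounded rewards.
    sensitivity = (reward_range + lam * reward_range ** 2) / n
    return utility + rng.laplace(scale=sensitivity / epsilon)

def select_arm(histories, lam=1.0, epsilon=1.0, rng=None):
    """Pick the arm whose privatized mean-volatility utility is largest."""
    utilities = [private_mean_volatility_utility(h, lam, epsilon, rng=rng)
                 for h in histories]
    return int(np.argmax(utilities))
```

A smaller `epsilon` (stricter privacy budget) inflates the Laplace noise scale, so arm selection trades statistical accuracy against privacy, which is the tension the paper's utility is designed to manage.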

Published

2024-03-24

How to Cite

Zhao, S., Cui, W., Jiang, B., Kong, L., & Yan, X. (2024). Responsible Bandit Learning via Privacy-Protected Mean-Volatility Utility. Proceedings of the AAAI Conference on Artificial Intelligence, 38(19), 21815-21822. https://doi.org/10.1609/aaai.v38i19.30182

Section

AAAI Technical Track on Safe, Robust and Responsible AI Track