Responsible Bandit Learning via Privacy-Protected Mean-Volatility Utility

Authors

  • Shanshan Zhao Mathematics Discipline, Shandong University
  • Wenhai Cui Mathematics Discipline, Shandong University
  • Bei Jiang Mathematics Discipline, University of Alberta
  • Linglong Kong Mathematics Discipline, University of Alberta
  • Xiaodong Yan Mathematics Discipline, Shandong University; Shandong National Center for Applied Mathematics

DOI:

https://doi.org/10.1609/aaai.v38i19.30182

Keywords:

General

Abstract

To ensure user safety by protecting privacy, traditional privacy-preserving bandit algorithms that aim to maximize the mean reward have been widely studied in scenarios such as online ride-hailing, advertising recommendation, and personalized healthcare. However, classical bandit learning is irresponsible in such practical applications, as it fails to account for risk in online decision-making and ignores external system information. This paper first proposes the privacy-protected mean-volatility utility as the objective of bandit learning and proves its responsibility: by taking risk into account, it aims to maximize the probability of achieving the utility. Theoretically, the proposed responsible bandit learning is expected to achieve the fastest convergence rate among current bandit algorithms and yields greater statistical power than the classical normality-based test. Finally, simulation studies provide supporting evidence for the theoretical results and demonstrate stronger performance under stricter privacy budgets.
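The objective described in the abstract, a mean-reward term penalized by volatility and then privatized before arm selection, can be illustrated with a minimal sketch. This is not the paper's algorithm: the risk weight `lam`, the variance-based volatility estimate, and the Laplace-mechanism sensitivity bound below are illustrative assumptions for bounded rewards, not the calibration derived in the paper.

```python
import numpy as np

def private_mean_volatility_utility(rewards, lam=1.0, epsilon=1.0,
                                    reward_range=1.0, rng=None):
    """Noisy mean-volatility utility for one arm (illustrative sketch).

    Utility is taken as mean(rewards) - lam * var(rewards), with Laplace
    noise added for an epsilon differential-privacy budget. The crude
    sensitivity bound assumes rewards lie in [0, reward_range]; the paper
    derives its own privacy calibration.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(rewards)
    utility = np.mean(rewards) - lam * np.var(rewards)
    # Rough sensitivity of the utility to changing one of n bounded rewards.
    sensitivity = (reward_range + lam * reward_range ** 2) / n
    return utility + rng.laplace(scale=sensitivity / epsilon)

def select_arm(histories, lam=1.0, epsilon=1.0, rng=None):
    """Pick the arm whose privatized mean-volatility utility is largest."""
    utilities = [private_mean_volatility_utility(h, lam, epsilon, rng=rng)
                 for h in histories]
    return int(np.argmax(utilities))
```

A smaller `epsilon` (stricter privacy budget) inflates the Laplace noise scale, so arm selection trades statistical accuracy against privacy, which is the tension the paper's utility is designed to manage.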

Published

2024-03-24

How to Cite

Zhao, S., Cui, W., Jiang, B., Kong, L., & Yan, X. (2024). Responsible Bandit Learning via Privacy-Protected Mean-Volatility Utility. Proceedings of the AAAI Conference on Artificial Intelligence, 38(19), 21815-21822. https://doi.org/10.1609/aaai.v38i19.30182

Section

AAAI Technical Track on Safe, Robust and Responsible AI Track