Multiple Trade-offs: An Improved Approach for Lexicographic Linear Bandits

Bo Xue; Xi Lin; Xiaoyuan Zhang; Qingfu Zhang

doi:10.1609/aaai.v39i20.35491

Authors

Bo Xue: City University of Hong Kong, Hong Kong, China; The City University of Hong Kong Shenzhen Research Institute, Shenzhen, China
Xi Lin: City University of Hong Kong, Hong Kong, China; The City University of Hong Kong Shenzhen Research Institute, Shenzhen, China
Xiaoyuan Zhang: City University of Hong Kong, Hong Kong, China; The City University of Hong Kong Shenzhen Research Institute, Shenzhen, China
Qingfu Zhang: City University of Hong Kong, Hong Kong, China; The City University of Hong Kong Shenzhen Research Institute, Shenzhen, China

DOI: https://doi.org/10.1609/aaai.v39i20.35491

Abstract

This paper studies lexicographic online learning within the framework of multiobjective stochastic linear bandits (MOSLB), where the agent aims to maximize multiple objectives simultaneously in a hierarchical manner. Prior work investigated lexicographic online learning in multiobjective multi-armed bandits, a special case of MOSLB, and provided a suboptimal algorithm whose regret bound is approximately O(T^(2/3)) under a priority-based regret metric. In this paper, we propose an algorithm for lexicographic online learning in the MOSLB model that achieves an almost optimal regret bound of approximately O(dT^(1/2)) when evaluated by the general regret metric. Here, d is the dimension of the arm vectors and T is the time horizon. Our method introduces a new arm filter and a multiple trade-offs approach to balance exploration and exploitation effectively across different objectives. Experiments confirm the merits of our algorithm and provide compelling evidence to support our analysis.
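To make the hierarchical objective concrete, the following is a minimal sketch of lexicographic arm selection in a linear reward model. It is not the algorithm proposed in the paper: it assumes the parameter vector of every objective is known exactly, whereas the paper's setting requires learning them from noisy feedback. The function name `lexicographic_filter`, the argument names, and the tolerance `tol` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def lexicographic_filter(
    arms: Sequence[Sequence[float]],
    thetas: Sequence[Sequence[float]],
    tol: float = 1e-9,
) -> List[int]:
    """Return indices of the lexicographically optimal arms.

    arms   : d-dimensional arm vectors
    thetas : one d-dimensional parameter vector per objective, ordered
             from highest to lowest priority (assumed known here, which
             is an illustrative simplification)
    """
    candidates = list(range(len(arms)))
    for theta in thetas:
        # Expected linear reward of each surviving arm for this objective.
        rewards = {
            i: sum(x * w for x, w in zip(arms[i], theta)) for i in candidates
        }
        best = max(rewards.values())
        # Keep only arms that are optimal (up to tol) for this objective;
        # lower-priority objectives only break the remaining ties.
        candidates = [i for i in candidates if rewards[i] >= best - tol]
    return candidates


# Three arms in d = 2 dimensions and two objectives.
arms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
thetas = [[1.0, 0.0], [0.0, 1.0]]
optimal = lexicographic_filter(arms, thetas)
```

In this toy instance, arms 0 and 2 tie on the first objective, and the second objective then selects arm 2. A learning algorithm cannot filter this sharply, because it only has estimates of each objective's parameters.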
