Multiple Trade-offs: An Improved Approach for Lexicographic Linear Bandits
DOI:
https://doi.org/10.1609/aaai.v39i20.35491Abstract
This paper studies lexicographic online learning within the framework of multiobjective stochastic linear bandits (MOSLB), where the agent aims to simultaneously maximize multiple objectives in a hierarchical manner. Previous literature has investigated lexicographic online learning in multiobjective multi-armed bandits, a special case of MOSLB. They provided a suboptimal algorithm whose regret bound is approximately O(T^(2/3)) based on a priority-based regret metric. In this paper, we propose an algorithm for lexicographic online learning in the MOSLB model, achieving an almost optimal regret bound of approximately O(dT^(1/2)) when evaluated by the general regret metric. Here, d is the dimension of arm vectors, and T is the time horizon. Our method introduces a new arm filter and a multiple trade-offs approach to effectively balance exploration and exploitation across different objectives. Experiments confirm the merits of our algorithms and provide compelling evidence to support our analysis.Downloads
Published
2025-04-11
How to Cite
Xue, B., Lin, X., Zhang, X., & Zhang, Q. (2025). Multiple Trade-offs: An Improved Approach for Lexicographic Linear Bandits. Proceedings of the AAAI Conference on Artificial Intelligence, 39(20), 21850–21858. https://doi.org/10.1609/aaai.v39i20.35491
Issue
Section
AAAI Technical Track on Machine Learning VI