Cheng, J., Lai, S., Yao, S., & Xue, B. (2026). Offline Multi-Objective Bandits: From Logged Data to Pareto-Optimal Policies. Proceedings of the AAAI Conference on Artificial Intelligence, 40(43), 36636–36644. https://doi.org/10.1609/aaai.v40i43.40987