Dai, X., Xie, Y., Liu, M., Wang, X., Li, Z., Wang, H., & Lui, J. C. (2026). A Multi-Agent Conversational Bandit Approach to Online Evaluation and Selection of User-Aligned LLM Responses. Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), 37323–37331. https://doi.org/10.1609/aaai.v40i44.41064