Dai X, Xie Y, Liu M, Wang X, Li Z, Wang H, et al. A Multi-Agent Conversational Bandit Approach to Online Evaluation and Selection of User-Aligned LLM Responses. AAAI [Internet]. 2026 Mar. 14 [cited 2026 Jul. 22];40(44):37323-31. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/41064