[1] X. Dai, “A Multi-Agent Conversational Bandit Approach to Online Evaluation and Selection of User-Aligned LLM Responses”, AAAI, vol. 40, no. 44, pp. 37323–37331, Mar. 2026.